# Primary Keyword Management Guide

**Last Updated:** 2026-03-24

Complete guide for managing primary keywords in blog posts, including data structure, extraction patterns, and best practices for SEO/AEO/GEO optimization.

## SEO Research: Use Proper German Spelling (Umlauts)

**Critical:** For SISTRIX, Serper, PAA, SERP features, and competitor analysis, always use **proper German spelling** (ü, ä, ö, ß), not the ASCII expansion (ue, ae, oe, ss). The slug stays ASCII for URLs, but the primary keyword used for research must contain the umlauts.

| Form | Use Case | Example |
|------|----------|---------|
| **Slug** (ASCII) | URLs, file paths, canonical | `fuehrungsstile` |
| **Primary keyword** (umlaut) | SISTRIX, Serper, PAA, SERP | `führungsstile` |

SISTRIX returns very different volumes for the two forms: `fuehrungsstile` ≈ 10 searches/month vs `führungsstile` ≈ 8,050. Scripts apply `getSearchKeywordForApi()` before API calls. When creating a post with an ASCII slug, pass `--title` so that `primary_keyword` and `target-keywords.primary` store the umlaut form. See [LEXIKON_UMLAUT_REDIRECTS.md](../../systems/landing-page-redirects/LEXIKON_UMLAUT_REDIRECTS.md).
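
The two forms are related in one direction only: a slug can be derived mechanically from the umlaut keyword, but the reverse is ambiguous (the `ue` in `neue` is not an `ü`). The following is a minimal sketch of the deterministic direction. `sketchSlugFromKeyword()` is a hypothetical name used for illustration; it is not part of the codebase and says nothing about how `getSearchKeywordForApi()` is implemented.

```php
<?php
// Hypothetical helper for illustration only; not part of the blog scripts.
function sketchSlugFromKeyword(string $keyword): string
{
    // Expand German umlauts and ß to their ASCII equivalents.
    $ascii = strtr($keyword, [
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
        'Ä' => 'ae', 'Ö' => 'oe', 'Ü' => 'ue',
    ]);
    // The umlauts are already expanded, so ASCII lowercasing is enough here.
    $ascii = strtolower($ascii);
    // Collapse every run of other characters into a single hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $ascii);
    return trim($slug, '-');
}

echo sketchSlugFromKeyword('führungsstile'), "\n";                   // fuehrungsstile
echo sketchSlugFromKeyword('Zeiterfassung für Kleinbetriebe'), "\n"; // zeiterfassung-fuer-kleinbetriebe
```

Because the expansion cannot be reversed reliably, the umlaut form has to be stored explicitly rather than reconstructed from the slug.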

## Overview

Primary keywords are the main SEO keywords that each blog post targets. This guide explains the data structure, extraction patterns, and best practices for managing primary keywords across the blog system.

## Data Structure

### Primary Keyword Field

**Field**: `primary_keyword`  
**Type**: `string`  
**Required**: Yes  
**Description**: The main SEO keyword for the post

**Example**:

```json
{
  "primary_keyword": "dienstplan erstellen"
}
```

### Secondary Keywords Field

**Field**: `secondary_keywords`  
**Type**: `array<string>`  
**Required**: No (recommended)  
**Description**: Supporting SEO keywords that complement the primary keyword

**Example**:

```json
{
  "secondary_keywords": [
    "schichtplan erstellen",
    "dienstplanung",
    "arbeitsplan erstellen"
  ]
}
```

### Clusters vs Keywords

**Important Distinction**:

- **`clusters.primary`**: Usually a **taxonomy** label (e.g., "personalverwaltung", "gastronomie") used for organization and internal linking. Some Zeiterfassung posts use a **topic-style** string instead (e.g., "zeiterfassung für kleinbetriebe") when it matches the SEO primary, which keeps it from drifting away from `primary_keyword`. **Do not** leave `clusters.primary` on an old shorthand (e.g., one that drops **„für“**) when SISTRIX and the titles target the exact query users type.

- **`primary_keyword`**: SEO keyword (e.g., "dienstplan erstellen", "zeiterfassung für kleinbetriebe")
  - Used for SEO optimization, SISTRIX/Serper/PAA seeds, and `derive-target-keywords.php`
  - Should match **high-intent tool queries** where they differ from slug tokenization (umlauts, prepositions)
  - Prefer **SISTRIX exact string** over internal shorthand when volumes differ; document in `KEYWORD_DECISION.md`

**Head term on the same URL:** High-volume broader phrases (e.g., **digitale zeiterfassung**) belong in **`secondary_keywords`** and in the copy when the page already covers them. Reserve a **separate primary** only for another URL with distinct intent (see [KEYWORD_RESEARCH_WORKFLOW.md](KEYWORD_RESEARCH_WORKFLOW.md) § Head terms and cannibalization).

## Extraction Priority

The `getPrimaryKeywordFromPost()` helper function uses the following priority order:

1. **`primary_keyword` field** (new, preferred)
2. **`keywords[0]` array** (if exists and valid)
3. **`meta.keywords[0]`** (if exists and valid)
4. **Slug conversion** (fallback: convert slug to readable keyword)
5. **`clusters.primary`** (last resort, only if not generic cluster)
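
The priority order above can be read as a first-match-wins chain. The sketch below illustrates that reading only; it is not the code in `blog-keyword-helpers.php`. The function names, the `slug` field name, the "valid" rule (a non-empty string), and the `$genericClusters` list are all assumptions.

```php
<?php
// Illustrative sketch only; the real helper is getPrimaryKeywordFromPost().

/** Assumed validity rule: a non-empty string once trimmed. */
function sketchUsableKeyword($value): ?string
{
    if (!is_string($value)) {
        return null;
    }
    $value = trim($value);
    return $value === '' ? null : $value;
}

/** Returns element 0 of a list field, or null when the field is not an array. */
function sketchFirstItem($field)
{
    return is_array($field) ? ($field[0] ?? null) : null;
}

function sketchResolvePrimaryKeyword(array $post, array $genericClusters): ?string
{
    $slug = sketchUsableKeyword($post['slug'] ?? null); // assumed field name
    $cluster = sketchUsableKeyword($post['clusters']['primary'] ?? null);

    $candidates = [
        sketchUsableKeyword($post['primary_keyword'] ?? null),                    // 1
        sketchUsableKeyword(sketchFirstItem($post['keywords'] ?? null)),          // 2
        sketchUsableKeyword(sketchFirstItem($post['meta']['keywords'] ?? null)),  // 3
        $slug === null ? null : str_replace('-', ' ', $slug),                     // 4
        ($cluster !== null && !in_array($cluster, $genericClusters, true))
            ? $cluster
            : null,                                                               // 5
    ];

    // The first candidate that survived its check wins.
    foreach ($candidates as $candidate) {
        if ($candidate !== null) {
            return $candidate;
        }
    }
    return null;
}

$post = ['slug' => 'dienstplan-erstellen', 'clusters' => ['primary' => 'gastronomie']];
echo sketchResolvePrimaryKeyword($post, ['gastronomie']), "\n"; // dienstplan erstellen
```

Under this reading, the slug fallback always yields the ASCII form, which is one more reason to set `primary_keyword` explicitly for keywords that contain umlauts.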

## Helper Functions

### `getPrimaryKeywordFromPost($postData)`

Extracts the primary keyword from post data using the priority order above.

**Location**: `v2/config/blog-keyword-helpers.php`

**Usage**:

```php
require_once $projectRoot . '/v2/config/blog-keyword-helpers.php';

$postData = json_decode(file_get_contents($postFile), true);
$primaryKeyword = getPrimaryKeywordFromPost($postData);
```

### `getSecondaryKeywordsFromPost($postData)`

Extracts secondary keywords from post data.

**Usage**:

```php
$secondaryKeywords = getSecondaryKeywordsFromPost($postData);
```

### `validateKeywordQuality($keyword)`

Validates keyword quality (length, not generic cluster, contains meaningful words).

**Usage**:

```php
if (validateKeywordQuality($keyword)) {
    // Keyword is valid
}
```
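
To make the three criteria (length, not a generic cluster, meaningful words) concrete, here is a minimal sketch of what such a check could look like. It does not reproduce `validateKeywordQuality()`: the function name, the length bounds, and both word lists are assumptions chosen for illustration.

```php
<?php
// Illustrative sketch only; thresholds and word lists are assumptions.
function sketchKeywordQuality(
    string $keyword,
    array $genericClusters = ['compliance', 'efficiency', 'tools'],
    array $stopWords = ['für', 'und', 'der', 'die', 'das', 'mit']
): bool {
    $keyword = trim($keyword);

    // Length check, counted in UTF-8 characters rather than bytes.
    $length = (int) preg_match_all('/./us', $keyword);
    if ($length < 3 || $length > 80) {
        return false;
    }

    // Reject values that are nothing more than a generic cluster label.
    if (in_array(strtolower($keyword), $genericClusters, true)) {
        return false;
    }

    // Require at least one word that is not a stop word.
    $words = preg_split('/\s+/u', $keyword, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (!in_array($word, $stopWords, true)) {
            return true;
        }
    }
    return false;
}

var_dump(sketchKeywordQuality('dienstplan erstellen')); // bool(true)
var_dump(sketchKeywordQuality('tools'));                // bool(false)
```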

## Migration Script

### `fix-primary-keyword-structure.php`

Migrates `clusters.primary` values to the `primary_keyword` field for all posts.

**Usage**:

```bash
# Dry run (test without changes)
php v2/scripts/blog/fix-primary-keyword-structure.php --dry-run

# Migrate all posts
php v2/scripts/blog/fix-primary-keyword-structure.php

# Migrate specific category
php v2/scripts/blog/fix-primary-keyword-structure.php --category=lexikon

# Migrate specific post
php v2/scripts/blog/fix-primary-keyword-structure.php --category=lexikon --post=midijob
```

**Features**:

- Creates backups before migration
- Validates keyword quality
- Extracts from slug if `clusters.primary` is generic
- Generates migration report
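
The decision the script makes per post can be pictured as a small planning step that runs before anything is written, similar in spirit to `--dry-run`. The sketch below is an assumption-laden illustration, not the script's logic: the function name, the `slug` field, the returned array shape, and the rule that posts with an existing `primary_keyword` are skipped are all hypothetical.

```php
<?php
// Illustrative sketch only; not the logic of fix-primary-keyword-structure.php.
function sketchPlanMigration(array $post, array $genericClusters): array
{
    // Normalise a field to a trimmed string, or '' when it is not a string.
    $text = static function ($value): string {
        return is_string($value) ? trim($value) : '';
    };

    // Assumed rule: posts that already have a primary keyword are left alone.
    $existing = $text($post['primary_keyword'] ?? null);
    if ($existing !== '') {
        return ['action' => 'skip', 'value' => $existing, 'source' => 'primary_keyword'];
    }

    // Reuse clusters.primary unless it is only a generic label.
    $cluster = $text($post['clusters']['primary'] ?? null);
    if ($cluster !== '' && !in_array($cluster, $genericClusters, true)) {
        return ['action' => 'set', 'value' => $cluster, 'source' => 'clusters.primary'];
    }

    // Otherwise fall back to the slug ('slug' is an assumed field name).
    $slug = $text($post['slug'] ?? null);
    if ($slug !== '') {
        return ['action' => 'set', 'value' => str_replace('-', ' ', $slug), 'source' => 'slug'];
    }

    // Nothing usable: flag the post for manual review.
    return ['action' => 'manual', 'value' => null, 'source' => null];
}

$plan = sketchPlanMigration(
    ['slug' => 'midijob', 'clusters' => ['primary' => 'personalverwaltung']],
    ['personalverwaltung']
);
echo $plan['action'], ' ', $plan['value'], ' (', $plan['source'], ")\n"; // set midijob (slug)
```

Returning a plan instead of writing immediately keeps the backup and report steps separate from the decision itself.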

## Validation

### `validate-primary-keyword-structure.php`

Checks that every post has a `primary_keyword` field and that each keyword passes the quality checks.

**Usage**:

```bash
php v2/scripts/blog/validate-primary-keyword-structure.php

# Save report to file
php v2/scripts/blog/validate-primary-keyword-structure.php --output=validation-report.json
```

**Checks**:

- All posts have `primary_keyword` field
- Primary keywords are valid (not generic clusters)
- Secondary keywords structure exists (optional)
- SEO optimization fields exist (optional)
- Content refresh fields exist (optional)
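
The split between required and optional checks can be sketched as errors versus warnings. This is only an illustration of that idea under stated assumptions; the function name, the message texts, the `$genericClusters` list, and the returned array shape are hypothetical and do not describe the script's actual report format.

```php
<?php
// Illustrative sketch only; not the logic of validate-primary-keyword-structure.php.
function sketchValidatePost(array $post, array $genericClusters): array
{
    $errors = [];
    $warnings = [];

    // Required: a non-empty primary keyword that is not a generic cluster.
    $primary = $post['primary_keyword'] ?? null;
    if (!is_string($primary) || trim($primary) === '') {
        $errors[] = 'primary_keyword is missing';
    } elseif (in_array(trim($primary), $genericClusters, true)) {
        $errors[] = 'primary_keyword is a generic cluster';
    }

    // Optional structures only produce warnings.
    foreach (['secondary_keywords', 'seo_optimization', 'content_refresh'] as $field) {
        if (!is_array($post[$field] ?? null)) {
            $warnings[] = $field . ' is missing';
        }
    }

    return ['errors' => $errors, 'warnings' => $warnings];
}

$result = sketchValidatePost(
    ['primary_keyword' => 'dienstplan erstellen', 'secondary_keywords' => []],
    ['gastronomie']
);
echo count($result['errors']), ' errors, ', count($result['warnings']), " warnings\n"; // 0 errors, 2 warnings
```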

## SEO/AEO/GEO Optimization Fields

### `seo_optimization` Object

Contains SEO optimization data:

```json
{
  "seo_optimization": {
    "primary_keyword": "dienstplan erstellen",
    "secondary_keywords": ["schichtplan erstellen", "dienstplanung"],
    "keyword_cluster": "dienstplan",
    "search_intent": "informational",
    "target_word_count": 2100,
    "recommended_headings": ["Was ist ein Dienstplan?", "Wie erstelle ich einen Dienstplan?"],
    "paa_questions": ["Wie erstelle ich einen Dienstplan?", "Was muss in einen Dienstplan?"],
    "competitor_insights": {
      "average_word_count": 2100,
      "recommended_headings": [...]
    }
  }
}
```

### `content_refresh` Object

Tracks content refresh status and optimization scores:

```json
{
  "content_refresh": {
    "last_refreshed": null,
    "refresh_priority": "medium",
    "content_gaps": [],
    "optimization_opportunities": [],
    "seo_score": null,
    "aeo_score": null,
    "geo_score": null
  }
}
```
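
If the values shown above are treated as defaults for posts that have not been refreshed yet (an assumption, the source does not say so), a post can be given the structure without overwriting existing data. `sketchWithRefreshDefaults()` is a hypothetical name used for illustration.

```php
<?php
// Illustrative sketch only; treating the values above as defaults is an assumption.
function sketchWithRefreshDefaults(array $post): array
{
    $defaults = [
        'last_refreshed' => null,
        'refresh_priority' => 'medium',
        'content_gaps' => [],
        'optimization_opportunities' => [],
        'seo_score' => null,
        'aeo_score' => null,
        'geo_score' => null,
    ];

    $current = is_array($post['content_refresh'] ?? null) ? $post['content_refresh'] : [];

    // Array union keeps every key already present and adds only the missing ones.
    $post['content_refresh'] = $current + $defaults;

    return $post;
}

$post = sketchWithRefreshDefaults(['content_refresh' => ['refresh_priority' => 'high']]);
echo $post['content_refresh']['refresh_priority'], "\n"; // high
echo count($post['content_refresh']), "\n";              // 7
```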

## Best Practices

### 1. Use Specific Keywords

✅ **Good**: "dienstplan erstellen", "lohnabrechnung 2025"  
❌ **Bad**: "compliance", "efficiency", "tools"

### 2. Extract from Clusters When Appropriate

- If `clusters.primary` is a valid keyword (not generic), use it
- If `clusters.primary` is generic, extract from slug or title
- Always validate keyword quality

### 3. Populate Secondary Keywords

- Use SISTRIX related keywords data
- Support primary keyword with semantic variations
- Limit to top 10 most relevant
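
One way to apply the limit is to rank candidates, drop duplicates and the primary keyword itself, and keep the first ten. The sketch below assumes a candidate list of `keyword`/`volume` pairs and uses search volume as a stand-in for relevance. Both are assumptions; the actual shape of SISTRIX related-keyword data is not described here, and the volumes in the example are made up.

```php
<?php
// Illustrative sketch only; the candidate shape and ranking rule are assumptions.
function sketchPickSecondaryKeywords(string $primary, array $candidates, int $limit = 10): array
{
    // Highest volume first (assumed proxy for relevance).
    usort($candidates, static function (array $a, array $b): int {
        return $b['volume'] <=> $a['volume'];
    });

    // Seed the seen-set with the primary so it is never picked as secondary.
    $seen = [trim($primary) => true];
    $picked = [];

    foreach ($candidates as $candidate) {
        $keyword = trim($candidate['keyword']);
        if ($keyword === '' || isset($seen[$keyword])) {
            continue;
        }
        $seen[$keyword] = true;
        $picked[] = $keyword;
        if (count($picked) === $limit) {
            break;
        }
    }

    return $picked;
}

// Volumes below are invented example values.
$picked = sketchPickSecondaryKeywords('dienstplan erstellen', [
    ['keyword' => 'dienstplanung', 'volume' => 500],
    ['keyword' => 'schichtplan erstellen', 'volume' => 900],
    ['keyword' => 'dienstplan erstellen', 'volume' => 5000],
    ['keyword' => 'dienstplanung', 'volume' => 400],
]);
echo implode(', ', $picked), "\n"; // schichtplan erstellen, dienstplanung
```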

### 4. Keep PAA Questions Separate

- Store PAA questions in `seo_optimization.paa_questions`
- Keep existing FAQs unchanged
- Use PAA questions for content refresh, not immediate FAQ changes
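
Keeping the two apart means a PAA update should only ever write to `seo_optimization.paa_questions`. A minimal sketch of that, assuming a hypothetical function name and a hypothetical `faqs` key for the existing FAQs (the real field name is not given in this guide):

```php
<?php
// Illustrative sketch only; `faqs` is an assumed name for the existing FAQ field.
function sketchStorePaaQuestions(array $post, array $questions): array
{
    $existing = $post['seo_optimization']['paa_questions'] ?? [];
    if (!is_array($existing)) {
        $existing = [];
    }

    // Merge, drop exact duplicates, and reindex the list.
    $merged = array_values(array_unique(array_merge($existing, $questions)));

    // Only this one path is written; FAQ content is never read or modified.
    $post['seo_optimization']['paa_questions'] = $merged;

    return $post;
}

$post = [
    'faqs' => [['q' => 'Was ist ein Dienstplan?']],
    'seo_optimization' => ['paa_questions' => ['Wie erstelle ich einen Dienstplan?']],
];
$post = sketchStorePaaQuestions($post, [
    'Wie erstelle ich einen Dienstplan?',
    'Was muss in einen Dienstplan?',
]);
echo count($post['seo_optimization']['paa_questions']), "\n"; // 2
echo count($post['faqs']), "\n";                              // 1
```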

### 5. Update Scripts to Use Helper Function

All scripts should use `getPrimaryKeywordFromPost()` for consistent extraction:

```php
require_once $projectRoot . '/v2/config/blog-keyword-helpers.php';

$primaryKeyword = getPrimaryKeywordFromPost($postData);
```

## Related Documentation

- [Keyword research workflow](./KEYWORD_RESEARCH_WORKFLOW.md) — primary/secondary order, SISTRIX ideas, GSC vs new posts
- [Data Structure Mapping](./reference/DATA_STRUCTURE_MAPPING.md)
- [SISTRIX Integration Guide](./SISTRIX_CONTENT_INTEGRATION_GUIDE.md)
- [Content Creation Workflow](./CONTENT_CREATION_WORKFLOW_2026.md)
