# blog-faq-optimization-core Full Instructions

## FAQ Structure Requirements

### JSON Structure Pattern

**Required JSON format for blog post JSON files:**

FAQs are stored in the `faqs` array within the blog post JSON file:

```json
{
  "faqs": [
    {
      "question": "Question text here?",
      "answer": "Answer text here (40-80 words)."
    },
    {
      "question": "Another question?",
      "answer": "Another answer..."
    }
  ]
}
```

**Key Requirements:**

- **Array Storage:** FAQs are stored in the `faqs` array only (not in HTML content)
- **Structure:** Each FAQ is an object with `question` and `answer` properties
- **No Duplicates:** Each question must be unique - no duplicate questions allowed
- **HTML in Answers:** Answers may contain HTML (e.g., links); the markup is stripped when the FAQPage schema is generated

**Important:** FAQs are NOT stored in `content.html`. They are stored exclusively in the `faqs` array and rendered dynamically by the blog template.
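
The structural requirements above can be checked mechanically before any content review. The following is a minimal sketch, not the project's validator: `validate_faqs` is a hypothetical helper, and the exact-match duplicate check (after trimming and lowercasing) is only a rough stand-in for what `validate-faq-quality.php` does.

```python
import json


def validate_faqs(post: dict) -> list[str]:
    """Return a list of human-readable problems found in post["faqs"]."""
    faqs = post.get("faqs")
    if not isinstance(faqs, list):
        return ["'faqs' must be an array"]

    problems = []
    seen = set()
    for i, faq in enumerate(faqs):
        if not isinstance(faq, dict):
            problems.append(f"faqs[{i}] is not an object")
            continue
        # Both properties must be present and non-empty strings.
        for key in ("question", "answer"):
            value = faq.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"faqs[{i}] is missing a non-empty '{key}'")
        # Exact duplicates only; near-duplicates need a similarity check.
        question = faq.get("question")
        if isinstance(question, str) and question.strip():
            normalized = question.strip().lower()
            if normalized in seen:
                problems.append(f"faqs[{i}] repeats an earlier question")
            seen.add(normalized)
    return problems


post = json.loads("""
{
  "faqs": [
    {"question": "Was ist ein Minijob?", "answer": "Ein Minijob ist ..."},
    {"question": "Was ist ein Minijob?", "answer": ""}
  ]
}
""")
for problem in validate_faqs(post):
    print(problem)
```

Returning a list of problems rather than raising on the first one makes it easier to fix a post in a single pass.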

### Deduplication Requirements

**CRITICAL:** Prevent duplicate FAQs to ensure quality and SEO compliance.

- **Unique Questions:** Each question must be unique (no duplicates, even with slight variations)
- **Check Before Adding:** Always check for existing FAQs before adding new ones
- **Validation:** Use `validate-faq-quality.php` to check for duplicates before publishing

**Prevention:**

- Run `php v2/scripts/blog/validate-faq-quality.php --post=slug --category=category` before adding FAQs
- Run `php v2/scripts/blog/check-h2-faq-overlap.php --post=slug --category=category` before finalizing (H2/FAQ overlap similarity must stay below 0.65; remove or rephrase overlapping FAQs)
- Validate with `php v2/scripts/blog/validate-faq-schema.php --all` before publishing

**Common Pitfalls:**

- **Duplicate Questions:** Same question asked differently still counts as duplicate - normalize questions for comparison
- **Boilerplate Text:** Avoid generic endings like "Diese Informationen sind wichtig für die korrekte Umsetzung..." - remove them automatically with the quality script
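
To illustrate what "normalize questions for comparison" can mean in practice, here is a small sketch using only the Python standard library. It is an assumption-laden stand-in: `normalize_question` and `find_near_duplicates` are hypothetical names, and `difflib` measures character-level similarity, not the semantic similarity the project scripts may compute. The 0.7 default mirrors the duplicate threshold from the review checklist.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_question(question: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", question).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # removes "?", quotes, hyphens, etc.
    return re.sub(r"\s+", " ", text).strip()


def find_near_duplicates(questions: list[str], threshold: float = 0.7):
    """Return (i, j, ratio) for every question pair at or above the threshold."""
    normalized = [normalize_question(q) for q in questions]
    pairs = []
    for i in range(len(normalized)):
        for j in range(i + 1, len(normalized)):
            ratio = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
            if ratio >= threshold:
                pairs.append((i, j, round(ratio, 2)))
    return pairs


questions = [
    "Wie lange bekommt man Arbeitslosengeld?",
    "Wie lange bekommt man  arbeitslosengeld",
    "Was kostet ein Minijob?",
]
print(find_near_duplicates(questions))
```

The first two questions normalize to the same string and are reported as a pair; the third is unrelated and is not.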

### FAQ Count Guidelines

**Optimal FAQ Count:**

- **Minimum:** 10 FAQs (for comprehensive coverage)
- **Optimal:** 10-15 FAQs (best balance of coverage and user experience)
- **Maximum:** 20 FAQs (beyond this, consider splitting into multiple sections)

**Rationale:**

- Too few FAQs (< 10): Missed keyword opportunities, incomplete coverage
- Too many FAQs (> 20): User fatigue, decreased engagement, potential performance impact
- 10-15 FAQs: Optimal for Featured Snippets, user engagement, and SEO

### FAQ Ordering Strategy

**CRITICAL:** FAQs must follow logical flow for optimal SEO/AEO/GEO performance.

**Logical Flow Order (2026 Best Practices):**

1. **Definition Questions** ("Was ist...?", "Was bedeutet...?") - Foundational understanding
2. **How-To Questions** ("Wie funktioniert...?", "Wie erstellt man...?") - Practical implementation
3. **Requirements/What Questions** ("Was muss ich...?", "Was sollte...?") - Requirements and guidelines
4. **When/Why Questions** ("Wann...?", "Warum...?") - Context and reasoning
5. **Which Questions** ("Welche...?") - Comparison and selection
6. **Yes/No Questions** ("Ist...?", "Darf...?") - Permission and validation
7. **Cost/Duration Questions** ("Was kostet...?", "Wie lange dauert...?") - Practical details
8. **Edge Cases/Troubleshooting** - Rare use-cases and exceptions
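
The logical flow order above can be approximated with prefix matching on the German question stems. This is only a sketch under stated assumptions: `FLOW_RULES`, `flow_rank`, and `sort_by_logical_flow` are hypothetical names, the prefix list is far from complete, and the real `reorder-faqs-by-logical-flow.php` may classify questions differently.

```python
# Each rule maps question prefixes to a position in the logical flow (1-8).
# Cost/duration comes first in the rule list so that "Was kostet" and
# "Wie lange" are not swallowed by the generic "was"/"wie" rules.
FLOW_RULES = [
    (("was kostet", "wie lange"), 7),
    (("was ist", "was bedeutet"), 1),
    (("was muss", "was sollte"), 3),
    (("wie ",), 2),
    (("wann", "warum"), 4),
    (("welche",), 5),
    (("ist ", "darf "), 6),
]
EDGE_CASE_RANK = 8  # anything unclassified is treated as an edge case


def flow_rank(question: str) -> int:
    """Return the logical-flow position for a question."""
    q = question.strip().lower()
    for prefixes, rank in FLOW_RULES:
        if q.startswith(prefixes):
            return rank
    return EDGE_CASE_RANK


def sort_by_logical_flow(faqs: list[dict]) -> list[dict]:
    """Stable sort: FAQs of the same type keep their relative order."""
    return sorted(faqs, key=lambda faq: flow_rank(faq["question"]))


faqs = [
    {"question": "Was kostet ein Minijob?", "answer": "..."},
    {"question": "Darf ich hinzuverdienen?", "answer": "..."},
    {"question": "Wie funktioniert die Zeiterfassung per App?", "answer": "..."},
    {"question": "Was ist ein Minijob?", "answer": "..."},
]
for faq in sort_by_logical_flow(faqs):
    print(flow_rank(faq["question"]), faq["question"])
```

Because `sorted` is stable, questions of the same type stay grouped in their original relative order, which also avoids the scattered ordering the validation rules warn about.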

**Priority Order (by search volume and user intent):**

1. **High-volume calculation questions** (e.g., "Wie viel Arbeitslosengeld bei 2000€ netto?")
2. **Duration questions** (e.g., "Wie lange bekommt man Arbeitslosengeld?")
3. **Application questions** (e.g., "Wie muss ich mich beim Arbeitsamt melden?")
4. **Side income questions** (e.g., "Wie viel darf ich hinzuverdienen?")
5. **Technical questions** (e.g., "Wie wird Arbeitslosengeld berechnet?")

**Sources (in priority order):**

1. People Also Ask questions (from SISTRIX)
2. Top GSC queries (sorted by clicks, then position)
3. Related keywords
4. Standard questions based on topic

**Supplemental FAQ Sources (When FAQ Count < 10):**

When all PAA questions already map to H2 headings, or `generate-faq-questions.php` produces fewer than 10 questions, use supplemental sources. See [FAQ_EXPANSION_GUIDE.md](docs/content/blog/FAQ_EXPANSION_GUIDE.md) for step-by-step instructions.

- Run `collect-supplemental-faq-questions.php` for competitor FAQs and LSI-based questions
- `generate-faq-questions.php` automatically loads `faq-questions-supplemental.json` when its output contains fewer than 10 questions

**Ordering Validation:**

- **`add-faqs-to-post.php` sorts FAQs by logical flow by default** – no extra step needed for new FAQs
- For existing posts with unordered FAQs: `php v2/scripts/blog/reorder-faqs-by-logical-flow.php --post=slug --category=category --write`
- Use `analyze-faq-ordering.php` to check logical flow and identify issues
- Definitions should come before how-to questions
- Related questions should be grouped together
- Avoid scattered ordering (same type questions spread throughout)

## Self-Contained Answers (Critical)

**FAQ answers must be standalone.** Never reference the article or the post title, and never tell users to "read more."

- ❌ "Im Artikel 'Affiliate-Netzwerke' findest du detaillierte Vergleiche."
- ❌ "Wie im Beitrag beschrieben, bieten Netzwerke..."
- ✅ Extract the actual information and state it directly in the answer.

**Rationale:** FAQ answers appear in SERP features (PAA, Featured Snippets) and AI Overviews, so each one must be understandable on its own. Run `audit-all-faqs-quality.php` to detect meta-references.
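
As a rough illustration of what a meta-reference check might look for, the sketch below flags phrases such as "im Artikel" or "im Beitrag". The pattern list and the `find_meta_references` helper are assumptions made for this example; they are not taken from `audit-all-faqs-quality.php` and will miss many phrasings.

```python
import re

# Hypothetical pattern list: references to the surrounding post and
# "read more" style prompts. Extend it as new phrasings turn up in review.
META_REFERENCE = re.compile(
    r"\b(?:im|in diesem|in unserem)\s+(?:artikel|beitrag|blogpost)\b"
    r"|\bmehr\s+(?:dazu\s+)?(?:lesen|erfahren)\b",
    re.IGNORECASE,
)


def find_meta_references(answer: str) -> list[str]:
    """Return every phrase in the answer that points back at the post."""
    return [match.group(0) for match in META_REFERENCE.finditer(answer)]


answers = [
    "Im Artikel 'Affiliate-Netzwerke' findest du detaillierte Vergleiche.",
    "Wie im Beitrag beschrieben, bieten Netzwerke...",
    "Du erhältst 60% deines Nettoeinkommens als Arbeitslosengeld.",
]
for answer in answers:
    print(find_meta_references(answer) or "ok", "|", answer)
```

A hit is a prompt for a manual rewrite: replace the reference with the actual information the answer was pointing to.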

## Answer Length Requirements

### Optimal Answer Length

**Target:** 40-80 words per answer

**Rationale:**

- **40 words minimum:** Ensures comprehensive answer, sufficient context
- **80 words maximum:** Optimal for Featured Snippets, prevents user fatigue
- **40-80 words:** Best balance for SEO, readability, and Featured Snippet eligibility

**Answer Structure Pattern:**

1. **Direct answer** (first sentence, 10-15 words): Answer the question directly
2. **Context/details** (middle sentences, 20-40 words): Provide additional context, examples, or calculations
3. **Actionable information** (if applicable, 10-15 words): What the user should do next

**Examples:**

- ✅ **GOOD (43 words):** "Bei einem Nettoeinkommen von 2000 € erhältst du 60% davon als Arbeitslosengeld 1, also 1200 € monatlich. Hast du Kinder, erhöht sich der Satz auf 67%, was 1340 € monatlich entspricht. Der Höchstbetrag liegt 2026 bei 2390 € (West) bzw. 2320 € (Ost)."
- ❌ **BAD (12 words):** "Du bekommst 60% deines Nettoeinkommens als Arbeitslosengeld. Mit Kindern sind es 67%."
- ❌ **BAD (120 words):** Too long, exceeds optimal length, may not fit Featured Snippets
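
A quick way to check the 40-80 word target is to count whitespace-separated tokens. This is a minimal sketch; `count_words` and `check_answer_length` are hypothetical helpers, and the project's validation script may count words differently (for example, by ignoring currency symbols).

```python
def count_words(answer: str) -> int:
    """Count whitespace-separated tokens (numbers and symbols included)."""
    return len(answer.split())


def check_answer_length(answer: str, minimum: int = 40, maximum: int = 80) -> str:
    """Classify an answer against the target word range."""
    words = count_words(answer)
    if words < minimum:
        return f"too short ({words} words, minimum {minimum})"
    if words > maximum:
        return f"too long ({words} words, maximum {maximum})"
    return f"ok ({words} words)"


short_answer = (
    "Du bekommst 60% deines Nettoeinkommens als Arbeitslosengeld. "
    "Mit Kindern sind es 67%."
)
print(check_answer_length(short_answer))
```

If answers contain HTML links, strip the markup before counting; otherwise tag attributes are counted as words.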

## Keyword Integration

### Natural Keyword Integration

**Requirements:**

- ✅ **Natural integration:** Keywords appear naturally in questions and answers
- ✅ **User intent:** Match high-volume search queries (from keyword research)
- ✅ **No keyword stuffing:** Keywords should flow naturally, not forced
- ✅ **Semantic variations:** Use related terms and synonyms naturally

**Keyword Research Process (Data-Driven):**

1. **Check data freshness:** Use `check-data-freshness.php` to ensure data is fresh (< 7 days)
2. **Collect research data:** Use `collect-faq-research-data.php` to get PAA questions, GSC queries, keywords
   - PAA questions loaded from `serp-features.json` (SISTRIX SERP features)
   - GSC queries loaded from `performance-gsc.json` (metrics.top_queries)
   - Keywords loaded from `keywords-sistrix.json` (with volumes/competition)
3. **Generate questions:** Use `generate-faq-questions.php` to generate FAQ questions
   - Prioritizes PAA questions (priority 1)
   - Prioritizes GSC queries by clicks/impressions (priority 2)
   - Filters keywords by volume (≥ 50) or competition (> 0) (priority 3)
   - Uses shared keyword database for consistency
4. **Map to FAQs:** Match keywords to FAQ questions using actual data
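
The prioritization in step 3 can be sketched as a merge of the three sources. The field names used here (`query`, `clicks`, `impressions`, `keyword`, `volume`, `competition`) are assumptions for illustration; the actual layout of `serp-features.json`, `performance-gsc.json`, and `keywords-sistrix.json` is not shown in this document, and `prioritize_candidates` is a hypothetical helper.

```python
def prioritize_candidates(paa_questions, gsc_queries, keywords, min_volume=50):
    """Merge all sources into one list of (priority, text), 1 = highest."""
    # Priority 1: People Also Ask questions, in their given order.
    candidates = [(1, question) for question in paa_questions]

    # Priority 2: GSC queries, most clicks first, impressions as tie-breaker.
    ranked = sorted(gsc_queries, key=lambda row: (-row["clicks"], -row["impressions"]))
    candidates += [(2, row["query"]) for row in ranked]

    # Priority 3: keywords with volume >= 50 or any competition signal.
    candidates += [
        (3, kw["keyword"])
        for kw in keywords
        if kw["volume"] >= min_volume or kw["competition"] > 0
    ]

    # Keep only the first (highest-priority) occurrence of each text.
    seen, unique = set(), []
    for priority, text in candidates:
        key = text.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append((priority, text))
    return unique


paa = ["Wie lange bekommt man Arbeitslosengeld?"]
gsc = [
    {"query": "arbeitslosengeld berechnen", "clicks": 5, "impressions": 100},
    {"query": "arbeitslosengeld höhe", "clicks": 12, "impressions": 80},
]
kws = [
    {"keyword": "arbeitslosengeld rechner", "volume": 900, "competition": 0.0},
    {"keyword": "alg 1 dauer", "volume": 10, "competition": 0.0},
    {"keyword": "arbeitslosengeld höhe", "volume": 300, "competition": 0.4},
]
for priority, text in prioritize_candidates(paa, gsc, kws):
    print(priority, text)
```

In this sample data, "alg 1 dauer" is filtered out (volume below 50, no competition), and "arbeitslosengeld höhe" appears only once, at the higher GSC priority.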

**Examples:**

- ✅ **GOOD:** "Wie funktioniert die Zeiterfassung per App?" (matches high-volume query)
- ❌ **BAD:** "Wie funktioniert die Zeiterfassung per App Zeiterfassung App?" (keyword stuffing)

## Internal Linking Guidelines

### Natural Internal Linking

**Opportunity-driven:** Add links when FAQ answers mention terms that have a dedicated lexikon, tool, or product page. Terms that match such a page 1:1 must be linked; never force a link that adds no value.

**Requirements:**

- **Relevance:** Links must be contextually relevant to the answer
- **Anchor text:** Use natural, contextual phrases (not "click here")
- **Target pages:** Relevant tools, products, or content pages

**When to link:**

- When answer mentions related calculation (e.g., Brutto-Netto conversion)
- When answer references related tool (e.g., Minijob calculator)
- When answer could benefit from additional context (e.g., product page)

**When NOT to link:**

- Forced links that don't add value
- Links in every FAQ (over-optimization)
- Links to unrelated pages

**Examples:**

- ✅ **GOOD:** "Für eine präzise Umrechnung nutze unseren [Brutto-Netto-Rechner](/tools/brutto-netto-rechner)."
- ✅ **GOOD:** "Für detaillierte Berechnungen zu Minijobs nutze unseren [Minijob-Rechner](/tools/minijob-rechner)."
- ❌ **BAD:** "Klicke hier für unseren Brutto-Netto-Rechner."
- ❌ **BAD:** "Weitere Informationen findest du [hier](/tools/brutto-netto-rechner)."
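
Generic anchor text such as "hier" is easy to detect once links are extracted. The sketch below handles both Markdown links and simple HTML `<a>` tags; `GENERIC_ANCHORS`, `extract_links`, and `generic_anchor_links` are hypothetical names, and the regular expressions are deliberately simple rather than a full HTML or Markdown parser.

```python
import re

# Hypothetical list of anchor texts that carry no context on their own.
GENERIC_ANCHORS = {"hier", "klicke hier", "mehr", "mehr erfahren", "link", "click here"}

MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")
HTML_LINK = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def extract_links(answer: str) -> list[tuple[str, str]]:
    """Return (anchor_text, url) pairs for Markdown and HTML links."""
    links = [(text, url) for text, url in MARKDOWN_LINK.findall(answer)]
    links += [(text, url) for url, text in HTML_LINK.findall(answer)]
    return links


def generic_anchor_links(answer: str) -> list[tuple[str, str]]:
    """Return only the links whose anchor text is generic."""
    return [
        (text, url)
        for text, url in extract_links(answer)
        if text.strip().lower() in GENERIC_ANCHORS
    ]


bad = "Weitere Informationen findest du [hier](/tools/brutto-netto-rechner)."
good = "Für eine präzise Umrechnung nutze unseren [Brutto-Netto-Rechner](/tools/brutto-netto-rechner)."
print(generic_anchor_links(bad))
print(generic_anchor_links(good))
```

A check like this only covers anchor wording; whether a link is relevant to the answer still needs manual review.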

## FAQPage Schema Markup

### Schema Requirements

**Critical requirements:**

- **Exact match:** Schema answers must match the HTML answers word-for-word once HTML tags are stripped
- **No HTML in schema:** Remove HTML links from schema answers (use plain text)
- **All FAQs included:** Every HTML FAQ must have corresponding schema entry
- **Proper ordering:** Schema FAQs should match HTML FAQ order
- **JSON validation:** Ensure valid JSON syntax (no trailing commas, proper escaping)
- **Required properties:** `@context` must be `"https://schema.org"`, `@type` must be `"FAQPage"`, `mainEntity` must be array with Questions
- **Question structure:** Each Question must have `@type: "Question"`, `name` (question text), and `acceptedAnswer` with `@type: "Answer"` and `text` (plain text)
- **Single schema:** Only one FAQPage schema per page (no duplicates)
- **Reflective content:** Schema content must match visible content on page

**Text Cleaning Requirements:**

- Strip all HTML tags from answer text
- Decode HTML entities properly
- Replace smart quotes with regular quotes
- Normalize whitespace (multiple spaces → single space)
- Remove control characters
- Ensure UTF-8 encoding

**Example:**

- **HTML:** "Für eine präzise Umrechnung nutze unseren <a href=\"/tools/brutto-netto-rechner\">Brutto-Netto-Rechner</a>."
- **Schema:** "Für eine präzise Umrechnung nutze unseren Brutto-Netto-Rechner."
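
The text cleaning steps and the required FAQPage properties can be combined in a short sketch. This is not the project's schema generator: `clean_text` and `build_faq_schema` are hypothetical helpers, and the regex-based tag stripping is a simplification that assumes answers contain only simple inline markup such as links.

```python
import html
import json
import re

# Map typographic quotes to their plain ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u201c": '"', "\u201d": '"', "\u201e": '"',
    "\u2018": "'", "\u2019": "'", "\u201a": "'",
})


def clean_text(value: str) -> str:
    """Apply the cleaning steps listed above to one question or answer."""
    text = re.sub(r"<[^>]+>", "", value)          # strip HTML tags
    text = html.unescape(text)                     # decode HTML entities
    text = text.translate(SMART_QUOTES)            # smart quotes -> regular quotes
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text)  # control chars
    return re.sub(r"\s+", " ", text).strip()       # normalize whitespace


def build_faq_schema(faqs: list[dict]) -> dict:
    """Build a single FAQPage object, keeping the order of the faqs array."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": clean_text(faq["question"]),
                "acceptedAnswer": {"@type": "Answer", "text": clean_text(faq["answer"])},
            }
            for faq in faqs
        ],
    }


faqs = [{
    "question": "Wie rechne ich Brutto in Netto um?",
    "answer": 'Für eine präzise Umrechnung nutze unseren '
              '<a href="/tools/brutto-netto-rechner">Brutto-Netto-Rechner</a>.',
}]
# ensure_ascii=False keeps umlauts as UTF-8 instead of \u escapes.
print(json.dumps(build_faq_schema(faqs), ensure_ascii=False, indent=2))
```

Tags are stripped before entities are decoded so that an escaped `&lt;` in an answer survives as a literal `<` instead of being mistaken for markup.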

**Validation:**

- Test with Google Rich Results Test: https://search.google.com/test/rich-results
- Use `validate-faq-schema.php` script to validate all posts
- Schema generator automatically handles text cleaning and normalization
- See `docs/content/blog/FAQ_SCHEMA_BEST_PRACTICES.md` for complete guide

## Content Writing Best Practices

**See:** `.cursor/rules/content-writing.mdc` for comprehensive content writing guidelines.

### Natural Language Writing

**Requirements:**

- ✅ **Natural, conversational tone:** Write as if speaking directly to user
- ✅ **Varied sentence structures:** Mix short and long sentences
- ✅ **Specific examples:** Include concrete examples and scenarios
- ✅ **Personal insights:** When appropriate, include insights or experiences
- ✅ **Avoid AI content tells:** No overly formal language, repetitive structures, or generic transitions

**AI Content Avoidance:**

- ❌ Avoid: "Furthermore", "Moreover", "In conclusion"
- ❌ Avoid: Overly formal language patterns
- ❌ Avoid: Repetitive sentence structures
- ✅ Use: Natural transitions, varied structures, specific examples

**See:** `docs/content/AI_CONTENT_AVOIDANCE_GUIDE.md` for comprehensive guide.

## Copy Guidelines (FAQ-Specific)

### Du Tone (Informal German)

**Requirements:**

- ✅ **Use "du" pronouns:** Address user informally ("du", "dich", "dein")
- ✅ **Conversational tone:** Write as if speaking directly to user
- ✅ **Active voice:** Use active voice, not passive

**Examples:**

- ✅ **GOOD:** "Du erhältst 60% deines Nettoeinkommens als Arbeitslosengeld."
- ❌ **BAD:** "Es wird 60% des Nettoeinkommens als Arbeitslosengeld ausgezahlt."

### Ordio Mentions

**Requirements:**

- **Frequency:** Mention Ordio naturally when relevant (not forced)
- **Context:** Only mention when it adds value to the answer
- **Tone:** Natural integration, not promotional

**Examples:**

- ✅ **GOOD:** "Unser Rechner zeigt dir deinen genauen Anspruch." (natural, helpful)
- ✅ **GOOD:** "Für detaillierte Berechnungen nutze unseren Minijob-Rechner." (contextual, helpful)
- ❌ **BAD:** "Ordio bietet den besten Arbeitslosengeld-Rechner." (promotional, forced)

**Note:** See `shared-patterns.mdc` for complete universal copy guidelines.

## Manual Review Requirements

**CRITICAL:** Manual review is mandatory for all FAQ creation and optimization. Batch processing without manual review is NOT allowed.

### Review Process

**Step 1: Run Comprehensive Analysis**

```bash
# Comprehensive Analysis (all quality checks)
php v2/scripts/blog/comprehensive-faq-analysis.php --post=slug --category=category

# Topic Relevance Validation
php v2/scripts/blog/validate-faq-topic-relevance.php --post=slug --category=category

# Pattern Detection
php v2/scripts/blog/detect-faq-patterns.php --post=slug --category=category

# Ordering Analysis
php v2/scripts/blog/analyze-faq-ordering.php --post=slug --category=category

# SEO Analysis
php v2/scripts/blog/analyze-faqs-seo.php --post=slug --category=category

# Uniqueness Check
php v2/scripts/blog/check-faq-uniqueness.php --post=slug --category=category

# Improvement Suggestions
php v2/scripts/blog/suggest-faq-improvements.php --post=slug --category=category
```

**Step 2: Review Analysis Output**

- Identify off-topic FAQs (remove if relevance < 0.3)
- Identify pattern violations (remove high-severity violations)
- Identify duplicate questions (remove or merge)
- Identify ordering issues (reorder logically)
- Identify missing high-value queries (add FAQs)
- Identify repetitive answers (rewrite with unique angles)
- Note SEO opportunities (keyword integration, query coverage)

**Step 3: Fix Quality Issues**

**Automated Fixes (with manual review):**

```bash
# Fix common issues (dry-run first)
php v2/scripts/blog/fix-faq-quality-issues.php --post=slug --category=category --dry-run
php v2/scripts/blog/fix-faq-quality-issues.php --post=slug --category=category --backup
```

**Manual Edit JSON File:**

- Remove off-topic FAQs (relevance < 0.3)
- Remove pattern violations (nonsensical patterns)
- Remove duplicate FAQs (keep best answer)
- Fix malformed questions (remove fragments, fix grammar)
- Reorder FAQs logically (definitions first, then how-to, etc.)
- Rewrite repetitive FAQs with unique angles
- Add FAQs for missing high-value queries
- Optimize answers for SEO (keyword integration, length, du tone)
- Ensure each FAQ provides unique value

**Step 4: Validate Changes**

```bash
php v2/scripts/blog/check-faq-uniqueness.php --post=slug --category=category
php v2/scripts/blog/validate-faq-schema.php --post=slug --category=category
```

**Step 5: Document Review**

- Update progress tracker
- Note issues found and fixes applied
- Document SEO improvements

### Review Checklist

See `docs/content/blog/FAQ_MANUAL_REVIEW_SEO_CHECKLIST.md` for complete checklist.

**Must Have:**

- **Topic relevance:** All FAQs relevant to post (relevance ≥ 0.3)
- **No pattern violations:** No nonsensical patterns (e.g., cost or duration questions about abstract concepts)
- **No brand questions:** No Ordio questions on non-brand posts
- **No duplicate questions:** Semantic similarity < 0.7
- **No repetitive answers:** Content similarity < 0.6
- **Logical ordering:** Definitions before how-to, high-volume queries first
- **Top 10 GSC queries addressed**
- **Primary keyword in 3-5 FAQs**
- **Answers 40-80 words**
- **Du tone consistent**
- **No malformed questions:** Complete, grammatically correct

### Tools Reference

**Comprehensive Analysis Tools:**

- `comprehensive-faq-analysis.php` - Complete quality analysis (topic relevance, patterns, duplicates, ordering)
- `validate-faq-topic-relevance.php` - Topic relevance validation
- `detect-faq-patterns.php` - Pattern violation detection
- `analyze-faq-ordering.php` - Ordering analysis and suggestions
- `analyze-faqs-seo.php` - SEO analysis (GSC queries, keywords, duplicates)
- `check-faq-uniqueness.php` - Uniqueness check (questions and answers)
- `suggest-faq-improvements.php` - Improvement suggestions (new FAQs, keywords)

**Fix Tools:**

- `fix-faq-quality-issues.php` - Automated fixes for common issues (with manual review)

**Review Tools:**

- `review-faq-quality-post-by-post.php` - Interactive quality review workflow
- `manual-review-faqs-post-by-post.php` - Manual review interface

**Validation Tools:**

- `validate-faq-schema.php` - Schema validation
- `validate-faq-quality.php` - Quality validation

## Quality Checklist

### Content Quality

- [ ] 10-15 FAQs per post (optimal count)
- [ ] Answers are 40-80 words each
- [ ] Primary keyword appears naturally in 3-5 FAQs
- [ ] Natural keyword integration (no stuffing)
- [ ] Du tone consistency (informal German)
- [ ] Natural Ordio mentions (not forced)
- [ ] Contextual internal links when FAQs mention lexikon/tool/product terms (mandatory for 1:1 term matches)
- [ ] No template language
- [ ] No malformed questions
- [ ] **All FAQs relevant to post topic** (topic relevance ≥ 0.3)
- [ ] **No pattern violations** (nonsensical cost/duration patterns)
- [ ] **No brand questions on non-brand posts**
- [ ] **Logical ordering** (definitions before how-to, high-volume first)

### SEO Quality

- [ ] Primary keyword is correct (not a fragment)
- [ ] FAQ ordering follows logical flow (definitions → how-to → details → edge cases)
- [ ] FAQ ordering follows priority strategy (high-volume queries first)
- [ ] Questions match People Also Ask queries
- [ ] Keywords integrated naturally
- [ ] Schema markup complete and valid
- [ ] No duplicate questions (similarity < 0.7)
- [ ] Questions validated (no fragments, grammatically correct)
- [ ] **No off-topic FAQs** (all FAQs relate to post topic)
- [ ] **No repetitive nonsensical patterns**

### Technical Quality

- [ ] FAQs stored in `faqs` array (not in HTML content)
- [ ] Each FAQ has `question` and `answer` properties
- [ ] Schema answers match HTML answers word-for-word (apart from stripped HTML tags)
- [ ] No HTML links in schema answers (plain text only)
- [ ] Valid JSON syntax
- [ ] Schema validates with Google Rich Results Test
- [ ] All FAQs manually reviewed and approved

## Workflow

### Gap Remediation (Posts with 0 or <10 FAQs)

**See:** `docs/content/blog/FAQ_GAP_REMEDIATION_RUNBOOK.md` for step-by-step gap remediation workflow.

- Run `python3 v2/scripts/blog/audit-faq-gap-analysis.py` to identify posts needing work
- Follow runbook: data collection → keyword derivation → question generation → answer generation → add → validate
- Manual review required per post; no automated content creation

### For New Posts Without FAQs (Fresh Start Approach)

**See:** `docs/content/blog/FAQ_CREATION_WORKFLOW_2026.md` for complete workflow guide.

1. **Collect Research Data:**

   ```bash
   php v2/scripts/blog/collect-faq-research-data.php --post=slug --category=category
   ```

   This collects:

   - SISTRIX PAA questions
   - GSC top queries
   - Target keywords
   - LSI keywords
   - Search intent data

2. **Generate FAQ Questions:**

   ```bash
   php v2/scripts/blog/generate-faq-questions.php --post=slug --category=category
   ```

   This generates 10-15 FAQ questions based on research data.

3. **Generate FAQ Answers:**

   ```bash
   php v2/scripts/blog/generate-faq-answers-optimized.php --post=slug --category=category --use-ai
   ```

   This generates SEO/AEO/GEO optimized answers (40-80 words each). **Required:** Use `--use-ai` for production; template mode produces placeholders (~20-40 words) and will fail validation. Manual alternative: create `data/faq-answers-optimized.json` with 40-80 word answers.

4. **Manual Review:**

   - Review generated FAQs using `docs/content/blog/FAQ_MANUAL_REVIEW_CHECKLIST.md`
   - Edit answers as needed for natural language and accuracy
   - Ensure du tone, keyword integration, and internal links

5. **Add FAQs to Post:**

   ```bash
   php v2/scripts/blog/add-faqs-to-post.php --post=slug --category=category --faqs=faq-answers-optimized.json
   ```

   This adds FAQs to the `faqs` array in the post JSON file.

6. **Validate:**

   ```bash
   php v2/scripts/blog/validate-faq-quality.php --post=slug --category=category
   php v2/scripts/blog/validate-faq-schema.php --post=slug --category=category
   ```
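
The six steps above split naturally into a generation phase and a publish phase, with manual review in between. The sketch below only assembles the commands already listed in this workflow; the helper names (`php`, `generation_commands`, `publish_commands`, `run_all`) are hypothetical, and nothing is executed unless `dry_run=False` is passed from inside the repository.

```python
import subprocess

SCRIPTS = "v2/scripts/blog"


def php(script: str, post: str, category: str, *extra: str) -> list[str]:
    """Build the argv for one blog script call."""
    return ["php", f"{SCRIPTS}/{script}", f"--post={post}", f"--category={category}", *extra]


def generation_commands(post: str, category: str) -> list[list[str]]:
    """Steps 1-3: research data, questions, answers."""
    return [
        php("collect-faq-research-data.php", post, category),
        php("generate-faq-questions.php", post, category),
        php("generate-faq-answers-optimized.php", post, category, "--use-ai"),
    ]


def publish_commands(post: str, category: str) -> list[list[str]]:
    """Steps 5-6: run only after the manual review in step 4 is done."""
    return [
        php("add-faqs-to-post.php", post, category, "--faqs=faq-answers-optimized.json"),
        php("validate-faq-quality.php", post, category),
        php("validate-faq-schema.php", post, category),
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


run_all(generation_commands("slug", "category"))
```

Keeping the two phases as separate functions makes it harder to skip the manual review by accident.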

### For Batch Processing (FAQ Rebuild)

**See:** `docs/content/blog/FAQ_REBUILD_PROGRESS.md` for progress tracking.

1. **Process Batch:**

   ```bash
   php v2/scripts/blog/rebuild-faqs-batch.php --tier=1 --batch-size=10
   ```

2. **Manual Review** (use checklist for each batch)

3. **Add Approved FAQs** to posts

4. **Validate Schema** for all posts in batch

## Related Documentation

- `docs/content/blog/FAQ_GAP_REMEDIATION_RUNBOOK.md` - Step-by-step gap remediation for posts with 0 or <10 FAQs
- `docs/content/blog/FAQ_CREATION_WORKFLOW_2026.md` - Complete FAQ creation workflow (2026 fresh start approach)
- `docs/content/blog/FAQ_REBUILD_PROGRESS.md` - Progress tracking for FAQ rebuild project
- `docs/content/blog/FAQ_REBUILD_PRIORITY_LIST.md` - Prioritized list for FAQ rebuild
- `docs/content/blog/FAQ_MANUAL_REVIEW_CHECKLIST.md` - Manual review checklist
- `docs/content/blog/FAQ_WORKFLOW.md` - FAQ workflow documentation
- `docs/content/blog/FAQ_CURRENT_STATE_BASELINE.md` - Baseline audit report
