# PAA Topic Override Guide

**Last Updated:** 2026-02-11
**Purpose:** When and how to override SISTRIX PAA questions when SISTRIX returns off-topic results (e.g. Gen Z questions for the keyword "Generation X") or no results at all

## Overview

The script `collect-post-paa-questions.php` fetches People Also Ask (PAA) questions from SISTRIX `keyword.questions`. In some cases, SISTRIX returns related questions that are **off-topic** for the post:

- **Demographic topics:** "Generation X" may return Gen Z/Gen Y questions (fashion, jewelry, lifestyle)
- **Ambiguous keywords:** Broad terms may surface tangentially related queries
- **HR/lexikon focus:** The post targets an HR audience, but SISTRIX returns general-population queries

When the automated PAA questions are off-topic, **create** `data/paa-questions-manual.json` with a list curated from SERP_ANALYSIS.md and live browser research.

## When to Override

| Condition | Action |
|-----------|--------|
| SISTRIX returns >50% off-topic PAA | Create paa-questions-manual.json with SERP_ANALYSIS curated list |
| validate-content-completeness flags unrelated PAA | Use manual override so coverage check uses topic-relevant questions |
| Primary keyword is demographic/generational | Consider manual PAA (e.g. Generation X, Millennials, Gen Z) |
| SISTRIX request fails (e.g. timeout) and no paa-questions.json is created | Create paa-questions-manual.json so the pipeline can continue |

## How to Create Override

1. **Review** the "People Also Ask" section of SERP_ANALYSIS.md (from a manual browser check or competitor analysis)
2. **Create** `docs/content/blog/posts/{category}/{slug}/data/paa-questions-manual.json`
3. **Format:**

```json
{
  "post_slug": "generation-x",
  "primary_keyword": "Generation X",
  "source": "manual",
  "rationale": "SISTRIX returns Gen Z questions; using SERP_ANALYSIS curated list",
  "questions": [
    "Was ist Generation X?",
    "Wann wurde Generation X geboren?",
    "Welche Merkmale hat Generation X?",
    "..."
  ]
}
```

4. **Run** `collect-faq-research-data.php` – it loads the manual PAA questions instead of those in paa-questions.json
5. **Verify** that `validate-content-completeness.php` passes
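
If override files are created often, steps 2 and 3 could be scripted. The following is a minimal Python sketch, not part of the existing PHP tooling; the helper name `write_manual_paa` and its parameters are hypothetical, and only the JSON keys and the file name come from the format above.

```python
import json
from pathlib import Path


def write_manual_paa(post_dir, post_slug, primary_keyword, rationale, questions):
    """Write data/paa-questions-manual.json under post_dir and return its path."""
    # Drop empty entries and the "..." placeholder used in the format example.
    cleaned = [q.strip() for q in questions if q.strip() and q.strip() != "..."]
    if not cleaned:
        raise ValueError("manual override needs at least one question")

    payload = {
        "post_slug": post_slug,
        "primary_keyword": primary_keyword,
        "source": "manual",
        "rationale": rationale,
        "questions": cleaned,
    }
    target = Path(post_dir) / "data" / "paa-questions-manual.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps umlauts readable in German questions.
    target.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2) + "\n",
        encoding="utf-8",
    )
    return target
```

Here `post_dir` would be the post folder, i.e. `docs/content/blog/posts/{category}/{slug}`.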

## Scripts That Use Manual Override

| Script | Behavior |
|--------|----------|
| collect-faq-research-data.php | Uses paa_questions from manual file when present |
| validate-content-completeness.php | PAA coverage check uses manual file first |
| analyze-competitor-content-depth.php | "PAA Questions to Cover" section uses manual file |

**Source hierarchy:** paa-questions-manual.json (if exists) → paa-questions.json → faq-research paa_questions
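
The hierarchy can be pictured as a first-match lookup. The Python sketch below is illustrative only and is not the PHP scripts' actual code. It assumes that paa-questions.json stores its list under a `questions` key like the manual file, and that the faq-research data lives in a file named `faq-research.json` with a `paa_questions` key; both are assumptions.

```python
import json
from pathlib import Path


def resolve_paa_questions(data_dir):
    """Return (source_file, questions) from the first source that yields questions."""
    data_dir = Path(data_dir)
    # (file name, key holding the question list). Everything except the
    # manual file's name and its "questions" key is an assumption.
    candidates = [
        ("paa-questions-manual.json", "questions"),
        ("paa-questions.json", "questions"),
        ("faq-research.json", "paa_questions"),
    ]
    for filename, key in candidates:
        path = data_dir / filename
        if not path.is_file():
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # unreadable file: fall through to the next source
        questions = data.get(key) if isinstance(data, dict) else None
        if questions:
            return filename, list(questions)
    return None, []
```

One design choice in this sketch: a manual file with an empty question list falls through to the next source. Whether the real scripts behave the same way is not documented here.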

## Examples

### Generation X (off-topic)

- **SISTRIX paa-questions.json:** 15 questions, 14 about Gen Z (fashion, socks, hoodies)
- **Override:** paa-questions-manual.json with 10 Gen-X-relevant questions from SERP_ANALYSIS
- **Result:** faq-research loads 10 topic-relevant PAA; validate-content-completeness passes
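
Applied to this example, the >50% rule from "When to Override" works out as 14 / 15 ≈ 93% off-topic. A small Python sketch of that arithmetic follows; the function names are hypothetical, and deciding which questions count as off-topic remains a manual judgment.

```python
def off_topic_share(total_questions, off_topic_questions):
    """Fraction of PAA questions judged off-topic (0.0 to 1.0)."""
    if total_questions <= 0:
        raise ValueError("total_questions must be positive")
    return off_topic_questions / total_questions


def needs_manual_override(total_questions, off_topic_questions, threshold=0.5):
    """True when more than `threshold` of the questions are off-topic."""
    if total_questions == 0:
        # No questions at all (e.g. a failed fetch) also calls for a manual list.
        return True
    return off_topic_share(total_questions, off_topic_questions) > threshold


# Generation X example: 14 of 15 questions are about Gen Z.
print(needs_manual_override(15, 14))
```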

### Work-Life-Balance (PAA fetch failed)

- **SISTRIX:** Request timed out (HTTP status 0); no paa-questions.json created
- **Override:** paa-questions-manual.json with 10 HR-focused questions from SERP (Was ist Work-Life-Balance? Wie verbessern? etc.)
- **Result:** collect-faq-research-data loads from manual file; pipeline continues; validate-content-completeness passes

## Validation

After creating the override:

- `validate-content-completeness.php` – should pass (PAA from manual file; all covered in H2s or FAQs)
- `collect-faq-research-data.php` – outputs "Loaded N PAA questions from paa-questions-manual.json (override)"
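
Before running the validator, a quick pre-check can list manual questions that have no matching H2 or FAQ entry. The Python sketch below uses a deliberately naive rule (exact match after lowercasing and stripping punctuation); how `validate-content-completeness.php` actually matches questions is not documented here, so treat this as an assumption-laden approximation with hypothetical function names.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # \w keeps letters such as umlauts
    return " ".join(text.split())


def uncovered_questions(questions, headings_and_faqs):
    """Return the questions with no exact normalized match in the post."""
    covered = {normalize(entry) for entry in headings_and_faqs}
    return [q for q in questions if normalize(q) not in covered]
```

An empty return value means every manual question has a literal counterpart in the post. A non-empty list is only a hint, since the real check may accept looser matches.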

## References

- [COMPETITIVE_DEPTH_OVERRIDE_GUIDE.md](COMPETITIVE_DEPTH_OVERRIDE_GUIDE.md) – Similar override pattern for word count
- [BLOG_POST_IMPROVEMENT_PROCESS.md](BLOG_POST_IMPROVEMENT_PROCESS.md) – Pipeline edge cases
- [DATA_COLLECTION_SCRIPTS_INVENTORY.md](DATA_COLLECTION_SCRIPTS_INVENTORY.md) – PAA source hierarchy
