# Blog Content Data Flow Map

**Last Updated:** 2026-02-28  
**Purpose:** Single reference for pipeline scripts, output files, consumer scripts, and manual steps.

## Pipeline Order (run-new-post-pipeline.php)

1. SISTRIX Keywords → `keywords-sistrix.json`
2. PAA Questions → `paa-questions.json`
3. **SERP Features** → `serp-features.json` (must run before FAQ Research so that step can merge its PAA data)
4. **FAQ Research** → `faq-research.json` (merges PAA from serp-features + paa-questions)
5. Competition Levels → `competition-levels.json`
6. Search Intent → `search-intent.json`
7. Competitor Analysis → `competitor-analysis.json`
8. Competitor Depth → `competitive-depth-analysis.md`
9. Firecrawl Validation (remediates sparse entries in the top 7)
10. SERP Skeleton → `SERP_ANALYSIS.md` (skeleton)
11. Content Depth Report → `content-depth-report.md`
12. Pre-Content Checklist → `PRE_CONTENT_CHECKLIST.md`
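
The order above can be captured as data, which makes it easy to see where a partially completed run stopped. The following is a minimal Python sketch, not code from `run-new-post-pipeline.php`; it assumes that all outputs for a post land in a single directory, and `first_incomplete_step` is a hypothetical helper name.

```python
from pathlib import Path

# Step name -> output file, in run order (taken from the list above).
# Step 9 (Firecrawl Validation) has no listed output file, hence None.
PIPELINE_STEPS = [
    ("SISTRIX Keywords", "keywords-sistrix.json"),
    ("PAA Questions", "paa-questions.json"),
    ("SERP Features", "serp-features.json"),
    ("FAQ Research", "faq-research.json"),
    ("Competition Levels", "competition-levels.json"),
    ("Search Intent", "search-intent.json"),
    ("Competitor Analysis", "competitor-analysis.json"),
    ("Competitor Depth", "competitive-depth-analysis.md"),
    ("Firecrawl Validation", None),
    ("SERP Skeleton", "SERP_ANALYSIS.md"),
    ("Content Depth Report", "content-depth-report.md"),
    ("Pre-Content Checklist", "PRE_CONTENT_CHECKLIST.md"),
]


def first_incomplete_step(post_dir):
    """Return the name of the first step whose output file is missing, or None.

    Assumption: every output file lives directly in `post_dir`.
    Steps without a listed output are skipped because they cannot be checked.
    """
    for name, output in PIPELINE_STEPS:
        if output is not None and not (Path(post_dir) / output).exists():
            return name
    return None
```

Calling `first_incomplete_step` on a post directory that contains only the first three JSON files would return `"FAQ Research"`, the next step to run.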

## PAA Source Hierarchy

`paa-questions-manual.json` (override) > `serp-features.json` > `paa-questions.json` > SISTRIX

`collect-faq-research-data.php` merges these sources into `faq-research.json`. Downstream scripts then read PAA data in the order manual > faq-research > paa-questions.
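
One way to read the hierarchy is as a priority-ordered merge with de-duplication. The Python sketch below illustrates that reading only; it is not the logic of `collect-faq-research-data.php`. It assumes each source has already been loaded as a plain list of question strings, and `merge_paa` and the sample questions are made up for illustration. If the manual file is meant to replace the other sources entirely rather than win on duplicates, the function would instead return the first non-empty source.

```python
def merge_paa(*sources):
    """Merge question lists given in priority order, highest priority first.

    Duplicates are detected on case- and whitespace-normalized text;
    the wording from the highest-priority source is kept.
    """
    seen = set()
    merged = []
    for questions in sources:
        for question in questions:
            key = " ".join(question.lower().split())
            if key and key not in seen:
                seen.add(key)
                merged.append(question.strip())
    return merged


# Illustrative data only, not taken from a real post.
manual = ["What is a backlink?"]
serp_features = ["what is a backlink?", "How do backlinks work?"]
paa_questions = ["How do backlinks work?", "Are backlinks still important?"]

print(merge_paa(manual, serp_features, paa_questions))
# ['What is a backlink?', 'How do backlinks work?', 'Are backlinks still important?']
```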

## Script → Output → Consumer

```mermaid
flowchart TB
    subgraph Pipeline [Pipeline Steps]
        K[collect-post-keywords-sistrix] --> KW[keywords-sistrix.json]
        PAA[collect-post-paa-questions] --> PAAJ[paa-questions.json]
        SERP[collect-post-serp-features] --> SERPJ[serp-features.json]
        FAQ[collect-faq-research-data] --> FAQJ[faq-research.json]
        COMP[collect-post-competitor-analysis] --> COMPJ[competitor-analysis.json]
        DEPTH[analyze-competitor-content-depth] --> DEPTHJ[competitive-depth-analysis.md]
    end
    subgraph Manual [Manual Steps]
        SERP_REVIEW[SERP analysis 30min]
        OUTLINE[Create CONTENT_OUTLINE.md]
    end
    subgraph Content [Content Creation]
        BRIEFS[generate-section-briefs] --> BRIEFSJ[section-briefs.md]
        WRITE[Write content]
    end
    KW --> FAQ
    PAAJ --> FAQ
    SERPJ --> FAQ
    COMPJ --> DEPTH
    DEPTHJ --> OUTLINE
    SERP_REVIEW --> OUTLINE
    OUTLINE --> BRIEFS
    BRIEFSJ --> WRITE
```

## Consumer Scripts (PAA Sources)

| Script | PAA Source Order |
|--------|------------------|
| collect-faq-research-data.php | manual > serp-features > paa-questions |
| validate-content-completeness.php | manual > faq-research > paa-questions |
| validate-content-outline-quality.php | manual > faq-research > paa-questions |
| generate-section-briefs.php | manual > faq-research > paa-questions |
| generate-faq-questions.php | faq-research (output of collect-faq-research-data) |
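
For the downstream consumers, the table can be read as a fallback chain: use the first source file that exists. The Python sketch below shows that reading under two assumptions: "manual" refers to `paa-questions-manual.json`, and `>` means "fall back to the next file if this one is missing". `resolve_paa_source` is a hypothetical helper, and `collect-faq-research-data.php` is left out because it merges its sources instead of picking one.

```python
from pathlib import Path

MANUAL = "paa-questions-manual.json"
DOWNSTREAM_ORDER = [MANUAL, "faq-research.json", "paa-questions.json"]

# Source order per consumer script, copied from the table above.
PAA_SOURCE_ORDER = {
    "validate-content-completeness.php": DOWNSTREAM_ORDER,
    "validate-content-outline-quality.php": DOWNSTREAM_ORDER,
    "generate-section-briefs.php": DOWNSTREAM_ORDER,
    "generate-faq-questions.php": ["faq-research.json"],
}


def resolve_paa_source(script, post_dir):
    """Return the first existing PAA source file for `script`, or None.

    Raises KeyError for scripts that are not listed in the table.
    """
    for filename in PAA_SOURCE_ORDER[script]:
        candidate = Path(post_dir) / filename
        if candidate.is_file():
            return candidate
    return None
```

Keeping the order in one mapping means a new consumer only needs a new entry, and a mismatch with the table is easy to spot in review.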

## Orphan Data Audit

Run periodically to catch data files with no known consumer:

```bash
php v2/scripts/blog/audit-orphan-data.php
php v2/scripts/blog/audit-orphan-data.php --post=slug --category=lexikon
```

Update the `$consumers` mapping in the script when adding new data files or consumers.
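
The idea behind the audit can be sketched in a few lines of Python. This is not the implementation of `audit-orphan-data.php`: the shape of the real `$consumers` mapping is not shown in this document, so the `CONSUMERS` dictionary below is a hypothetical stand-in with a few entries taken from the sections above, and `find_orphans` is an illustrative name.

```python
from pathlib import Path

# Hypothetical stand-in for the script's `$consumers` mapping:
# data file -> scripts that read it.
CONSUMERS = {
    "paa-questions.json": ["collect-faq-research-data.php"],
    "serp-features.json": ["collect-faq-research-data.php"],
    "faq-research.json": [
        "generate-faq-questions.php",
        "generate-section-briefs.php",
    ],
}

DATA_SUFFIXES = {".json", ".md"}


def find_orphans(post_dir, consumers=CONSUMERS):
    """Return sorted names of data files in `post_dir` with no registered consumer.

    Assumption: data files are the .json and .md files directly in `post_dir`.
    A file mapped to an empty consumer list also counts as an orphan.
    """
    orphans = []
    for path in Path(post_dir).iterdir():
        if path.is_file() and path.suffix in DATA_SUFFIXES:
            if not consumers.get(path.name):
                orphans.append(path.name)
    return sorted(orphans)
```

With this shape, forgetting to register a new data file shows up as an orphan on the next audit run, which is why the mapping needs updating whenever files or consumers are added.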

## Related Documentation

- [CONTENT_CREATION_WORKFLOW_2026.md](CONTENT_CREATION_WORKFLOW_2026.md) – Workflow and data flow
- [BLOG_POST_IMPROVEMENT_PROCESS.md](BLOG_POST_IMPROVEMENT_PROCESS.md) – Full improvement process
