# Industry / marketing page SEO data workflow (blog-parity)

**Last Updated:** 2026-04-02

Data-driven research for high-value **Branchen-** and marketing LPs uses the same APIs and patterns as blog posts: **SISTRIX**, **Serper PAA**, optional **Firecrawl**, and **GSC** (prefer **Search Console API**, CSV fallback). Registry: [`docs/content/pages/marketing-pages-registry.json`](../marketing-pages-registry.json). Master inventory: [DATA_COLLECTION_BRANCHEN.md](DATA_COLLECTION_BRANCHEN.md).

---

## Prerequisites

- **Config:** `v2/config/sistrix-config.php`, Serper and optional Firecrawl keys (same env vars as product/blog scripts where applicable).
- **API smoke test:** `php v2/scripts/blog/test-api-access.php --all` (SISTRIX, GSC client, GA4 as configured).
- **GSC (recommended):** `php v2/scripts/marketing-pages/collect-branchen-performance-gsc.php` writes `docs/content/pages/branchen-performance-gsc.json`; `php v2/scripts/marketing-pages/split-branchen-gsc-to-registry-pages.php` then fills each page's `data/performance-gsc.json`.
- **GSC (fallback):** CSV export, then `gsc-product-export.php --marketing-page=…` (see the CSV fallback under "3. GSC → JSON").

---

## Per-page layout (`docs_dir` from registry)

| Artifact | Role |
|----------|------|
| `data/target-keywords.json` | Primary + secondaries (blog-shaped); edit before first SISTRIX run. |
| `data/keywords-sistrix.json` | SISTRIX metrics + related keywords (from `collect-page-keywords-sistrix.php`). |
| `data/sistrix-keyword-serp.json` | Optional cheap SERP top 10 (`keyword.seo`; `collect-feature-page-keyword-serp.php --page=<id>` or orchestrator `--with-sistrix-serp`). |
| `data/sistrix-domain-kw-serp.json` | Optional VIP domain+keyword SERP (`collect-marketing-page-domain-kw-serp.php` or `--with-sistrix-domain-kw`). |
| `competitor-faq-analysis.json` | Optional Firecrawl scrape (`scrape-competitor-faqs.py` or `--with-competitor-faq-scrape`). |
| `data/faq-research.json` | Serper PAA merge (`serper-paa-research.py`). |
| `data/performance-gsc.json` | Per-URL slice from global API + split script, or CSV via `gsc-product-export.php`. |
| `data/competitor-notes.md` | Short manual / Firecrawl prompts; optional. |
| `KEYWORD_DECISION.md` | Human-readable primary/secondary decision + evidence (update after data lands). |

Published FAQ copy stays in `faq-answers-optimized.json` (loaded by the PHP page). **Do not** auto-overwrite legal-sensitive answers without review.
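The layout above can be checked mechanically before a research run. The following is a minimal sketch, not part of the repository: `missing_artifacts` is a hypothetical helper, and the split into required and optional files is an assumption read off the "Optional" wording in the table.

```python
from pathlib import Path

# Artifact paths relative to a page's docs_dir, copied from the table above.
# The boolean marks entries the table describes as optional (an assumption
# based on the table wording, not on any script in the repository).
ARTIFACTS = {
    "data/target-keywords.json": False,
    "data/keywords-sistrix.json": False,
    "data/sistrix-keyword-serp.json": True,
    "data/sistrix-domain-kw-serp.json": True,
    "competitor-faq-analysis.json": True,
    "data/faq-research.json": False,
    "data/performance-gsc.json": False,
    "data/competitor-notes.md": True,
    "KEYWORD_DECISION.md": False,
}


def missing_artifacts(docs_dir, include_optional=False):
    """Return the relative paths from ARTIFACTS that do not exist under docs_dir."""
    root = Path(docs_dir)
    return [
        rel
        for rel, optional in ARTIFACTS.items()
        if (include_optional or not optional) and not (root / rel).is_file()
    ]
```

Calling `missing_artifacts("<docs_dir>")` with a page's `docs_dir` from the registry lists the files that still need to be produced; pass `include_optional=True` to also report the optional ones.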

---

## Orchestrator (recommended)

```bash
bash v2/scripts/marketing-pages/run-page-research-pipeline.sh gastronomie
# VIP full stack (when API keys + credits approved):
bash v2/scripts/marketing-pages/run-page-research-pipeline.sh gastronomie \
  --with-gsc-queries --with-sistrix-serp --with-synthesis
```

Default order: **SISTRIX** → **Serper** → optional **Firecrawl** note; the script then prints the **GSC/GA API** paths. With flags, the order is: optional **GSC query export** → optional **SISTRIX SERP** → optional **domain-kw** → **Serper** → optional **competitor scrape** → optional **synthesis** (`generate-industry-data-synthesis.php`). **Product** pages use `run-feature-page-research-pipeline.sh` instead.
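The flagged order can be modelled as a small lookup, which helps when reasoning about which steps a given flag combination triggers. This is an illustrative sketch only: it does not call the shell script, `flagged_steps` is a hypothetical name, and the flag-to-step mapping is inferred from the flags mentioned in this document. Only the steps named for flagged runs are modelled.

```python
# (flag, step) pairs in the order given for flagged runs.
# A flag of None marks a step that is not described as optional.
FLAGGED_ORDER = [
    ("--with-gsc-queries", "GSC query export"),
    ("--with-sistrix-serp", "SISTRIX SERP"),
    ("--with-sistrix-domain-kw", "SISTRIX domain-kw SERP"),
    (None, "Serper PAA"),
    ("--with-competitor-faq-scrape", "competitor FAQ scrape"),
    ("--with-synthesis", "synthesis"),
]


def flagged_steps(flags):
    """Return the step labels that run for the given flags, in pipeline order."""
    enabled = set(flags)
    known = {flag for flag, _ in FLAGGED_ORDER if flag is not None}
    unknown = enabled - known
    if unknown:
        raise ValueError(f"unknown flag(s): {sorted(unknown)}")
    return [step for flag, step in FLAGGED_ORDER if flag is None or flag in enabled]
```

For the VIP example above, `flagged_steps(["--with-gsc-queries", "--with-sistrix-serp", "--with-synthesis"])` yields GSC query export, SISTRIX SERP, Serper PAA, then synthesis, regardless of the order in which the flags are passed.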

---

## Individual commands

### 1. SISTRIX keywords

```bash
php v2/scripts/marketing-pages/collect-page-keywords-sistrix.php --page=gastronomie
# or directly:
php v2/scripts/blog/collect-post-keywords-sistrix.php --marketing-page=gastronomie
```

**Portfolio (global candidates):** `php v2/scripts/marketing-pages/collect-branchen-keywords-sistrix.php` → `docs/content/pages/branchen-keyword-sistrix.json`.

Credits are logged in `v2/data/blog/sistrix-credits-log.json` (same log as blog). Per-page caps can be raised via `sistrix_limits` on the registry entry (explicit runs only).

### 2. Serper PAA

```bash
export SERPER_API_KEY=…   # if not already in env
python3 v2/scripts/marketing-pages/serper-paa-research.py --page=gastronomie
```

### 3. GSC → JSON

**API (recommended):**

```bash
php v2/scripts/marketing-pages/collect-branchen-performance-gsc.php
php v2/scripts/marketing-pages/split-branchen-gsc-to-registry-pages.php
```

**CSV fallback:** Export Performance from Search Console, save CSV, then:

```bash
php v2/scripts/product-pages/gsc-product-export.php \
  --csv=path/to/gsc-export.csv \
  --marketing-page=gastronomie
```

This writes `data/performance-gsc.json` under the page's `docs_dir`. **Canonical Gastronomie:** use `--marketing-page=gastronomie` and path **`/branchen/gastronomie`** (live: `branchen_gastronomie_neu.php`). Optional: to export traffic for the **legacy redirect URL** only (e.g. for a separate row audit), pass `--url-path=branchen/gastronomie-neu` together with a one-off `--output=` under `gastronomie-neu/data/`. Do **not** treat that export as the FAQ/SEO SSOT.
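Because the canonical path is a prefix of the legacy one, a naive prefix or substring match would mix the two URLs. The sketch below shows exact path matching on already-parsed rows; it is not the logic of `gsc-product-export.php`, and the row shape (a dict with a `page` key) and the `example.com` host are assumptions for illustration.

```python
from urllib.parse import urlsplit


def normalize_path(url_or_path):
    """Reduce a full URL or a bare path to the form '/a/b' (no trailing slash)."""
    path = urlsplit(url_or_path).path
    return "/" + path.strip("/")


def rows_for_path(rows, url_path, page_key="page"):
    """Keep only rows whose page URL matches url_path exactly, not by prefix."""
    wanted = normalize_path(url_path)
    return [row for row in rows if normalize_path(row[page_key]) == wanted]


# Hypothetical rows; real exports carry more columns.
rows = [
    {"page": "https://example.com/branchen/gastronomie", "clicks": 10},
    {"page": "https://example.com/branchen/gastronomie-neu", "clicks": 3},
    {"page": "https://example.com/branchen/gastronomie/", "clicks": 1},
]
canonical = rows_for_path(rows, "/branchen/gastronomie")
legacy = rows_for_path(rows, "branchen/gastronomie-neu")
```

Here `canonical` keeps the first and third rows (the trailing slash is normalized away) and `legacy` keeps only the second, so the legacy audit stays separate from the canonical data.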

### 3b. GA4 (optional portfolio view)

```bash
php v2/scripts/marketing-pages/collect-branchen-performance-ga4.php
```

→ `docs/content/pages/branchen-performance-ga4.json`

### 4. Firecrawl (optional)

Use `v2/scripts/product-pages/scrape-competitor-faqs.py --page=<registry_id>` (any industry/static/product id with `docs_dir` + `competitor_urls`) to produce `competitor-faq-analysis.json`, or follow the patterns in `v2/helpers/firecrawl-remediate.php`. Prefer short notes in `data/competitor-notes.md` over large HTML dumps.

---

## Cadence (guidance)

- **SISTRIX:** Monthly refresh for priority industry LPs when strategy warrants it; spend credits liberally only on intentional runs.
- **GSC/GA:** Weekly when the URL is traffic-sensitive; otherwise align with improvement cycles.
- **Serper PAA:** When refreshing FAQs or after major SERP shifts.
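
The cadence can be turned into a simple staleness check. This is a rough sketch under stated assumptions: "monthly" is taken as 30 days and "weekly" as 7, file modification time stands in for the collection date (a fresh git checkout resets it), and `stale_artifacts` is a hypothetical helper. Serper PAA is event-driven in the guidance above, so it has no interval here.

```python
import time
from pathlib import Path

# Maximum age in days per artifact, derived from the cadence guidance above.
MAX_AGE_DAYS = {
    "data/keywords-sistrix.json": 30,  # monthly (assumed to mean 30 days)
    "data/performance-gsc.json": 7,    # weekly, for traffic-sensitive URLs
}


def stale_artifacts(docs_dir, now=None):
    """Return artifacts under docs_dir that are missing or older than their limit."""
    now = time.time() if now is None else now
    stale = []
    for rel, max_days in MAX_AGE_DAYS.items():
        path = Path(docs_dir) / rel
        if not path.is_file():
            stale.append(rel)  # a missing file counts as stale
            continue
        age_days = (now - path.stat().st_mtime) / 86400
        if age_days > max_days:
            stale.append(rel)
    return stale
```

If the JSON files carry their own collection timestamp, reading that field would be more reliable than the modification time used here.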

---

## Human gate before publish

1. Refresh `KEYWORD_DECISION.md` with evidence from `keywords-sistrix.json` and `performance-gsc.json`.
2. Align FAQ questions with `faq-research.json` PAA clusters.
3. Edit `faq-answers-optimized.json` manually; validate **FAQPage** / Rich Results on the live URL.
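
Before running the Rich Results test in step 3, the page's JSON-LD can be given a quick structural pre-check. The sketch below follows the schema.org `FAQPage` shape (`mainEntity` holding `Question` items, each with a `name` and an `acceptedAnswer` of type `Answer` with `text`). It is a local sanity check, not a replacement for validating the live URL, and `faqpage_problems` is a hypothetical helper that expects the JSON-LD already extracted from the page as a string.

```python
import json


def faqpage_problems(jsonld_text):
    """Return a list of structural problems found in a FAQPage JSON-LD string."""
    try:
        data = json.loads(jsonld_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict) or data.get("@type") != "FAQPage":
        return ["top-level @type is not FAQPage"]
    entities = data.get("mainEntity")
    if isinstance(entities, dict):  # a single question may not be wrapped in a list
        entities = [entities]
    if not isinstance(entities, list) or not entities:
        return ["mainEntity is missing or empty"]
    problems = []
    for i, question in enumerate(entities):
        if not isinstance(question, dict) or question.get("@type") != "Question":
            problems.append(f"mainEntity[{i}]: @type is not Question")
            continue
        name = question.get("name")
        if not isinstance(name, str) or not name.strip():
            problems.append(f"mainEntity[{i}]: empty question name")
        answer = question.get("acceptedAnswer")
        if not isinstance(answer, dict) or answer.get("@type") != "Answer":
            problems.append(f"mainEntity[{i}]: acceptedAnswer is not an Answer")
        else:
            text = answer.get("text")
            if not isinstance(text, str) or not text.strip():
                problems.append(f"mainEntity[{i}]: empty answer text")
    return problems
```

An empty list means the structure looks plausible; any entry points at the question index to fix before re-checking the live page.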

---

## References

- [DATA_COLLECTION_BRANCHEN.md](DATA_COLLECTION_BRANCHEN.md): script inventory, merge table
- [CONTENT_CREATION_DATA_CHECKLIST.md](../../CONTENT_CREATION_DATA_CHECKLIST.md): cross-surface data map
- [GASTRO_NEU_KEYWORD_RESEARCH.md](GASTRO_NEU_KEYWORD_RESEARCH.md): Gastronomie-neu narrative research
- [blog-data-collection.mdc](../../../../.cursor/rules/blog-data-collection.mdc): blog cadence and API patterns
- [marketing-pages-seo-data.mdc](../../../../.cursor/rules/marketing-pages-seo-data.mdc): agent-oriented script map (requestable)
