# Branchen / industry pages — data collection and reports

**Last Updated:** 2026-04-10

Canonical inventory of SISTRIX, GSC, GA4, Serper PAA, and per-page research artifacts under `docs/content/pages/industry-pages/`. The inventory is registry-driven (see [`../marketing-pages-registry.json`](../marketing-pages-registry.json)); the slug matrix lives in [INDUSTRY_PAGES_INVENTORY.md](INDUSTRY_PAGES_INVENTORY.md).

**VIP Branchen LPs:** A higher SISTRIX budget is **within policy** for selective **`keyword.domain.seo` + `kw`** calls (capped at 5 head terms from `target-keywords.json`), provided each run is paired with synthesis and `KEYWORD_DECISION.md` updates. See [VIP_MARKETING_SEO_DATA_TIERS.md](../marketing-pages/VIP_MARKETING_SEO_DATA_TIERS.md).

**Branchen accounting scope:** **`branchen`** (hub) + **five** vertical `page_id`s (`gastronomie`, `einzelhandel`, `gesundheitswesen`, `handwerk-industrie`, `freizeit-kultur`). **`/branchen/gastronomie`** is the canonical Gastronomie URL (`branchen_gastronomie_neu.php`); **`/branchen/gastronomie-neu`** 301s there. Registry **`gastronomie`** `docs_dir` holds SEO data and **FAQ SSOT** (`faq-answers-optimized.json`). The folder `gastronomie-neu/` may still hold narrative drafts only — do not treat it as the FAQ source of truth.

## Security: SISTRIX API key

- Same rules as blog/tools: `SISTRIX_API_KEY` or `v2/config/sistrix-api-key.php` (gitignored). Never commit keys.
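
As an illustration of the key-handling rule, here is a minimal Python sketch that fails fast when the key is missing and masks it before anything is logged. It assumes `SISTRIX_API_KEY` is read as an environment variable; the function names are hypothetical, and the gitignored PHP config fallback is not covered because its format is not described here.

```python
import os


def resolve_sistrix_key(environ=None):
    """Return the SISTRIX key from the environment, or raise if it is missing."""
    env = os.environ if environ is None else environ
    key = env.get("SISTRIX_API_KEY", "").strip()
    if not key:
        # Assumption: callers would fall back to the gitignored config file here.
        raise RuntimeError("SISTRIX_API_KEY is not set")
    return key


def mask_key(key):
    """Keep only the last 4 characters so log output never holds the full key."""
    if len(key) <= 4:
        return "*" * len(key)
    return "*" * (len(key) - 4) + key[-4:]
```

Logging `mask_key(resolve_sistrix_key())` instead of the raw value keeps the full key out of terminals and CI output.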

## Credit accounting

- Shared log: `v2/data/blog/sistrix-credits-log.json`.
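
For planning a run before it lands in the shared log, a rough upper bound can be derived from the approximate per-keyword costs and caps quoted in the script table in the next section. This is a minimal sketch: the constant and function names are hypothetical, and it ignores the 7-day cache and the cost of the other SISTRIX collectors, which this document does not quote.

```python
# Approximate figures as quoted in the script table ("~1 cr/keyword", "~100 cr/kw").
SERP_CREDITS_PER_KEYWORD = 1      # SERP top 10 via `keyword.seo`
SERP_KEYWORD_CAP = 8              # default of `sistrix_limits.serp_keywords_limit`
DOMAIN_CREDITS_PER_KEYWORD = 100  # VIP domain SERP
DOMAIN_KEYWORD_CAP = 5            # cap on head terms


def max_credits_per_page(with_serp=True, with_domain_kw=False):
    """Upper bound on credits for one page's optional SERP steps (no cache hits)."""
    total = 0
    if with_serp:
        total += SERP_CREDITS_PER_KEYWORD * SERP_KEYWORD_CAP
    if with_domain_kw:
        total += DOMAIN_CREDITS_PER_KEYWORD * DOMAIN_KEYWORD_CAP
    return total


if __name__ == "__main__":
    print(max_credits_per_page())                     # SERP step only
    print(max_credits_per_page(with_domain_kw=True))  # SERP + VIP domain SERP
```

With both optional steps enabled, the bound is 8 + 500 = 508 credits per page, so the VIP domain step dominates the budget.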

## Script → output → when to refresh

| Step | Script | Output | Refresh cadence |
|------|--------|--------|-------------------|
| Global Branchen keywords | `v2/scripts/marketing-pages/collect-branchen-keywords-sistrix.php` | `docs/content/pages/branchen-keyword-sistrix.json` | Quarterly or when `branchen-candidate-keywords.json` changes |
| Merge portfolio table | `v2/scripts/marketing-pages/merge-branchen-opportunity-data.php` | stdout → paste [BRANCHEN_OPPORTUNITY_LIST.md](BRANCHEN_OPPORTUNITY_LIST.md) | After SISTRIX and/or GSC refresh |
| GSC API (live) | `v2/scripts/marketing-pages/collect-branchen-performance-gsc.php` | `docs/content/pages/branchen-performance-gsc.json` | Weekly/monthly; `--days=N` (default 90) |
| GA4 API | `v2/scripts/marketing-pages/collect-branchen-performance-ga4.php` (uses [`ga4-data-api.php`](../../../../v2/helpers/ga4-data-api.php)) | `docs/content/pages/branchen-performance-ga4.json` | Same cadence; property `275821028` |
| GSC JSON → per page | `v2/scripts/marketing-pages/split-branchen-gsc-to-registry-pages.php` (registry **`surface: industry` only**) | `{docs_dir}/data/performance-gsc.json` | After global GSC refresh |
| GSC queries (query dimension) | `php v2/scripts/tools/collect-tool-gsc-queries.php --path=/branchen/{slug} --output={docs_dir}/data/gsc-queries.json` | `{docs_dir}/data/gsc-queries.json` | FAQ/SEO iterations; optional `--start`/`--end` for fixed ranges |
| Period compare (queries) | `php v2/scripts/tools/compare-gsc-query-exports.php --before=... --after=... --output=...` | Markdown diff next to JSON | No API cost; uses two saved exports |
| Synthesis doc | `php v2/scripts/marketing-pages/generate-industry-data-synthesis.php --page={id}` | `{docs_dir}/DATA_DRIVEN_SYNTHESIS.generated.md` (default; hand-edit `DATA_DRIVEN_SYNTHESIS.md` as needed) | After GSC queries + SISTRIX refresh |
| GSC CSV → JSON (fallback) | `v2/scripts/product-pages/gsc-product-export.php --csv=... --marketing-page=<id>` | same per-page `performance-gsc.json` | If GSC API unavailable |
| Per-page SISTRIX | `v2/scripts/marketing-pages/collect-page-keywords-sistrix.php --page=<id>` | `{docs_dir}/data/keywords-sistrix.json` | After `data/target-keywords.json` edits |
| SISTRIX SERP top 10 (cheap `keyword.seo`) | `php v2/scripts/product-pages/collect-feature-page-keyword-serp.php --page=<id>` | `{docs_dir}/data/sistrix-keyword-serp.json` | ~1 cr/keyword, 7-day cache; cap `sistrix_limits.serp_keywords_limit` (default 8); run after `gsc-queries.json` exists for richer lists |
| SISTRIX domain SERP (VIP, optional) | `php v2/scripts/marketing-pages/collect-marketing-page-domain-kw-serp.php --page=<id>` | `{docs_dir}/data/sistrix-domain-kw-serp.json` | ~100 cr/kw; cap 5; [VIP_MARKETING_SEO_DATA_TIERS.md](../marketing-pages/VIP_MARKETING_SEO_DATA_TIERS.md); merge into synthesis |
| Serper PAA | `python3 v2/scripts/marketing-pages/serper-paa-research.py --page=<id>` | `{docs_dir}/data/faq-research.json` | Needs `SERPER_API_KEY`; FAQ refresh sprints |
| FAQ copy (flagship LP) | Gastronomie: `faq-answers-optimized.json` in `{docs_dir}`. Other verticals (`industry_*.php` + `render-faq-json.php`): `v2/data/industry-faqs/` (`retail.json`, `healthcare.json` for **Gesundheitswesen** / `gesundheitswesen`, `hospitality.json`, `crafts.json`, `leisure.json` for **Freizeit & Kultur** / `freizeit-kultur`) | Same files | Typically 12–15 Q&As when GSC + SISTRIX data justify the depth ([FAQ_WEBSITE_STANDARD.md](../../../FAQ_WEBSITE_STANDARD.md)); always refresh SISTRIX + **`collect-tool-gsc-queries.php` per vertical** before large FAQ rewrites |
| Full chain (orchestrator) | `bash v2/scripts/marketing-pages/run-page-research-pipeline.sh <id>` (optional: `--with-gsc-queries`, `--with-sistrix-serp`, `--with-sistrix-domain-kw`, `--with-synthesis`, `--with-competitor-faq-scrape`, `--dry-run`) | SISTRIX + optional GSC queries + optional SERP + PAA + portfolio GSC/GA reminder | Default sprint: metrics + Serper; add `--with-sistrix-serp` monthly/quarterly when budget allows |
| Competitor FAQ scrape (optional) | `python3 v2/scripts/product-pages/scrape-competitor-faqs.py --page=<id>` | `{docs_dir}/competitor-faq-analysis.json` | Needs `FIRECRAWL_API_KEY`; or orchestrator `--with-competitor-faq-scrape` |

**Dry-runs:** `collect-branchen-keywords-sistrix.php --dry-run`; `collect-branchen-performance-gsc.php --dry-run`; `collect-page-keywords-sistrix.php --dry-run` (passed through).
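
The per-vertical GSC query step in the table uses a `--path=/branchen/{slug}` and `{docs_dir}` template. The sketch below expands that template for the five verticals and only prints the commands, without running them. It assumes each slug equals its `page_id`; the `docs_dir` values are hypothetical placeholders, since the real ones come from the registry.

```python
import shlex

# The five vertical page_ids listed under "Branchen accounting scope".
VERTICALS = [
    "gastronomie",
    "einzelhandel",
    "gesundheitswesen",
    "handwerk-industrie",
    "freizeit-kultur",
]


def gsc_queries_command(slug, docs_dir):
    """Build the argv for one vertical from the table's command template."""
    return [
        "php",
        "v2/scripts/tools/collect-tool-gsc-queries.php",
        f"--path=/branchen/{slug}",
        f"--output={docs_dir}/data/gsc-queries.json",
    ]


if __name__ == "__main__":
    for slug in VERTICALS:
        # Hypothetical docs_dir layout; read the real value from the registry.
        docs_dir = f"docs/content/pages/industry-pages/{slug}"
        print(shlex.join(gsc_queries_command(slug, docs_dir)))
```

Building the command as a list and joining it with `shlex.join` keeps quoting correct if a path ever contains spaces.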

## GSC and GA4

- **Recommended:**  
  - `php v2/scripts/marketing-pages/collect-branchen-performance-gsc.php` — filter **contains** `/branchen` (includes hub `/branchen` and subpaths).  
  - `php v2/scripts/marketing-pages/collect-branchen-performance-ga4.php` — `pagePath` contains `/branchen`.  
  Requires `v2/config/google-api-credentials.php` (same as blog/tools).
- **Per-page slice:** After global GSC JSON exists, run `split-branchen-gsc-to-registry-pages.php` so each page has `data/performance-gsc.json` without manual CSV filtering.
- **Fallback:** Manual GSC CSV + `gsc-product-export.php --marketing-page=<id>`.
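
To show what a "contains `/branchen`" filter does and does not select, here is a small Python sketch comparing it with a stricter path-prefix check. The function names and sample paths are hypothetical, and the stricter check is an illustration, not something the collectors are stated to do.

```python
def matches_contains(page_path, needle="/branchen"):
    """Substring match, mirroring a 'contains' filter."""
    return needle in page_path


def matches_section(page_path, root="/branchen"):
    """Stricter alternative: the hub itself or any path below it."""
    return page_path == root or page_path.startswith(root + "/")


if __name__ == "__main__":
    samples = [
        "/branchen",                 # hub
        "/branchen/gastronomie",     # vertical
        "/blog/branchen-trends",     # hypothetical non-Branchen page
        "/preise",                   # hypothetical unrelated page
    ]
    for path in samples:
        print(path, matches_contains(path), matches_section(path))
```

A substring match also picks up any URL that merely contains `/branchen` elsewhere in its path, so the per-page split is a useful check that only registry pages end up in the per-page files.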

## Competitor / Firecrawl (optional)

- The registry lists `competitor_urls` per page; keep short notes in `data/competitor-notes.md` (see [INDUSTRY_PAGE_SEO_DATA_WORKFLOW.md](INDUSTRY_PAGE_SEO_DATA_WORKFLOW.md)). Use `v2/helpers/firecrawl-remediate.php` patterns only when API budget allows, and do not auto-copy results into production PHP.

## Related

- [MARKETING_RESEARCH_STACK_PARITY.md](../marketing-pages/MARKETING_RESEARCH_STACK_PARITY.md) — feature vs industry/static artifact parity + rollback  
- [INDUSTRY_PAGE_SEO_DATA_WORKFLOW.md](INDUSTRY_PAGE_SEO_DATA_WORKFLOW.md)  
- [BRANCHEN_OPPORTUNITY_LIST.md](BRANCHEN_OPPORTUNITY_LIST.md)  
- [BRANCHEN_SEO_IMPROVEMENT_BACKLOG.md](BRANCHEN_SEO_IMPROVEMENT_BACKLOG.md)  
- [.cursor/rules/marketing-pages-seo-data.mdc](../../../.cursor/rules/marketing-pages-seo-data.mdc)  
- [VIP_MARKETING_SEO_DATA_TIERS.md](../marketing-pages/VIP_MARKETING_SEO_DATA_TIERS.md)
