# Page improvement — iteration checklist (existing URLs)

**Last Updated:** 2026-04-02

**Purpose:** Repeatable steps when **optimizing a live URL** (not net-new pages): collect analytics, compare periods where useful, cross-reference SISTRIX / SERP / optional Firecrawl, then edit copy, meta, FAQs, schema. Canonical theory: [PAGE_IMPROVEMENT_DATA_PLAYBOOK.md](PAGE_IMPROVEMENT_DATA_PLAYBOOK.md).

**Ticket / PR handoff:** [_templates/PAGE_IMPROVEMENT_HANDOFF.md](_templates/PAGE_IMPROVEMENT_HANDOFF.md).

---

## 0. Prerequisites

- [ ] Google API credentials in `v2/config/google-api-credentials.php` (GSC + GA4 APIs) for the scripts that use them.
- [ ] SISTRIX API key for SISTRIX collectors (`SISTRIX_API_KEY` or gitignored key file) — see [DATA_COLLECTION_TOOLS.md](tools/DATA_COLLECTION_TOOLS.md).
- [ ] Credit awareness: [SISTRIX API documentation](https://www.sistrix.com/api/). Check the `credits` endpoint and `v2/data/blog/sistrix-credits-log.json` before running SISTRIX collectors.

---

## 1. Surface → data collection (inventory)

| Surface | Baseline GSC / GA | Query-level GSC | SISTRIX / research | Primary doc |
|--------|-------------------|-----------------|-------------------|-------------|
| **Tools / Rechner** | `collect-tools-performance-gsc.php`, `collect-tools-performance-ga4.php` (optional `--compare-28d`) | `collect-tool-gsc-queries.php` · `--start`/`--end` for fixed ranges | `collect-tool-keywords-sistrix.php`, PAA, competitor, depth | [DATA_COLLECTION_TOOLS.md](tools/DATA_COLLECTION_TOOLS.md) |
| **Blog posts** | `collect-post-performance-gsc.php`, `collect-post-performance-ga4.php` | Query dimension via post pipeline / GSC UI | Rich SISTRIX stack (see inventory) | [IMPROVEMENT_DATA_COLLECTION_GUIDE.md](blog/IMPROVEMENT_DATA_COLLECTION_GUIDE.md), [DATA_COLLECTION_SCRIPTS_INVENTORY.md](blog/DATA_COLLECTION_SCRIPTS_INVENTORY.md) |
| **Templates** | `collect-template-performance-gsc.php` (pattern) | `collect-tool-gsc-queries.php --path= --output=` or GSC UI | `collect-template-keywords-sistrix.php` etc. | [TEMPLATE_CONTENT_WORKFLOW.md](../systems/templates/TEMPLATE_CONTENT_WORKFLOW.md) |
| **Static / homepage / Tier A** | `collect-static-pages-performance-*.php`, split scripts | `collect-tool-gsc-queries.php --path= --output=` (same script as tools) | See primary doc | [DATA_COLLECTION_STATIC_SITE.md](pages/static-pages/DATA_COLLECTION_STATIC_SITE.md) |
| **Product features** | See primary doc | `--output` pattern | Portfolio + per-page SISTRIX | [DATA_COLLECTION_PRODUCT_FEATURES.md](pages/product-pages/DATA_COLLECTION_PRODUCT_FEATURES.md) |
| **Industry / Branchen** | See primary doc | `php v2/scripts/tools/collect-tool-gsc-queries.php --path=/branchen/{slug} --output=docs/content/pages/industry-pages/{slug}/data/gsc-queries.json` (optional `--start` / `--end`); compare with `compare-gsc-query-exports.php` | `collect-page-keywords-sistrix.php --page=...` | [DATA_COLLECTION_BRANCHEN.md](pages/industry-pages/DATA_COLLECTION_BRANCHEN.md). Synthesis: `php v2/scripts/marketing-pages/generate-industry-data-synthesis.php --page={id}` → `DATA_DRIVEN_SYNTHESIS.generated.md`, plus hand-maintained `DATA_DRIVEN_SYNTHESIS.md` and `KEYWORD_DECISION.md`. FAQ SSOT: **Gastronomie** → `{docs_dir}/faq-answers-optimized.json`; **other industries** (e.g. Einzelhandel, Handwerk, Hospitality, Freizeit & Kultur) → `v2/data/industry-faqs/*.json` (e.g. `leisure.json` for `/branchen/freizeit-kultur`) |
| **Downloads / gated** | Often GSC UI until automated; store notes under `docs/content/` | `collect-tool-gsc-queries.php --path=/download/... --output=...` | As needed | [download-pages.mdc](../.cursor/rules/download-pages.mdc) |
| **Comparison pages** | Registry + static performance JSON where applicable | `collect-tool-gsc-queries.php --path=/vergleich/... --output=docs/.../gsc-queries.json` | Competitor + Serper research | [COMPARISON_PAGES_GUIDE.md](../guides/comparison-pages/COMPARISON_PAGES_GUIDE.md) |

**Blog vs. tools gap:** Blog collectors store **two fixed GA/GSC ranges** per post (e.g. 90 days vs. one year). Tools can instead use **`--compare-28d`** on GA4 and **two saved GSC query exports** (`gsc-queries-prev.json` + `gsc-queries.json`) for rolling period-over-period comparisons; see §2.

---

## 2. Period comparison (declining or growing queries / traffic)

**Option A — GSC UI:** Performance → filter exact page → **Compare** to previous period → export or screenshot top query deltas.

**Option B — In-repo (recommended for tickets):**

1. Export queries for **period 1** (e.g. previous 28 days):  
   `php v2/scripts/tools/collect-tool-gsc-queries.php --tool={slug} --start=YYYY-MM-DD --end=YYYY-MM-DD --output=docs/content/tools/{slug}/data/gsc-queries-prev.json`
2. Export **period 2** (e.g. the last 28 days) with the same command and the new `--start`/`--end` dates, writing to the default `gsc-queries.json` or to a second file.
3. Diff (no API cost):  
   `php v2/scripts/tools/compare-gsc-query-exports.php --before=.../gsc-queries-prev.json --after=.../gsc-queries.json --output=docs/content/tools/{slug}/data/gsc-queries-diff.md`

**JSON shape:** [tools/GSC_QUERY_EXPORT_SCHEMA.md](tools/GSC_QUERY_EXPORT_SCHEMA.md).
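
`compare-gsc-query-exports.php` handles the diff in-repo. To make the comparison logic concrete, or to sanity-check a diff by hand, here is a minimal Python sketch. It assumes each export holds a top-level `queries` list of rows with `query`, `clicks`, `impressions` and `position`. That shape is an assumption: check it against GSC_QUERY_EXPORT_SCHEMA.md before reusing anything here. The function names are hypothetical.

```python
import json
from typing import Any, Optional


def index_rows(export: dict[str, Any]) -> dict[str, dict[str, Any]]:
    # ASSUMPTION: rows live under a top-level "queries" key.
    return {row["query"]: row for row in export.get("queries", [])}


def diff_exports(before: dict[str, Any], after: dict[str, Any]) -> list[dict[str, Any]]:
    """Per-query deltas between two exports, biggest click losses first."""
    prev, curr = index_rows(before), index_rows(after)
    rows = []
    for query in sorted(prev.keys() | curr.keys()):
        p, c = prev.get(query), curr.get(query)
        status = "new" if p is None else "lost" if c is None else "both"
        position_delta: Optional[float] = None
        if p is not None and c is not None:
            # Positive = ranking got worse (higher position number).
            position_delta = c["position"] - p["position"]
        rows.append({
            "query": query,
            "status": status,
            "clicks_delta": (c or {}).get("clicks", 0) - (p or {}).get("clicks", 0),
            "impressions_delta": (c or {}).get("impressions", 0) - (p or {}).get("impressions", 0),
            "position_delta": position_delta,
        })
    rows.sort(key=lambda r: r["clicks_delta"])  # stable sort: ties stay alphabetical
    return rows


def diff_files(before_path: str, after_path: str) -> list[dict[str, Any]]:
    with open(before_path, encoding="utf-8") as fb, open(after_path, encoding="utf-8") as fa:
        return diff_exports(json.load(fb), json.load(fa))


# Toy data in the assumed shape.
before = {"queries": [
    {"query": "a", "clicks": 10, "impressions": 100, "position": 4.0},
    {"query": "b", "clicks": 5, "impressions": 50, "position": 8.0},
]}
after = {"queries": [
    {"query": "a", "clicks": 4, "impressions": 90, "position": 6.5},
    {"query": "c", "clicks": 3, "impressions": 30, "position": 9.0},
]}
for row in diff_exports(before, after):
    print(row)
```

Sorting by `clicks_delta` ascending puts declining queries first. Reverse the sort to review growth instead.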

**GA4 (tools only):**  
`php v2/scripts/tools/collect-tools-performance-ga4.php --compare-28d`  
adds `previous_period` and `delta` for each `/tools/*` path. Omit the flag to write the standard single-window file.
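
The script defines the exact layout of `previous_period` and `delta`. As a sketch of the arithmetic only, assuming flat per-path metric dicts (the metric names below are hypothetical), the following computes an absolute delta and a percentage. The percentage is left undefined when the previous value is zero:

```python
from typing import Optional


def period_delta(
    current: dict[str, float], previous: dict[str, float]
) -> dict[str, dict[str, Optional[float]]]:
    """Absolute and percentage change per metric between two windows."""
    result: dict[str, dict[str, Optional[float]]] = {}
    for metric in sorted(current.keys() | previous.keys()):
        cur = current.get(metric, 0)
        prev = previous.get(metric, 0)
        # A percentage against zero is meaningless, so report None instead.
        pct = None if prev == 0 else round((cur - prev) / prev * 100, 1)
        result[metric] = {"abs": cur - prev, "pct": pct}
    return result


# Hypothetical numbers for one /tools/* path.
print(period_delta({"sessions": 120, "conversions": 6}, {"sessions": 100, "conversions": 0}))
```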

---

## 3. Synthesis and editorial anchors

- **Tools:** Regenerate `DATA_DRIVEN_SYNTHESIS.md`:  
  `php v2/scripts/tools/generate-tool-data-synthesis.php --tool={slug} --output=docs/content/tools/{slug}/DATA_DRIVEN_SYNTHESIS.md`  
  Optional: `--gsc-queries-compare=gsc-queries-prev.json` (either a path relative to the project root or a bare filename in the tool's `data/` folder).
- **SISTRIX URL / domain for tools:** When you need “where does Ordio rank vs SERP for this keyword set?”, see [SISTRIX_URL_AND_DOMAIN_FOR_TOOLS.md](tools/SISTRIX_URL_AND_DOMAIN_FOR_TOOLS.md) (avoid blind use of expensive endpoints).
- **Data utilization gate:** Every new or refreshed `data/*.json` from a collector must be **referenced** in the surface synthesis doc (`DATA_DRIVEN_SYNTHESIS*.md`) and/or `KEYWORD_DECISION.md`, or explicitly **deprecated** in writing in the same PR. Do not accumulate orphan exports. VIP marketing tier (features, Branchen, static Tier A): [VIP_MARKETING_SEO_DATA_TIERS.md](pages/marketing-pages/VIP_MARKETING_SEO_DATA_TIERS.md).
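
A minimal Python sketch of a pre-PR check for the "referenced" half of this gate. It assumes each surface folder keeps the synthesis docs at its root and collector exports under `data/`, as in the tools paths above. It uses a plain filename substring match and cannot see deprecation notes in the PR, so treat each hit as a prompt to reference or deprecate the file, not as a hard failure. `find_orphan_exports` is hypothetical.

```python
from pathlib import Path


def find_orphan_exports(surface_dir: Path) -> list[str]:
    """Return data/*.json filenames not mentioned in any synthesis or keyword-decision doc."""
    docs = list(surface_dir.glob("DATA_DRIVEN_SYNTHESIS*.md"))
    keyword_doc = surface_dir / "KEYWORD_DECISION.md"
    if keyword_doc.exists():
        docs.append(keyword_doc)
    # Concatenate every reference doc, then look for each export's filename in it.
    text = "\n".join(doc.read_text(encoding="utf-8") for doc in docs)
    return sorted(p.name for p in (surface_dir / "data").glob("*.json") if p.name not in text)


# Usage (hypothetical slug):
# print(find_orphan_exports(Path("docs/content/tools/example-rechner")))
```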

**Industry / static VIP sprint — paste into PR description (budget + commands):**

- Credit tiers + utilization rules: [VIP_MARKETING_SEO_DATA_TIERS.md](pages/marketing-pages/VIP_MARKETING_SEO_DATA_TIERS.md) (§4 typical sprint table, §5 gate).
- Orchestrator: `bash v2/scripts/marketing-pages/run-page-research-pipeline.sh <registry_id> --with-gsc-queries --with-sistrix-serp` (+ `--with-synthesis` when refreshing `DATA_DRIVEN_SYNTHESIS.generated.md`).
- Parity / rollback note: [MARKETING_RESEARCH_STACK_PARITY.md](pages/marketing-pages/MARKETING_RESEARCH_STACK_PARITY.md).

---

## 4. Firecrawl and live page analysis (competitors + own URL)

Use for **markdown capture** of rendered pages (headings, FAQ blocks, length). Prefer **`firecrawl_scrape`** with `formats: ['markdown']` per [.cursor/rules/mcp-usage.mdc](../.cursor/rules/mcp-usage.mdc). **Avoid** expensive extract endpoints unless justified.

- [ ] Scrape **your** live canonical URL once (post-deploy check).
- [ ] Scrape **1–3 top organic competitors** for the same intent (from SISTRIX `keyword.seo` or manual SERP).
- [ ] Map H2/H3 and FAQ gaps into `CONTENT_OUTLINE` / backlog — do not paste competitor copy.

Cross-reference: [FAQ_WEBSITE_STANDARD.md](FAQ_WEBSITE_STANDARD.md) (Firecrawl + PAA for FAQs).
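
For the H2/H3 gap step, a minimal Python sketch. It reads Firecrawl markdown output, assumed to use ATX `##`/`###` headings, and lists competitor headings that have no exact match on your page after case and whitespace normalization. It only flags topics for the outline: headings that cover the same topic in different words still need a human pass, and nothing it returns should be pasted into page copy. The function names are hypothetical.

```python
import re

# ATX H2/H3 only; an optional closing "##" sequence must follow a space or tab.
HEADING = re.compile(r"^(#{2,3})[ \t]+(.+?)(?:[ \t]+#+)?[ \t]*$", re.MULTILINE)


def headings(markdown: str) -> list[tuple[int, str]]:
    """(level, text) for every H2/H3 in a markdown string."""
    return [(len(m.group(1)), m.group(2).strip()) for m in HEADING.finditer(markdown)]


def normalize(text: str) -> str:
    return " ".join(text.casefold().split())


def heading_gaps(own_md: str, competitor_mds: dict[str, str]) -> dict[str, list[str]]:
    """Competitor headings with no normalized exact match on the own page."""
    own = {normalize(text) for _, text in headings(own_md)}
    gaps: dict[str, list[str]] = {}
    for name, md in competitor_mds.items():
        missing = [text for _, text in headings(md) if normalize(text) not in own]
        if missing:
            gaps[name] = missing
    return gaps


own_page = "# Title\n## Pricing\n### Example\n"
competitors = {"competitor-a": "## pricing\n## FAQ ##\n#### Detail\n"}
print(heading_gaps(own_page, competitors))
```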

---

## 5. Implementation and validation

- [ ] Edit PHP / JSON / content per page-type rules (tools: `validate-tool-content-completeness.php`, FAQ parity, internal links audit).
- [ ] Run the Rich Results Test after deploy whenever schema changed.
- [ ] Calendar: after **4–8 weeks**, re-run the Phase 0 metrics for the same URL and attach the diff to the ticket (see handoff template).
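
When the re-run is due, the comparison window should contain only post-deploy days. The following is a minimal Python sketch that assumes a 28-day window and treats the deploy day itself as mixed, so it is excluded. Both helpers are hypothetical:

```python
from datetime import date, timedelta


def rerun_dates(deploy: date, min_weeks: int = 4, max_weeks: int = 8) -> tuple[date, date]:
    """Earliest and latest dates for the follow-up metrics run."""
    return deploy + timedelta(weeks=min_weeks), deploy + timedelta(weeks=max_weeks)


def post_deploy_window(deploy: date, rerun: date, days: int = 28) -> tuple[str, str]:
    """Up to `days` days ending on `rerun`, never starting before the day after deploy."""
    start = max(deploy + timedelta(days=1), rerun - timedelta(days=days - 1))
    return start.isoformat(), rerun.isoformat()


deploy_day = date(2026, 4, 2)  # hypothetical deploy date
earliest, latest = rerun_dates(deploy_day)
print(earliest, latest)
print(post_deploy_window(deploy_day, earliest))
```

Clamping the start keeps pre-deploy traffic out of the "after" export, so the diff against the baseline reflects the change rather than a blend of old and new page versions.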

---

## 6. Related indexes

- [CONTENT_CREATION_DATA_CHECKLIST.md](CONTENT_CREATION_DATA_CHECKLIST.md) — orchestrators and “improvement iteration” row.
- [WEBSITE_PAGE_PUBLICATION_INDEX.md](WEBSITE_PAGE_PUBLICATION_INDEX.md) — new vs improve.
- [docs/ai/cursor-playbook.md](../ai/cursor-playbook.md) — agent routing.
