# Page improvement: data-first playbook (GSC, GA4, SISTRIX, research)

**Last Updated:** 2026-04-04

**Purpose:** Standard workflow for **improving an existing public URL** (tools, blog, templates, downloads, comparison, product, industry, static Tier A, etc.): always ground outlines, copy, meta, and FAQs in **performance data** plus **competitive/SERP research**. This is distinct from shipping a **new** URL, which has no history in GSC or GA4 yet.

**Related:** [PAGE_IMPROVEMENT_ITERATION_CHECKLIST.md](PAGE_IMPROVEMENT_ITERATION_CHECKLIST.md) (step-by-step iteration + period compare + Firecrawl), [FAQ_WEBSITE_STANDARD.md](FAQ_WEBSITE_STANDARD.md) (GSC + PAA to FAQs), [KEYWORD_RESEARCH_GUIDE.md](../seo-strategy-2026/guides/KEYWORD_RESEARCH_GUIDE.md), [cursor-playbook.md](../ai/cursor-playbook.md). **Which script for which surface:** [SEO_DATA_COLLECTION_MATRIX.md](SEO_DATA_COLLECTION_MATRIX.md). **Blog body styling (formula-block, tables, notes):** [CONTENT_FORMAT_PATTERNS.md](blog/CONTENT_FORMAT_PATTERNS.md) § *Visual format decision guide*. **SISTRIX API:** [sistrix.com/api](https://www.sistrix.com/api/).

---

## 1. When this applies

Use this playbook when the task is **optimize**, **refresh**, **improve**, **skyscraper**, **SEO sprint**, or **traffic recovery** on a URL that already exists on `www.ordio.com`.

**Exceptions (still document in ticket/PR notes):**

- `v2/config/google-api-credentials.php` missing or GSC API failing: use **GSC UI** exports (see per-surface docs) and paste key numbers into the page’s research folder or `SERP_ANALYSIS.md`.
- Brand-new URL with no data: follow **new page** workflows ([WEBSITE_PAGE_PUBLICATION_INDEX.md](WEBSITE_PAGE_PUBLICATION_INDEX.md), blog new-post pipeline) until the URL has been live long enough to collect baseline metrics.

---

## 2. Phase overview

| Phase | Goal | Typical inputs |
|-------|------|----------------|
| **0 – Baseline** | Know what the URL actually earns (clicks, impressions, CTR, position) and **which queries** drive it; align GA4 on-page behavior where available. | GSC (page + query), GA4 per path |
| **1 – Research stack** | Keywords, PAA, competitors, depth, intent (surface-specific). | SISTRIX, Serper, Firecrawl, manual SERP |
| **2 – Synthesis** | Outline, `CONTENT_OUTLINE` / H2 plan, meta title/description hypotheses, FAQ list, internal links; **planned visual formats** (table vs. formula-block vs. note) per section where gaps exist. | Baseline + research merged |
| **3 – Implement** | PHP/JSON/content changes per page-type rules; schema; validators; **styling pass** for blog HTML: apply `formula-block`, breakout tables, and `blog-note` per [CONTENT_FORMAT_PATTERNS.md](blog/CONTENT_FORMAT_PATTERNS.md) § *Visual format decision guide* (add missing scan value, avoid stacking every pattern). | — |
| **4 – Measure** | After deploy: Rich Results Test where applicable; **4–8 weeks** later re-check GSC/GA for the same URL. | Same metrics as Phase 0 |

```mermaid
flowchart LR
  subgraph p0 [Phase0_Baseline]
    GSC[GSC_page_query]
    GA4[GA4_engagement]
  end
  subgraph p1 [Phase1_Research]
    SX[SISTRIX_PAA]
    SERP[SERP_competitors]
  end
  p0 --> syn[Synthesis_outline_meta_FAQ]
  p1 --> syn
  syn --> impl[Implement_validate]
  impl --> meas[Post_deploy_review]
```

---

## 3. Signals to use (and how)

### 3.1 Google Search Console (page level)

- **Clicks / impressions:** Overall demand and whether the page is visible for its topic cluster.
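- **Average position:** Directional only; it is averaged across all queries and impressions for the page, so treat it as **segment-level**, not as the rank for any single query.
- **CTR:** A CTR that is low relative to other queries or pages at a similar position may indicate a weak title/description or competing SERP features; check it against the query report (§3.2).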

**Official reference:** Use [Google Search Console Performance report](https://support.google.com/webmasters/answer/7576553) documentation for dimensions (e.g. queries, pages) and date ranges.

### 3.2 Google Search Console (query level)

- Identify **top click drivers** (protect and reinforce in H1, intro, FAQs).
- Identify **high impressions, low clicks** (title/meta, snippet alignment, or content gap vs intent).
- Identify **queries losing traction** using **two date ranges** (see §5 Period comparison).
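
The first two checks above can be sketched as a small triage pass. This is only an illustration: the row fields (`query`, `clicks`, `impressions`) and the thresholds (`min_impressions`, `max_ctr`) are assumptions made for the example, not the schema or rules of any in-repo script, and the sample rows are invented.

```python
def triage_queries(rows, top_n=3, min_impressions=500, max_ctr=0.01):
    """Split query rows into top click drivers and high-impression/low-CTR candidates."""
    by_clicks = sorted(rows, key=lambda r: r["clicks"], reverse=True)
    top_drivers = [r["query"] for r in by_clicks[:top_n] if r["clicks"] > 0]
    low_ctr = [
        r["query"]
        for r in rows
        if r["impressions"] > 0
        and r["impressions"] >= min_impressions
        and r["clicks"] / r["impressions"] <= max_ctr
    ]
    return {"top_drivers": top_drivers, "low_ctr": low_ctr}


# Invented sample rows, for illustration only.
sample_rows = [
    {"query": "schichtplan vorlage", "clicks": 120, "impressions": 2400},
    {"query": "dienstplan excel", "clicks": 4, "impressions": 1800},
    {"query": "schichtplan app", "clicks": 35, "impressions": 300},
]
result = triage_queries(sample_rows, top_n=2)
```

Top drivers are candidates to protect in H1, intro, and FAQs; the low-CTR list is a starting point for title/meta review, not a verdict.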

### 3.3 GA4

- **Sessions, views, engagement** on the same path validate whether organic clicks translate to on-site usage (tools: calculator usage events if tracked; blog: read depth where available).
- Use the same **path** as in GSC (mind trailing slashes and `/insights/` blog prefixes).
- **Tools (`/tools/*`):** `collect-tools-performance-ga4.php --compare-28d` adds **previous 28 days** + **delta** per path in `tools-performance-ga4.json` (optional; default export remains a single rolling window).
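
When joining GSC and GA4 rows by path, trailing slashes and full URLs are a common source of silent mismatches. A minimal normalization sketch follows; it assumes that stripping the trailing slash is acceptable as a join key only (it says nothing about which form is canonical on the site), and `normalize_path` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def normalize_path(url_or_path: str) -> str:
    """Return a comparable path: no scheme, host, or query string, and no trailing slash (root stays "/")."""
    return urlsplit(url_or_path).path.rstrip("/") or "/"
```

For example, `https://www.ordio.com/tools/elterngeldrechner/` and `/tools/elterngeldrechner` both normalize to the same key, so their GSC and GA4 metrics can be placed side by side.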

### 3.4 SISTRIX and SERP tools

- **Rankings / keyword metrics / PAA:** Complement GSC (GSC is **your** site; SISTRIX helps with market keywords and gaps).
- **Competitor pages:** Headings, depth, FAQ coverage—feed `competitive-depth` / `SERP_ANALYSIS.md` workflows.
- **Cheap SERP:** Prefer **`keyword.seo`** (top URLs per keyword, low credits) for gap lists and competitor domains.
- **VIP marketing pages** (registry `product`, `industry`, static Tier A): **Selective** **`keyword.domain.seo` + `kw`** (~100 credits/keyword) for **1–5 head terms** per sprint is **in policy** when cheap SERP + GSC are not enough—never blind batch across every keyword. Budget, commands, and **data utilization** (synthesis + `KEYWORD_DECISION.md`): [VIP_MARKETING_SEO_DATA_TIERS.md](pages/marketing-pages/VIP_MARKETING_SEO_DATA_TIERS.md).
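
The credit arithmetic for selective domain-keyword lookups can be sketched as a simple guard. The figures come from the bullet above (about 100 credits per keyword, 1 to 5 head terms per sprint); the function name and the hard failure on larger batches are assumptions for illustration, and the binding budget rules live in the VIP tiers doc.

```python
CREDITS_PER_KEYWORD = 100  # approximate figure quoted above
MAX_HEAD_TERMS = 5         # upper bound per sprint quoted above


def estimate_sprint_credits(head_terms):
    """Return the approximate credit cost, refusing batches outside the 1-5 range."""
    if not 1 <= len(head_terms) <= MAX_HEAD_TERMS:
        raise ValueError(
            f"expected 1-{MAX_HEAD_TERMS} head terms, got {len(head_terms)}"
        )
    return len(head_terms) * CREDITS_PER_KEYWORD
```

Three head terms therefore cost roughly 300 credits, and a full sprint tops out at roughly 500.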

---

## 4. Page family → data collection (canonical docs)

Do **not** duplicate command tables here. Open the **DATA_COLLECTION** doc for the surface you are editing:

| Surface | Baseline & scripts | Notes |
|---------|-------------------|--------|
| **Tools / Rechner** | [DATA_COLLECTION_TOOLS.md](tools/DATA_COLLECTION_TOOLS.md), [TOOLS_PERFORMANCE_DATA.md](tools/TOOLS_PERFORMANCE_DATA.md) | Global: `collect-tools-performance-gsc.php` / `collect-tools-performance-ga4.php`. Per-URL queries: `collect-tool-gsc-queries.php` → `docs/content/tools/{slug}/data/gsc-queries.json`. Arbitrary output path: `--output=` (same script). **After exports:** `generate-tool-data-synthesis.php` → `docs/content/tools/{slug}/DATA_DRIVEN_SYNTHESIS.md` (single anchor for Phase 2 outline + Phase 4 compare). |
| **Blog posts** | [DATA_COLLECTION_SCRIPTS_INVENTORY.md](blog/DATA_COLLECTION_SCRIPTS_INVENTORY.md), [IMPROVEMENT_DATA_COLLECTION_GUIDE.md](blog/IMPROVEMENT_DATA_COLLECTION_GUIDE.md) | `collect-post-performance-gsc.php`, `collect-post-performance-ga4.php`; improvement pipeline bundles Phase 1. |
| **Templates** | [TEMPLATE_CONTENT_WORKFLOW.md](../systems/templates/TEMPLATE_CONTENT_WORKFLOW.md) | `collect-template-performance-gsc.php`, per-template `data/performance-gsc.json` where applicable. |
| **Static / Tier A** | [DATA_COLLECTION_STATIC_SITE.md](pages/static-pages/DATA_COLLECTION_STATIC_SITE.md) | Global + `split-static-gsc-to-registry-pages.php` → per-page `performance-gsc.json`. |
| **Product features** | [DATA_COLLECTION_PRODUCT_FEATURES.md](pages/product-pages/DATA_COLLECTION_PRODUCT_FEATURES.md) | Split + merge patterns like static. **Query-level GSC (FAQ mining):** `php v2/scripts/tools/collect-tool-gsc-queries.php --path=/schichtplan --output=docs/content/pages/product-pages/schichtplan/data/gsc-queries.json` (swap path + output per feature). Optional VIP domain-kw SERP: [VIP_MARKETING_SEO_DATA_TIERS.md](pages/marketing-pages/VIP_MARKETING_SEO_DATA_TIERS.md). |
| **Branchen / industry** | [DATA_COLLECTION_BRANCHEN.md](pages/industry-pages/DATA_COLLECTION_BRANCHEN.md) | Registry + split/merge; **query-level GSC:** `collect-tool-gsc-queries.php --path=/branchen/{slug} --output={docs_dir}/data/gsc-queries.json`; optional period compare + `generate-industry-data-synthesis.php --page={id}` → metrics anchor (see §3.2). FAQ JSON lives under **`{docs_dir}`** (e.g. `gastronomie/faq-answers-optimized.json`), not draft `*-neu/` folders. VIP SISTRIX: [VIP_MARKETING_SEO_DATA_TIERS.md](pages/marketing-pages/VIP_MARKETING_SEO_DATA_TIERS.md). |
| **Downloads** | [download-pages.mdc](../../.cursor/rules/download-pages.mdc), gated flows | GSC Performance → page equals; store `gsc-queries.json` via `collect-tool-gsc-queries.php --path=/… --output=docs/.../gsc-queries.json` until a dedicated split exists. |

**Slug / path traps:** Tools research folders are often **hyphenated** while public URLs may differ (e.g. `elterngeld-rechner` vs `/tools/elterngeldrechner`). Always use [TOOLS_SLUG_MATRIX.md](tools/TOOLS_SLUG_MATRIX.md) for tools.

---

## 5. Period comparison (trends and “losing” queries)

**Operational checklist:** [PAGE_IMPROVEMENT_ITERATION_CHECKLIST.md](PAGE_IMPROVEMENT_ITERATION_CHECKLIST.md) §2.

**Option A — GSC UI (no code):**

1. In GSC → **Performance**, set **Date range** to period A (e.g. last 3 months) and note top queries + totals for the filtered page.
2. Switch to the **Compare** tab and choose a **previous period** of equal length (e.g. preceding 3 months), or year-over-year if seasonality matters.
3. Export or screenshot **Queries** with largest **negative** click or impression deltas for the same page filter.

**Option B — In-repo (tools and any path supported by the script):**

1. Run `collect-tool-gsc-queries.php` twice with **`--start` / `--end`** (fixed ranges), saving two JSON files — e.g. `gsc-queries-prev.json` and `gsc-queries.json` (see [GSC_QUERY_EXPORT_SCHEMA.md](tools/GSC_QUERY_EXPORT_SCHEMA.md)).
2. Diff without extra API quota:  
   `php v2/scripts/tools/compare-gsc-query-exports.php --before=… --after=… [--output=docs/.../gsc-queries-diff.md]`
3. Optionally fold into `DATA_DRIVEN_SYNTHESIS.md`:  
   `generate-tool-data-synthesis.php --tool={slug} --gsc-queries-compare=gsc-queries-prev.json`
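
Conceptually, finding "losing" queries is a per-query delta between the two exports. The sketch below shows that idea only; it is not the implementation of `compare-gsc-query-exports.php`, and it assumes each export has already been reduced to a plain `query -> clicks` mapping (a hypothetical shape, see the schema doc for the real one).

```python
def losing_queries(before, after):
    """Return (query, click_delta) pairs with negative deltas, largest loss first.

    Queries missing from `after` count as zero clicks in the later period.
    """
    deltas = []
    for query, prev_clicks in before.items():
        delta = after.get(query, 0) - prev_clicks
        if delta < 0:
            deltas.append((query, delta))
    return sorted(deltas, key=lambda item: item[1])
```

Queries that disappear entirely from the later export surface at the top of the list, which is usually where intent or SERP layout has shifted.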

**Rolling 28 vs previous 28 (example):** Align `--start`/`--end` to two consecutive 28-day windows; pair with `collect-tools-performance-ga4.php --compare-28d` for on-site engagement trends on `/tools/*`.
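
Two consecutive 28-day windows are easy to get off by one. A small date helper can produce the four values for `--start`/`--end`; it assumes both dates are inclusive, which matches the 2026-01-01 to 2026-01-28 example in §9 but is still an assumption about the script. `consecutive_windows` is a hypothetical name.

```python
from datetime import date, timedelta


def consecutive_windows(end: date, days: int = 28):
    """Return ((prev_start, prev_end), (cur_start, cur_end)) with inclusive bounds."""
    current_start = end - timedelta(days=days - 1)
    previous_end = current_start - timedelta(days=1)
    previous_start = previous_end - timedelta(days=days - 1)
    return (previous_start, previous_end), (current_start, end)


previous, current = consecutive_windows(date(2026, 2, 25))
# previous -> 2026-01-01 .. 2026-01-28, current -> 2026-01-29 .. 2026-02-25
```

Format each date with `.isoformat()` to get the `YYYY-MM-DD` strings used in the command examples.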

---

## 6. Mapping data to SEO / AEO / GEO

| Finding | Typical actions |
|---------|-----------------|
| Top queries by clicks | Ensure visible answer in intro or dedicated H2/H3; match wording naturally; strengthen internal links from cluster pages. |
| High impressions, low CTR | Test title/meta variants; check whether SERP shows FAQs, sitelinks, or competitors’ richer snippets. |
| Position drift down on money queries | Re-run competitor depth; add blocks/FAQ/schema where appropriate; verify technical/schema. |
| GA4: high bounce / low engagement | UX clarity, above-the-fold value, speed (LCP); align with [FAQ_WEBSITE_STANDARD.md](FAQ_WEBSITE_STANDARD.md). |
| PAA / SISTRIX questions not on page | Add FAQ entries or H2s; keep the informal German *du* tone and one Ordio mention per major section ([shared-patterns.mdc](../../.cursor/rules/shared-patterns.mdc)). |

---

## 7. Cannibalization

Before targeting a new primary keyword on an **existing** page, check whether another Ordio URL already ranks for it (GSC + site search + [CANONICAL_URLS_AND_LINKING.md](../development/CANONICAL_URLS_AND_LINKING.md)). Document primary ownership in the relevant `KEYWORD_DECISION.md` or outline notes.
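
The GSC part of this check can be sketched as grouping query/page rows and flagging queries that more than one URL receives impressions for. The row shape (`query`, `page`, `impressions`) and the `min_impressions` noise threshold are assumptions for illustration; a flagged query is a prompt to review ownership, not proof of cannibalization.

```python
from collections import defaultdict


def find_shared_queries(rows, min_impressions=50):
    """Map each query to the sorted list of pages it appears for, keeping only queries with 2+ pages."""
    pages_by_query = defaultdict(set)
    for row in rows:
        if row["impressions"] >= min_impressions:
            pages_by_query[row["query"]].add(row["page"])
    return {
        query: sorted(pages)
        for query, pages in pages_by_query.items()
        if len(pages) > 1
    }
```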

---

## 8. Post-deploy checklist

- [ ] **Rich Results Test** on the live URL when FAQ or main entity schema changed.
- [ ] Note **deploy date** in changelog or `SERP_ANALYSIS.md` if you track recovery.
- [ ] **4–8 weeks:** Re-run baseline collectors or GSC compare for the same URL; adjust copy/meta only when data supports it.
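
If the deploy date is recorded, the 4 to 8 week re-check window is simple date arithmetic. The helper name below is hypothetical and the date is only an example.

```python
from datetime import date, timedelta


def recheck_window(deploy_date: date):
    """Return the earliest (4 weeks) and latest (8 weeks) re-check dates after a deploy."""
    return deploy_date + timedelta(weeks=4), deploy_date + timedelta(weeks=8)


earliest, latest = recheck_window(date(2026, 4, 4))
# earliest -> 2026-05-02, latest -> 2026-05-30
```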

**Blog: `validate-content-flow.php` false positive (FAQ-in-body):** The FAQ check uses a regex that can match `\bHäufig\b … Fragen` case-insensitively across a long stretch before a closing `</h2>`/`</h3>`. Wording such as „häufig …“ in the intro and „… Fragen“ kilobytes later can trigger it. **Mitigation:** prefer „regelmäßig“, „oft“, or keep „Fragen“ in phrasing that does not pair with „häufig“ in the same matched window (e.g. „Fragen zu Zuschlägen“ without „häufig“ immediately before a substring ending in „fragen“).
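
The effect can be reproduced with a stand-in pattern. The regex below is a hypothetical approximation written for this illustration, not the validator's actual expression; it only shows how a lazy, dot-matches-newline pattern can bridge a long body between „häufig“ and a later heading containing „Fragen“.

```python
import re

# Hypothetical approximation of the FAQ-in-body check; the real regex may differ.
FAQ_IN_BODY = re.compile(
    r"\bhäufig\b.*?fragen.*?</h[23]>",
    re.IGNORECASE | re.DOTALL,
)

filler = "<p>Weiterer Absatz.</p>\n" * 200  # several kilobytes of unrelated body copy
risky = "<p>Zuschläge fallen häufig an.</p>\n" + filler + "<h2>Fragen zu Zuschlägen</h2>"
safe = risky.replace("häufig", "regelmäßig")  # mitigation from the paragraph above

risky_hit = FAQ_IN_BODY.search(risky) is not None
safe_hit = FAQ_IN_BODY.search(safe) is not None
```

With this stand-in, `risky_hit` is true although the two words are thousands of characters apart, and `safe_hit` is false after the wording change.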

---

## 9. Query-level GSC without a tools research folder

For **comparison pages**, one-off marketing URLs, or any path where you want `gsc-queries.json` under a custom docs path:

```bash
# Rolling window; --days is optional (omit it to use the script's default).
php v2/scripts/tools/collect-tool-gsc-queries.php \
  --path=/your/path/without-domain \
  --output=docs/content/your-folder/data/gsc-queries.json \
  --days=90

# Fixed calendar range (period A vs B exports):
php v2/scripts/tools/collect-tool-gsc-queries.php \
  --path=/your/path/without-domain \
  --output=docs/content/your-folder/data/gsc-queries-period.json \
  --start=2026-01-01 --end=2026-01-28
```

Requires `v2/config/google-api-credentials.php` and installed Composer dependencies. Parent directories for `--output` are created if missing. See [DATA_COLLECTION_TOOLS.md](tools/DATA_COLLECTION_TOOLS.md).
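
A pre-flight check can decide early whether to use the script or fall back to GSC UI exports (the exception described in §1). This is a sketch under assumptions: `choose_gsc_source` is a hypothetical helper, and it only tests for the credentials file, not whether the API call itself would succeed.

```python
from pathlib import Path

CREDENTIALS = Path("v2/config/google-api-credentials.php")


def choose_gsc_source(repo_root: str) -> str:
    """Return "api" when the credentials file exists under repo_root, else "ui-export"."""
    if (Path(repo_root) / CREDENTIALS).is_file():
        return "api"
    return "ui-export"
```

When the result is `"ui-export"`, paste the key numbers into the page's research folder or `SERP_ANALYSIS.md` as described in §1.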

---

## 10. Related indexes

- [CONTENT_SYSTEM_INDEX.md](blog/CONTENT_SYSTEM_INDEX.md) — blog workflows and data collection map.
- [WEBSITE_PAGE_PUBLICATION_INDEX.md](WEBSITE_PAGE_PUBLICATION_INDEX.md) — non-blog page families and checklists.
