# Tools / Rechner — data collection and reports

**Last Updated:** 2026-04-08

Canonical inventory for SISTRIX, GSC, and per-tool research artifacts. Mirrors the blog workflow in [DATA_COLLECTION_SCRIPTS_INVENTORY.md](../blog/DATA_COLLECTION_SCRIPTS_INVENTORY.md) but scoped to `docs/content/tools/` and `v2/scripts/tools/`.

## Security: SISTRIX API key

- Store the key only in `SISTRIX_API_KEY` (environment) or `v2/config/sistrix-api-key.php` (gitignored; see [.gitignore](../../.gitignore)).
- **Never** commit API keys or paste them into docs, screenshots, or chat. If a key is exposed, rotate it in the SISTRIX dashboard immediately.
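
Before running any collector, it can help to confirm that the key is available and that the local key file really is ignored by git. The sketch below is illustrative only: both function names are hypothetical helpers, not part of the repo, and the second one assumes it is run from the repository root.

```bash
# Sketch only: these helper names are hypothetical, not existing repo scripts.

# Succeeds when SISTRIX_API_KEY is set and non-empty; never prints the key itself.
sistrix_key_in_env() {
  [ -n "${SISTRIX_API_KEY:-}" ]
}

# Succeeds when git ignores the local key file (run from the repository root).
sistrix_key_file_ignored() {
  git check-ignore -q v2/config/sistrix-api-key.php
}
```

For example, `sistrix_key_in_env || echo "SISTRIX_API_KEY is not set" >&2` reports a missing key without ever echoing its value.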

## Credit accounting

- Shared log: `v2/data/blog/sistrix-credits-log.json` (weekly reset; see [v2/helpers/sistrix-credit-log.php](../../v2/helpers/sistrix-credit-log.php)).
- **Sprint note (2026-03-29):** Full refresh of global candidates + per-tool `keyword.seo.metrics`, PAA (`keyword.questions`), and competitor (`keyword.seo`) for all live tools. Check `total_used` in the credit log after runs; plan batches if approaching the weekly cap.
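
To plan batches against the weekly cap, a rough estimate is usually enough. The sketch below assumes the approximate 5 credits per keyword listed for `keyword.seo.metrics` in the script table; other endpoints may cost a different amount. The function names are hypothetical, and the weekly cap is passed in as a parameter because this document does not state its value.

```bash
# Hypothetical helper: estimate SISTRIX credits for a keyword batch.
# Usage: estimate_credits <keyword_count> [credits_per_keyword]
# The default of 5 is the approximate keyword.seo.metrics cost, an assumption for other endpoints.
estimate_credits() {
  local keywords="$1"
  local per_keyword="${2:-5}"
  echo $(( keywords * per_keyword ))
}

# Hypothetical helper: succeed when a batch still fits under the weekly cap.
# Usage: batch_fits <keyword_count> <total_used> <weekly_cap>
batch_fits() {
  local needed
  needed="$(estimate_credits "$1")"
  [ $(( $2 + needed )) -le "$3" ]
}
```

Read `total_used` from the credit log, then call for example `batch_fits 40 "$total_used" "$weekly_cap"` before starting a live run.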

## Script → output → when to refresh

| Step | Script | SISTRIX endpoint (typical) | Output | Refresh cadence |
|------|--------|---------------------------|--------|------------------|
| Global keywords | `v2/scripts/tools/collect-tools-keywords-sistrix.php` | `keyword.seo.metrics` (~5 cr/kw) | `docs/content/tools/tools-keyword-sistrix.json` | Quarterly or when `tools-candidate-keywords.json` changes |
| Merge table | `v2/scripts/tools/merge-tools-opportunity-data.php` | — | stdout (paste to `TOOLS_OPPORTUNITY_LIST.md`) | After SISTRIX and/or GSC refresh |
| GSC API (live) | `v2/scripts/tools/collect-tools-performance-gsc.php` | GSC Search Analytics | `docs/content/tools/tools-performance-gsc.json` | Weekly/monthly; `--days=N` (default 90) |
| GSC queries (one page) | **`v2/scripts/seo/collect-gsc-queries.php`** (canonical) — delegates to `collect-tool-gsc-queries.php`: `--tool={slug}` **or** `--path=...` · optional `--start=YYYY-MM-DD --end=YYYY-MM-DD` (fixed range; ignores `--days`) | GSC Search Analytics, `dimension=query` + page `equals` | Default: `docs/content/tools/{slug}/data/gsc-queries.json`; custom path via `--output` (dirs created) | When diagnosing traffic/position vs CTR for a single URL (tools, or any path e.g. comparison pages) |
| GSC query diff (two JSON files) | `v2/scripts/tools/compare-gsc-query-exports.php --before=... --after=...` | — | stdout or `--output=*.md` | After two exports (e.g. previous 28d vs last 28d); no API calls |
| GSC CSV → JSON | `v2/scripts/tools/gsc-export-to-json.php` | — | `docs/content/tools/tools-performance-gsc.json` | Fallback if API unavailable |
| GA4 API | `v2/scripts/tools/collect-tools-performance-ga4.php` · optional `--compare-28d` (current vs previous 28 days per path + `delta`) | GA4 Data API | `docs/content/tools/tools-performance-ga4.json` | Default: single rolling `--days` window (90). Use `--compare-28d` when you need period-over-period engagement on `/tools/*` |
| Per-tool keywords | `v2/scripts/tools/collect-tool-keywords-sistrix.php --tool={slug}` | `keyword.seo.metrics` | `docs/content/tools/{slug}/data/keywords-sistrix.json` | Quarterly per high-value tool; after editing `keywords-candidate.json` |
| Per-tool PAA | `v2/scripts/tools/collect-tool-paa-questions.php --tool={slug}` | `keyword.questions` | `docs/content/tools/{slug}/data/paa-questions.json` | With keywords refresh |
| Competitors | `v2/scripts/tools/collect-tool-competitor-analysis.php --tool={slug}` | `keyword.seo` + HTTP/Firecrawl | `docs/content/tools/{slug}/data/competitor-analysis.json` | With PAA; before content sprints |
| Depth | `v2/scripts/tools/analyze-tool-competitor-depth.php --tool={slug}` | — | `docs/content/tools/{slug}/data/competitive-depth-analysis.md` | After competitor JSON |
| SERP skeleton | `v2/scripts/tools/generate-tool-serp-skeleton.php --tool={slug}` | — | `docs/content/tools/{slug}/SERP_ANALYSIS.md` | After depth; use `--force` only when regenerating |
| Outline | `v2/scripts/tools/generate-tool-content-outline.php --tool={slug}` | — | `docs/content/tools/{slug}/CONTENT_OUTLINE.md` | Same as SERP skeleton |
| **One-page metrics synthesis** | `v2/scripts/tools/generate-tool-data-synthesis.php --tool={slug} --output=...` · optional `--gsc-queries-compare=older.json` | Reads committed JSON only | `DATA_DRIVEN_SYNTHESIS.md` | After refreshing aggregates + queries; pass a **saved** earlier `gsc-queries.json` for a period-over-period snippet |

**Orchestrator:** `v2/scripts/tools/run-tools-improvement-pipeline.php --tool={slug} [--phase=1-4]` — see [TOOLS_CONTENT_WORKFLOW.md](../../guides/tools-pages/TOOLS_CONTENT_WORKFLOW.md).
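
When running steps by hand instead of through the orchestrator, the per-tool rows in the table above imply an order: keywords, PAA, competitors, depth, SERP skeleton, outline. The sketch below only prints the commands in that order so they can be reviewed first. The function name is hypothetical, and the ordering is read from the refresh-cadence column rather than taken from the orchestrator's source.

```bash
# Hypothetical helper: print the per-tool commands in the order implied by the table.
# It only prints; review the output, then run the lines you need.
tool_research_commands() {
  local slug="$1"
  local script
  for script in \
    collect-tool-keywords-sistrix \
    collect-tool-paa-questions \
    collect-tool-competitor-analysis \
    analyze-tool-competitor-depth \
    generate-tool-serp-skeleton \
    generate-tool-content-outline
  do
    echo "php v2/scripts/tools/${script}.php --tool=${slug}"
  done
}
```

Calling `tool_research_commands prozentrechner` lists six commands; the first three use SISTRIX credits, so check the credit log before running them.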

**FAQ JSON-LD / drift:** `php v2/scripts/tools/audit-tools-faq-schema-status.php` (render-faq + head `FAQPage` hits + `ordio_echo_tools_faq_jsonld_script`). Parity: `php v2/scripts/dev-helpers/verify-faq-jsonld-parity.php --all-tools`. Context smoke test: `php v2/scripts/dev-helpers/audit-faq-jsonld-context.php`. See [TOOLS_CONTENT_WORKFLOW.md](../../guides/tools-pages/TOOLS_CONTENT_WORKFLOW.md) § FAQ JSON-LD. **Data-driven FAQ rework checklist:** [_templates/TOOL_FAQ_REFRESH_CHECKLIST.md](_templates/TOOL_FAQ_REFRESH_CHECKLIST.md).

**Dry-runs:** Pass `--dry-run` to collectors that support it to estimate credits before live calls.

## Per-tool synthesis (DATA_DRIVEN_SYNTHESIS.md)

**Purpose:** Ground outlines, meta hypotheses, and FAQ wording in **numbers from the repo** (not prose-only “research”). One Markdown file merges SISTRIX demand, GSC page totals, GA4 engagement, and top GSC queries for the exact URL.

**Prerequisites (refresh first, then generate):**

1. `php v2/scripts/tools/collect-tools-performance-gsc.php`
2. `php v2/scripts/tools/collect-tools-performance-ga4.php`
3. `php v2/scripts/tools/collect-tool-gsc-queries.php --tool={slug}`
4. Optional: `php v2/scripts/tools/collect-tool-keywords-sistrix.php --tool={slug}` (credits)

**Generate:**

```bash
php v2/scripts/tools/generate-tool-data-synthesis.php \
  --tool=prozentrechner \
  --output=docs/content/tools/prozentrechner/DATA_DRIVEN_SYNTHESIS.md
```

**Iterate:** Commit the regenerated file; on the next sprint, diff against the previous version (git) to see click/impression/query shifts. See [PAGE_IMPROVEMENT_DATA_PLAYBOOK.md](../PAGE_IMPROVEMENT_DATA_PLAYBOOK.md) Phase 2 (Synthesis) and Phase 4 (Measure).

---

## GSC and GA4

- **Recommended (live):**  
  - `php v2/scripts/tools/collect-tools-performance-gsc.php` — one Search Analytics query (page dimension, filter **contains** `/tools/`), writes `tools-performance-gsc.json`.  
  - `php v2/scripts/tools/collect-tools-performance-ga4.php` — GA4 `pagePath` **contains** `/tools/`, metrics screenPageViews, sessions, avg session duration; writes `tools-performance-ga4.json`.  
  Requires the same `v2/config/google-api-credentials.php` as blog/template collectors.
- **Query-level GSC (diagnosis):** For **per-page query lists** in-repo, run `php v2/scripts/tools/collect-tool-gsc-queries.php --tool={slug} [--days=90]` → writes `docs/content/tools/{slug}/data/gsc-queries.json`. Use **`--start` and `--end`** (together) for a fixed calendar range (e.g. export **previous 28 days**, then export **last 28 days** to another file). For **non-tool URLs** (e.g. comparison), use `--path=/ordio/path` with `--output=docs/.../gsc-queries.json` so no tool research folder is required. The page-level collector above remains **aggregate** only. You can still use the **GSC UI** (Performance → filter by exact page URL) for ad-hoc checks or CSV export.
- **Period-over-period (queries):** Save two JSON exports, then `php v2/scripts/tools/compare-gsc-query-exports.php --before=first.json --after=second.json --output=docs/.../gsc-queries-diff.md`. Schema: [GSC_QUERY_EXPORT_SCHEMA.md](GSC_QUERY_EXPORT_SCHEMA.md).
- **GA4 period-over-period (tools):** `php v2/scripts/tools/collect-tools-performance-ga4.php --compare-28d` rewrites `tools-performance-ga4.json` with `compare_28d`, `previous_date_range`, and per-path `previous_period` + `delta`. **Without** the flag, the file keeps the earlier flat format (single rolling `--days` window).
- **Fallback:** Manual GSC export (CSV) + `gsc-export-to-json.php` if API access is unavailable. Legacy template: `gsc-pages-tools-export.csv`.
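
For the two fixed-range query exports described above, the `--start`/`--end` pairs can be derived from a single reference date. The sketch below is one way to do that, under several assumptions: it needs GNU `date` (the `-d` option; BSD/macOS `date` uses different flags), it treats both bounds as inclusive, and the function name is hypothetical. Pick a reference date a few days in the past if recent GSC data may still be incomplete.

```bash
# Hypothetical helper: print --start/--end pairs for two back-to-back 28-day windows.
# Assumes GNU date and inclusive start/end dates.
# Usage: gsc_28d_windows <reference-date YYYY-MM-DD>
gsc_28d_windows() {
  local ref="$1"
  local prev_start prev_end last_start
  prev_start="$(date -u -d "$ref 55 days ago" +%F)"
  prev_end="$(date -u -d "$ref 28 days ago" +%F)"
  last_start="$(date -u -d "$ref 27 days ago" +%F)"
  echo "previous --start=${prev_start} --end=${prev_end}"
  echo "last --start=${last_start} --end=${ref}"
}
```

Each printed pair can be appended to `php v2/scripts/tools/collect-tool-gsc-queries.php --tool={slug}` together with a distinct `--output` path, and the two resulting files passed to `compare-gsc-query-exports.php` as `--before` and `--after`.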

## Slug matrix (index URL vs research folder)

Research folders use **hyphenated** slugs under `docs/content/tools/{slug}/`. The public URL in `v2/data/tools_index_data.php` sometimes differs — use this mapping for GSC paths and `--tool=`:

| Public URL slug (`/tools/...`) | Research folder `--tool=` | Notes |
|--------------------------------|----------------------------|--------|
| `elterngeldrechner` | `elterngeld-rechner` | Scripts require hyphenated folder name |
| *(all others)* | Same as URL slug | e.g. `minijob-rechner`, `prozentrechner` |

Full matrix: [TOOLS_SLUG_MATRIX.md](TOOLS_SLUG_MATRIX.md).
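
In wrapper scripts, the mapping can be expressed as a small lookup. This is a minimal sketch with a hypothetical function name; it encodes only the single exception listed in the table above and passes every other slug through unchanged, so consult the full matrix before relying on it.

```bash
# Hypothetical helper: map a public /tools/ URL slug to its research folder name.
# Only the one exception from the table above is encoded here.
tool_folder_for_url_slug() {
  case "$1" in
    elterngeldrechner) echo "elterngeld-rechner" ;;
    *) echo "$1" ;;
  esac
}
```

For example, `--tool="$(tool_folder_for_url_slug elterngeldrechner)"` passes the hyphenated folder name that the scripts require.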

## Blog parity gap (optional future work)

Blog posts can run SERP features, search intent, competition levels, FAQ-research merges, etc. (see [SISTRIX_ENDPOINTS_AND_REPORTS.md](../blog/SISTRIX_ENDPOINTS_AND_REPORTS.md)). The tools stack currently uses metrics + PAA + competitor scrape + depth + SERP/outline generators. **Not implemented for tools:** `keyword.seo.serpfeatures`, `keyword.seo.searchintent`, dedicated `faq-research.json` merge — add only if stakeholders justify extra credits and maintenance.

## Related

- [GSC_QUERY_EXPORT_SCHEMA.md](GSC_QUERY_EXPORT_SCHEMA.md)  
- [SISTRIX_URL_AND_DOMAIN_FOR_TOOLS.md](SISTRIX_URL_AND_DOMAIN_FOR_TOOLS.md)  
- [TOOLS_PERFORMANCE_DATA.md](TOOLS_PERFORMANCE_DATA.md)  
- [TOOLS_OPPORTUNITY_LIST.md](TOOLS_OPPORTUNITY_LIST.md)  
- [TOOLS_SEO_IMPROVEMENT_BACKLOG.md](TOOLS_SEO_IMPROVEMENT_BACKLOG.md)  
- [PAGE_IMPROVEMENT_DATA_PLAYBOOK.md](../PAGE_IMPROVEMENT_DATA_PLAYBOOK.md) (cross-surface baseline for improving live URLs)  
- [PAGE_IMPROVEMENT_ITERATION_CHECKLIST.md](../PAGE_IMPROVEMENT_ITERATION_CHECKLIST.md)  
- [.cursor/rules/tools-prioritization.mdc](../../.cursor/rules/tools-prioritization.mdc)
