# Template Data Collection Guide

**Last Updated:** 2026-03-18

Scripts, inputs, and outputs for template page data collection.

## SISTRIX is Mandatory

**SISTRIX collection is required** for all template content creation. The `run-new-template-pipeline.php` script no longer supports `--skip-sistrix`. If `keywords-sistrix.json` shows `credits_used: 0` or contains no data, re-run the collection:

```bash
php v2/scripts/templates/collect-template-keywords-sistrix.php --template={id} --template-priority
```
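
Before re-running, it can help to check programmatically whether a file looks like a stub. The following is a minimal sketch, not the project's own validation: `credits_used` is the field named in this guide, while the `keywords` field name is a hypothetical placeholder for wherever the collected data lives, since the full schema is not documented here.

```python
import json
from pathlib import Path


def needs_recollection(data: dict, data_key: str = "keywords") -> bool:
    """Return True when a parsed keywords-sistrix.json looks like a stub.

    `credits_used` is the field named in this guide. `data_key` is a
    hypothetical name for the collected data; adjust it to the real schema.
    """
    no_credits = not data.get("credits_used")  # 0, None, or missing
    no_data = not data.get(data_key)           # empty list/dict, or missing
    return no_credits or no_data


def check_file(path: str) -> bool:
    """Load a keywords-sistrix.json file and report whether to re-run collection."""
    return needs_recollection(json.loads(Path(path).read_text(encoding="utf-8")))
```

Calling `check_file("template-data/{id}/data/keywords-sistrix.json")` with a real template ID returns `True` when the collection should be repeated.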

## SISTRIX Credit Policy for Templates

**Templates are published less frequently than blog posts**, so SISTRIX credits can be used freely when improving a template:

- **Run the full collection without `--limit`** when improving a template.
- **Pass `--template-priority`** to `collect-template-keywords-sistrix.php` to bypass the daily credit limit.
- Pipeline: run `collect-template-keywords-sistrix.php --template=ID --template-priority` for each template being improved.

```bash
php v2/scripts/templates/collect-template-keywords-sistrix.php --template=dienstplan-excel-vorlage --template-priority
```

## Scripts

### Keyword Discovery

| Script | Input | Output | Credits |
|--------|-------|--------|---------|
| discover-template-keywords-sistrix.php | template-seed-keywords.json | `template-keywords-discovered.json` | 1/idea |
| generate-template-candidate-keywords.php | registry, merged (base) + discovered (augment) | `template-candidate-keywords.json` | 0 |
| collect-template-keywords-sistrix.php | template-candidate-keywords.json | `template-data/{id}/data/keywords-sistrix.json` | 5/keyword |
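
To get a rough sense of cost before running `collect-template-keywords-sistrix.php`, the 5/keyword rate from the table can be multiplied by the number of candidate keywords. The sketch below assumes each distinct keyword (primary plus secondary) is billed once; the function name is hypothetical and the real script may count differently.

```python
CREDITS_PER_KEYWORD = 5  # collect-template-keywords-sistrix.php rate from the table


def estimate_collection_credits(template: dict) -> int:
    """Estimate credits for one template entry (primary + secondary keywords).

    Assumption: every distinct keyword is billed once at the per-keyword rate.
    """
    keywords = {template["primary"], *template.get("secondary", [])}
    return len(keywords) * CREDITS_PER_KEYWORD


entry = {
    "primary": "dienstplan vorlage excel",
    "secondary": ["dienstplan vorlage", "dienstplan excel"],
}
print(estimate_collection_credits(entry))  # 3 keywords * 5 credits = 15
```

Actual usage may be lower if some results are served from the SISTRIX cache.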

### Template Analysis

| Script | Input | Output | Credits |
|--------|-------|--------|---------|
| analyze-template-keyword-overlap.php | template-candidate-keywords.json | `template-merge-recommendations.json` | 0 |
| analyze-template-remove-candidates.php | registry, keywords-sistrix | `template-remove-recommendations.json` | 0 |
| generate-template-priority-from-sistrix.php | keywords-sistrix | `TEMPLATE_PRIORITY_LIST_GENERATED.md` | 0 |

### Content & SEO

| Script | Input | Output | Credits |
|--------|-------|--------|---------|
| collect-template-paa-questions.php | template-candidate-keywords, keywords-sistrix | `template-data/{id}/data/paa-questions.json` | ~15/keyword |
| collect-template-competitor-analysis.php | primary keyword | `template-data/{id}/data/competitor-analysis.json` | 1/keyword (Firecrawl Extract when the cURL result is sparse) |
| analyze-template-competitor-depth.php | competitor-analysis.json | `template-data/{id}/data/competitive-depth-analysis.md` | 0 |
| collect-template-performance-gsc.php | template URL | `template-data/{id}/data/performance-gsc.json` | 0 |
| collect-template-faq-research-data.php | paa, keywords, gsc | `template-data/{id}/data/faq-research.json` | 0 |

**Content targets:** `competitive-depth-analysis.md` supplies the targets used by the validate, audit, and generate-outline steps. See [CONTENT_TARGETS_REFERENCE.md](CONTENT_TARGETS_REFERENCE.md).

**Firecrawl Extract:** When cURL returns sparse competitor content (word_count < 100, headings < 3), `collect-template-competitor-analysis.php` uses the Firecrawl Extract API for structured data. See [FIRECRAWL_INTEGRATION.md](../firecrawl/FIRECRAWL_INTEGRATION.md).

- **Auto-fix:** run `validate-template-competitor-data-completeness.php --remediate` (Phase 1.5) to fix sparse entries via the Firecrawl API, or use Firecrawl MCP.
- **Bulk remediation:** run `audit-firecrawl-sparse-competitors.php --output-urls=sparse.json`, then `firecrawl-batch-remediate-sparse.php --input=sparse.json`.
- **Search supplement:** pass `--use-firecrawl-search` to the collect script to supplement results with Firecrawl Search (DE geo).
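
The sparse-content thresholds can be expressed as a small predicate. This is an illustrative sketch only: the guide lists both thresholds without saying whether they combine with AND or OR, so the hypothetical `require_both` switch makes that choice explicit rather than guessing at what the script does.

```python
def is_sparse(word_count: int, headings: int, require_both: bool = False) -> bool:
    """Flag competitor content as sparse using the thresholds from this guide.

    The guide does not state how the two thresholds combine, so
    `require_both` (a hypothetical switch) selects AND instead of OR.
    """
    few_words = word_count < 100
    few_headings = headings < 3
    if require_both:
        return few_words and few_headings
    return few_words or few_headings
```

With the default, a page with 50 words and 5 headings counts as sparse; with `require_both=True` it does not.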

**Keyword workflow:** See [TEMPLATE_INVENTORY_KEYWORD_WORKFLOW.md](TEMPLATE_INVENTORY_KEYWORD_WORKFLOW.md).

## Options

- `--template=id` – single template
- `--all` – all templates in candidate keywords
- `--dry-run` – no API calls (where supported)
- `--template-priority` – (collect-template-keywords-sistrix.php) bypass daily credit limit; use credits freely for templates

## Credit Log

- Shared with blog: `v2/data/blog/sistrix-credits-log.json`
- Cache: `v2/data/blog/sistrix-cache/`

## Troubleshooting

**keywords-sistrix.json has credits_used: 0:** The data may be a stub. Re-run `collect-template-keywords-sistrix.php`; the pipeline no longer supports `--skip-sistrix`, so collection cannot be skipped. Use `--template-priority` to bypass the daily credit limit.

**Keyword data mismatch:** If `keywords-sistrix.json` or `seo-meta.json` targets the wrong template (e.g. the slug is `gefaehrdungsbeurteilung-vorlage` but `primary` is "arbeitszeugnis vorlage"), fix `template-candidate-keywords.json` first. Set `primary` to the keyword that matches the template slug, then re-run `collect-template-keywords-sistrix.php` and `generate-template-seo-meta.php`.
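
A mismatch like this can be caught with a simple heuristic that compares the words of the primary keyword against the slug. The sketch below is an assumption, not part of the existing scripts: it transliterates German umlauts the way the slug in the example does (`ä` to `ae`) and requires every primary-keyword word to appear in the slug.

```python
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def _tokens(text: str) -> set:
    """Lower-case, transliterate German umlauts, and split on spaces/hyphens."""
    return set(text.lower().translate(UMLAUTS).replace("-", " ").split())


def primary_matches_slug(slug: str, primary: str) -> bool:
    """Heuristic: every word of the primary keyword must appear in the slug."""
    primary_tokens = _tokens(primary)
    return bool(primary_tokens) and primary_tokens <= _tokens(slug)
```

For the example above, `primary_matches_slug("gefaehrdungsbeurteilung-vorlage", "arbeitszeugnis vorlage")` returns `False`, because "arbeitszeugnis" does not occur in the slug. Legitimate primaries that use words absent from the slug would also be flagged, so treat a `False` as a prompt to review rather than a definite error.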

**FAQ answers empty:** If `generate-template-faq-answers-optimized.php` produces empty answers (no `GEMINI_API_KEY` set), add the answers manually to `template-data/{id}/data/faq-answers-optimized.json`. Each answer should be 40–80 words, per `templates-pages-faq.mdc`. Then run `add-template-faqs.php`.
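
When writing answers by hand, the 40–80 word range is easy to check before running `add-template-faqs.php`. This is a minimal sketch that counts whitespace-separated words; the function name is hypothetical, and the rule file may define word counts differently.

```python
MIN_WORDS, MAX_WORDS = 40, 80  # range stated in this guide


def answer_length_ok(answer: str) -> bool:
    """Check that a FAQ answer has 40-80 whitespace-separated words."""
    return MIN_WORDS <= len(answer.split()) <= MAX_WORDS
```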

**Content validation fails (min/max contradictory):** When `competitive-depth-analysis.md` has conflicting targets (e.g. min 1000, max 500), add `template-data/{id}/content-targets.json` to override. See [CONTENT_TARGETS_REFERENCE.md](CONTENT_TARGETS_REFERENCE.md).

## Template Candidate Keywords

Path: `docs/systems/templates/template-candidate-keywords.json`

```json
{
  "templates": {
    "dienstplan-excel-vorlage": {
      "primary": "dienstplan vorlage excel",
      "secondary": ["dienstplan vorlage", "dienstplan excel", ...]
    }
  }
}
```
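
Given this structure, the per-template collection commands can be generated from the file's `templates` keys. The sketch below is illustrative: it uses a shortened sample instead of reading the real file, the helper name is hypothetical, and it only prints the commands, it does not run them.

```python
import json
import shlex

SCRIPT = "v2/scripts/templates/collect-template-keywords-sistrix.php"


def collection_commands(candidates: dict) -> list:
    """Build one collection command per template ID in the candidate file."""
    return [
        shlex.join(["php", SCRIPT, f"--template={template_id}", "--template-priority"])
        for template_id in candidates.get("templates", {})
    ]


# Shortened sample in the shape shown above (secondary list truncated).
sample = json.loads("""
{
  "templates": {
    "dienstplan-excel-vorlage": {
      "primary": "dienstplan vorlage excel",
      "secondary": ["dienstplan vorlage", "dienstplan excel"]
    }
  }
}
""")
for command in collection_commands(sample):
    print(command)
```

To use the real data, replace `sample` with the parsed contents of `docs/systems/templates/template-candidate-keywords.json`.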
