# SISTRIX Failure Fallbacks

**Last Updated:** 2026-03-17  
**Purpose:** Fallback procedures for when SISTRIX returns 0 competitors or off-topic PAA (People Also Ask) questions. Apply them before creating the outline.

**See also:** [KEYWORD_RESEARCH_WORKFLOW.md](KEYWORD_RESEARCH_WORKFLOW.md) (keyword selection order, credits, when to use `--from-sistrix` / `--primary-only`).

## Overview

SISTRIX competitor analysis and PAA can fail for Zeiterfassung/ArbZG topics. When `collect-post-competitor-analysis.php` returns 0 URLs or PAA is off-topic, use these fallbacks **before** creating CONTENT_OUTLINE.md.

**Known problematic keywords** (SISTRIX often returns 0 or sparse data): pausenregelung, höchstarbeitszeit, rahmenarbeitszeit, jahresarbeitszeit. For these, **always** run Serper + Firecrawl (or a manual scrape) before creating the outline.
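
As a rough illustration, the keyword list above can be turned into a pre-flight check. This is a minimal sketch; the function name `needs_serper_firecrawl` is hypothetical and not part of the existing scripts. It only compares the primary keyword against the four keywords listed above, ignoring case and surrounding whitespace.

```python
import unicodedata

# Keywords this document lists as known-problematic for SISTRIX.
KNOWN_PROBLEMATIC = {
    "pausenregelung",
    "höchstarbeitszeit",
    "rahmenarbeitszeit",
    "jahresarbeitszeit",
}


def needs_serper_firecrawl(primary_keyword: str) -> bool:
    """Return True if the keyword is on the known-problematic list.

    Hypothetical helper: normalizes Unicode (NFC), trims whitespace and
    compares case-insensitively, so "Rahmenarbeitszeit" also matches.
    """
    normalized = unicodedata.normalize("NFC", primary_keyword).strip().casefold()
    return normalized in KNOWN_PROBLEMATIC
```

A keyword that is not on the list can still fail in SISTRIX, so a `False` result does not replace the checks in the sections below.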

---

## Competitor Analysis = 0 URLs or Empty Analysis

**When:** `competitor-analysis.json` is empty, has 0 competitor URLs, or all competitors have empty `analysis: {}` (no word_count, no headings).
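
The trigger condition above could be checked programmatically along these lines. This is a sketch under assumptions: the real schema of `competitor-analysis.json` is not shown in this document, so the top-level `competitors` key and the per-entry `url` field are assumed for illustration. Only the `analysis` object with `word_count` and `headings` is taken from the text above.

```python
def classify_competitor_data(data: dict) -> str:
    """Classify parsed competitor-analysis.json content.

    Returns "no_urls", "empty_analysis", or "ok".
    ASSUMPTION: competitors live under a top-level "competitors" list and
    each entry has a "url" field; adjust to the real file layout.
    """
    competitors = data.get("competitors") or []
    with_url = [c for c in competitors if c.get("url")]
    if not with_url:
        return "no_urls"

    def has_scraped_data(entry: dict) -> bool:
        analysis = entry.get("analysis") or {}
        # An entry counts as populated if it has a word count or any headings.
        return bool(analysis.get("word_count")) or bool(analysis.get("headings"))

    # The fallback applies only when *all* competitors lack scraped data.
    if not any(has_scraped_data(c) for c in with_url):
        return "empty_analysis"
    return "ok"
```

Both `"no_urls"` and `"empty_analysis"` lead to the same remediation steps listed below; they are kept separate here only because the first also requires collecting URLs via Serper.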

**Firecrawl remediation is mandatory** – word count and H2 targets set without competitor data are guesses and will underperform. Do not create the outline until you have real competitor word counts and H2 counts.

**Actions (in order):**

1. **Serper MCP** – Run Serper MCP for the primary keyword. Use the correctly capitalized German term (e.g. `Rahmenarbeitszeit`, not the lowercase `rahmenarbeitszeit`). Extract the top 10 organic URLs and, if needed, add them manually to `competitor-analysis.json`.
2. **Firecrawl remediation** – Run:
   ```bash
   php v2/scripts/blog/validate-blog-competitor-data-completeness.php --post={slug} --category={category} --top=5 --remediate
   ```
   This scrapes the top 5 competitors via Firecrawl (1 credit per URL) and populates `analysis` (`word_count`, `headings`) for each.
3. **If PHP Firecrawl API fails** – Use Firecrawl MCP: scrape each URL, save the **full markdown** (not headings-only) to `tmp/{post}-scrapes/{domain}.md`, then run `update-competitor-analysis-from-markdown.php`. See [FIRECRAWL_MCP_REMEDIATION.md](FIRECRAWL_MCP_REMEDIATION.md).
4. **If Firecrawl unavailable** – Manually scrape 3–5 competitor URLs via Fetch MCP. Extract word count and H2 headings. Populate `competitive-depth-analysis.md` with:
   - Competitor word counts (avg, min, max)
   - Recommended word count target = competitor_avg × 1.2–1.3 (or 2,000–2,500 for lexikon when no data is available)
   - Competitor H2 counts; recommended H2 count = the highest H2 count among the top 3 competitors
   - Minimum word count target = 80% of the recommended target
5. **Re-run `analyze-competitor-content-depth.php`** – Once Firecrawl has populated the data, re-run the script to generate a data-driven `competitive-depth-analysis.md`.
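
The target arithmetic from step 4 can be sketched as follows. This is an illustrative helper, not the logic of `analyze-competitor-content-depth.php`. The function name `depth_targets` and the returned keys are hypothetical, and computing the minimum as 80% of the *lower* bound of the recommended range is an assumption, since the text above does not say which end of the range applies.

```python
from statistics import mean

# Lexikon fallback range from this document, used when no data is available.
LEXIKON_FALLBACK_RANGE = (2000, 2500)


def depth_targets(word_counts: list[int], h2_counts: list[int]) -> dict:
    """Derive word count and H2 targets from manually scraped competitors.

    Both lists are expected in SERP rank order (best-ranked first).
    """
    if not word_counts:
        # No competitor data at all: fall back to the fixed lexikon range.
        return {"source": "fallback", "recommended_words": LEXIKON_FALLBACK_RANGE}

    avg = mean(word_counts)
    rec_low, rec_high = round(avg * 1.2), round(avg * 1.3)
    return {
        "source": "competitors",
        "avg_words": round(avg),
        "min_words": min(word_counts),
        "max_words": max(word_counts),
        "recommended_words": (rec_low, rec_high),
        # ASSUMPTION: 80% is applied to the lower bound of the range.
        "minimum_words": round(rec_low * 0.8),
        # Highest H2 count among the top 3 competitors.
        "recommended_h2": max(h2_counts[:3]) if h2_counts else None,
    }
```

For example, competitors with 1,800, 2,200 and 2,000 words average 2,000 words, which gives a recommended range of 2,400–2,600 and a minimum of 1,920 under the assumption stated above.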

**Fallback word count targets** (when Firecrawl is unavailable and a manual scrape is not feasible):
- **Lexikon:** 2,000–2,500 words (not 1,800). Use 2,500 for medium-high competition.
- **Rationale:** Per FLEXIBLE_WORD_COUNT_GUIDELINES, the optimal range for a medium-competition lexikon post is 1,800–2,500 words. When data is missing, aim for the upper part of that range.

---

## Competitors Have URLs but Empty analysis

**When:** `competitor-analysis.json` has URLs from Serper/manual fallback but all entries have `analysis: {}`.

**Same as 0 URLs** – Treat this case like the 0-URL case and run Firecrawl remediation. Without scraped data, word count and H2 targets are guesses. `analyze-competitor-content-depth.php` will output a fallback report; use 2,000–2,500 words for lexikon until Firecrawl populates the data.

---

## PAA Off-Topic or Empty

**When:** More than 30% of the SISTRIX PAA questions for the primary keyword are off-topic, or SISTRIX returns no PAA questions at all.
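
The 30% threshold can be expressed as a small check. This is a minimal sketch: how a question is judged on-topic is not defined in this document, so the caller supplies that decision as a function. The `mentions_any` helper is a hypothetical, deliberately crude substring heuristic and no substitute for manual review.

```python
from typing import Callable, Sequence

OFF_TOPIC_THRESHOLD = 0.30  # from the trigger condition above


def paa_needs_fallback(
    questions: Sequence[str], is_on_topic: Callable[[str], bool]
) -> bool:
    """Return True if PAA is empty or more than 30% of questions are off-topic."""
    if not questions:
        return True
    off_topic = sum(1 for q in questions if not is_on_topic(q))
    return off_topic / len(questions) > OFF_TOPIC_THRESHOLD


def mentions_any(terms: Sequence[str]) -> Callable[[str], bool]:
    """Hypothetical heuristic: on-topic if the question contains any term."""
    lowered = [t.casefold() for t in terms]
    return lambda question: any(t in question.casefold() for t in lowered)
```

For instance, with `on_topic = mentions_any(["arbeitszeit", "arbeiten"])`, one unrelated question out of three (33%) exceeds the threshold, while one out of four (25%) does not.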

**Actions:**

1. **Serper MCP** – Run Serper MCP for the primary keyword. Extract the PAA questions from the real Google SERP.
2. **Create paa-questions-manual.json** – Add 12–15 HR-focused questions to `data/paa-questions-manual.json`.
3. **Run collect-faq-research-data.php** – Regenerate the FAQ research data from the manual PAA questions.

**Details:** See [PAA_QUALITY_AND_MANUAL_OVERRIDE.md](PAA_QUALITY_AND_MANUAL_OVERRIDE.md) for format and curation guidelines.

---

## Checklist Reference

- [LEXIKON_NEW_POST_CHECKLIST.md](posts/lexikon-inventory/LEXIKON_NEW_POST_CHECKLIST.md) – SISTRIX Failure Protocol
- [LEXIKON_POST_CREATION_TODO_TEMPLATE.md](posts/_templates/LEXIKON_POST_CREATION_TODO_TEMPLATE.md) – Step-by-step with DO NOT SKIP items
