# Firecrawl MCP Remediation

**Last Updated:** 2026-03-17  
**Purpose:** When the PHP Firecrawl API call fails or returns sparse data, use the Firecrawl MCP to scrape competitors and update `competitor-analysis.json`. **Always save the FULL markdown** – not headings only – or word counts will be wrong.

## When to Use

- `validate-blog-competitor-data-completeness.php --remediate` fails with "Firecrawl API returned no data"
- A competitor has `word_count < 200` (suspicious for lexikon; typical range 800–3,000)
- `analyze-competitor-content-depth.php` reports "Data quality warning: top competitor(s) have <200 words"

## Root Cause: Headings-Only Saves

**Problem:** Saving only headings (or a summary) instead of the full Firecrawl markdown produces wrong word counts. Example: [GFOS Pausenregelung](https://www.gfos.com/de/lexikon/pausenregelung/) has ~1,400 words but was stored with a word count of 40 because only the headings were saved.

**Fix:** Always save the complete `markdown` string from the Firecrawl MCP response.
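The gap between the two kinds of save can be reproduced on any markdown text. The sketch below is illustrative only: `count_words` and `headings_only` are hypothetical helpers, the sample text is invented, and `wc -w` is a stand-in that may not match how the PHP scripts count words.

```bash
#!/usr/bin/env bash
# Illustrative sketch: helper names and sample text are assumptions,
# not part of the project's scripts.

# Count whitespace-separated words read from stdin.
count_words() {
  wc -w | tr -d '[:space:]'
}

# Keep only ATX heading lines (1-6 '#' characters followed by a space).
headings_only() {
  grep -E '^#{1,6} ' || true
}

sample_markdown='# Sample Heading

This paragraph stands in for the body text of a competitor page.

## Second Heading

Most of the words on a real page live in paragraphs like this one.'

full_count=$(printf '%s\n' "$sample_markdown" | count_words)
headings_count=$(printf '%s\n' "$sample_markdown" | headings_only | count_words)

echo "full markdown: ${full_count} words"
echo "headings only: ${headings_count} words"
```

Even in this tiny sample, the headings-only variant keeps only a fraction of the words, which is the same effect that turned a ~1,400-word page into a stored count of 40.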

## Workflow

### 1. Scrape via Firecrawl MCP

```
firecrawl_scrape
  url: https://example.com/competitor-page
  formats: ["markdown"]
  onlyMainContent: true
```

### 2. Save FULL Markdown to File

Create `tmp/{post}-scrapes/{domain}.md` and write the **entire** `markdown` value from the response. Do not extract only headings.

```bash
mkdir -p tmp/pausenregelung-scrapes
# Write full markdown to tmp/pausenregelung-scrapes/gfos.com.md
```
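One way to make this step harder to get wrong is a small wrapper that writes whatever arrives on stdin to the expected path and immediately prints the resulting word count. This is a minimal sketch: `save_scrape` is a hypothetical helper, not an existing project script, and it assumes you already have the complete `markdown` value available as text.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository):
#   save_scrape POST DOMAIN   reads the full markdown from stdin.
save_scrape() {
  local post="$1" domain="$2"
  local dir="tmp/${post}-scrapes"
  local target="${dir}/${domain}.md"

  mkdir -p "$dir"
  # Write stdin unchanged, so nothing is trimmed or summarized.
  cat > "$target"

  # Print the word count so a headings-only save is visible right away.
  wc -w < "$target" | tr -d '[:space:]'
  echo
}
```

For example, `save_scrape pausenregelung gfos.com < full-markdown.txt` (where `full-markdown.txt` is a placeholder for wherever you stored the response text) writes `tmp/pausenregelung-scrapes/gfos.com.md` and prints its word count. A value far below the typical lexikon range suggests the save was incomplete.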

### 3. Update competitor-analysis.json

```bash
php v2/scripts/blog/update-competitor-analysis-from-markdown.php \
  --post=pausenregelung \
  --category=lexikon \
  --scrapes-dir=tmp/pausenregelung-scrapes
```

### 4. Validate

- If any competitor shows `word_count < 200` for lexikon, the script will warn: "SUSPICIOUS: <200 words – save FULL markdown"
- Re-run `analyze-competitor-content-depth.php` to regenerate `competitive-depth-analysis.md`

## Sparse Threshold

- **validate-blog-competitor-data-completeness.php:** Flags competitors with `word_count < 200` or `headings < 3` as sparse (remediation required)
- **Lexikon typical range:** 800–3,000 words per competitor page
- **Ratgeber:** Similar range; a page with <200 words is almost always an incomplete scrape
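
The thresholds above can be applied to the saved scrape files before running the PHP scripts. The following is a rough pre-check, not the validator's actual logic: `is_sparse` and `report_sparse` are hypothetical helpers, and counting words with `wc -w` and headings with a `grep` for ATX heading lines is an assumption about how the numbers are derived.

```bash
#!/usr/bin/env bash
# Thresholds come from the list above; the counting method is an assumption.
MIN_WORDS=200
MIN_HEADINGS=3

# Return 0 (true) when a markdown file looks sparse.
is_sparse() {
  local file="$1" words headings
  words=$(wc -w < "$file" | tr -d '[:space:]')
  # grep -c exits non-zero on zero matches but still prints 0.
  headings=$(grep -cE '^#{1,6} ' "$file" || true)
  [ "$words" -lt "$MIN_WORDS" ] || [ "$headings" -lt "$MIN_HEADINGS" ]
}

# Print one line per sparse .md file in a scrapes directory.
report_sparse() {
  local dir="$1" file
  for file in "$dir"/*.md; do
    [ -e "$file" ] || continue   # directory without .md files
    if is_sparse "$file"; then
      echo "SPARSE: $file"
    fi
  done
}
```

Running `report_sparse tmp/pausenregelung-scrapes` after step 2 lists the files that would likely be flagged, so they can be re-scraped before `competitor-analysis.json` is updated.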

## References

- [SISTRIX_FAILURE_FALLBACKS.md](SISTRIX_FAILURE_FALLBACKS.md) – When SISTRIX returns 0 competitors
- [docs/systems/firecrawl/FIRECRAWL_INTEGRATION.md](../../systems/firecrawl/FIRECRAWL_INTEGRATION.md) – Firecrawl API setup
