# Firecrawl Integration

**Last Updated:** 2026-04-02

Firecrawl API integration for competitor content extraction when cURL returns sparse content (JS-heavy sites).
**3k+ plan:** See [FIRECRAWL_3K_PLAN_OPTIMIZATION.md](FIRECRAWL_3K_PLAN_OPTIMIZATION.md) for defaults and monthly batch remediation.
Also used via MCP for manual SERP and competitor research during content creation.

## Overview

| Layer | Purpose |
|-------|---------|
| **Production API fallback** | `v2/helpers/firecrawl-remediate.php` – Scrape-first (1 credit/URL) when cURL returns word_count < 100 or empty headings |
| **Scrape API** | `v2/helpers/firecrawl-fallback.php` – Markdown extraction; parsed for headings, FAQs, word count |
| **Extract API** | `v2/helpers/firecrawl-extract.php` – Optional fallback (~1–32 credits, token-based); only when `use_extract_fallback` enabled |
| **MCP usage** | Use **firecrawl_scrape** (markdown) for competitor content; avoid firecrawl_extract |

### Scrape-First (Credit Optimization)

- **Scrape:** 1 credit per URL. Returns markdown; we parse headings/FAQs. **Tried first** when cURL returns sparse content.
- **Extract:** ~1–32 credits (token-based). Only used when Scrape fails **and** `use_extract_fallback` is true (default: false).
- **URL skip patterns:** URLs matching `url_skip_patterns` (e.g. images, PDFs) are skipped to avoid wasting credits.
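The Scrape-first decision above can be sketched as follows (function names are illustrative, not the repo's actual helper API; the `~` regex delimiter wrapping is an assumption about how `url_skip_patterns` fragments are applied):

```php
<?php
// Sketch of the Scrape-first gate: remediate only sparse results, skip known-bad URLs.

/** True when cURL output is too sparse to trust (< 100 words or no headings). */
function isSparse(int $wordCount, array $headings): bool {
    return $wordCount < 100 || count($headings) === 0;
}

/** True when the URL matches a url_skip_patterns regex and should cost 0 credits. */
function shouldSkipUrl(string $url, array $skipPatterns): bool {
    foreach ($skipPatterns as $pattern) {
        // Config stores bare pattern fragments; wrap in a delimiter here.
        if (preg_match('~' . $pattern . '~i', $url)) {
            return true;
        }
    }
    return false;
}

$skip = ['images\.', '\.pdf$', 'universal-search-box'];
var_dump(isSparse(42, []));                                         // bool(true)  -> remediate
var_dump(shouldSkipUrl('https://example.com/brochure.pdf', $skip)); // bool(true)  -> no credit spent
```

Only URLs that are both sparse and not skipped reach the Scrape call, so the worst case stays at 1 credit per failed competitor.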

## Configuration

### API Key

- **Environment:** `FIRECRAWL_API_KEY`
- **Local file:** `v2/config/firecrawl-api-key.php` (copy from example, gitignored)

```php
// v2/config/firecrawl-api-key.php (example)
return [
    'api_key' => 'fc-xxxxxxxx',          // from the Firecrawl dashboard
    'max_age_ms' => 172800000,           // cache scrapes for 2 days
    'use_extract_fallback' => false,     // keep Extract off; Scrape-first
    'url_skip_patterns' => ['images\.', '\.pdf$', 'universal-search-box'], // regex; matches cost 0 credits
];
```

**Credit optimization options:**
- `use_extract_fallback` (default: false) – Only try Extract when Scrape fails. Set true for JS-heavy edge cases.
- `url_skip_patterns` – Skip URLs matching these regex patterns (saves credits on known-bad URLs).

### Setup

1. Get API key from [Firecrawl](https://firecrawl.dev)
2. Add to `.env` or create `v2/config/firecrawl-api-key.php`
3. For MCP: Add to `mcp.json` via `setup-mcp-config.py --env-file .env` — **do not commit real keys**; keep placeholders in `.cursor/mcp.json.example` and load secrets from `.env`.

### Security

- Never commit `firecrawl-api-key.php`, real `mcp.json` tokens, or API keys in docs or chat logs.
- If a key was pasted into a tracked or shared file, **rotate it** in the Firecrawl dashboard and update `.env` / local config only.

## Production Scripts Using Firecrawl

| Script | Trigger | Cost |
|--------|---------|------|
| `collect-post-competitor-analysis.php` | word_count < 100 or empty headings | ~1 credit per failed competitor |
| `collect-tool-competitor-analysis.php` | Same | Same |
| `collect-template-competitor-analysis.php` | Same | Same |

**Typical usage (3k-plan defaults):** proactive top 3 + remediation of top 7 sparse = 5–12 credits/post. See [FIRECRAWL_3K_PLAN_OPTIMIZATION.md](FIRECRAWL_3K_PLAN_OPTIMIZATION.md).

## API Reference

- **Base URL:** `https://api.firecrawl.dev` (paths are versioned; see [Firecrawl API introduction](https://docs.firecrawl.dev/api-reference/introduction) and [v2 OpenAPI](https://docs.firecrawl.dev/api-reference/v2-openapi.json)).
- **Scrape (production):** `POST …/v2/scrape` — configured as `scrape_path` in `v2/config/firecrawl-config.php` (override via `FIRECRAWL_SCRAPE_PATH` or `firecrawl-api-key.php`).
- **Batch scrape:** `POST …/v2/batch/scrape` — `batch_scrape_path` / `FIRECRAWL_BATCH_SCRAPE_PATH`.
- **Map:** `POST …/v2/map` — `map_path`; v2 returns `links` as objects `{url, …}`; `firecrawl-map.php` normalizes to URL strings.
- **Search:** `POST …/v2/search` — `search_path`; payload includes `location` and (for 2-letter codes) `country`.
- **Extract (optional fallback):** `POST …/v1/extract` only in `firecrawl-extract.php`. v2 extract is **async** (job id + polling); this repo keeps v1 for the rare extract fallback until a v2 job flow is implemented.
- **Auth:** `Authorization: Bearer {API_KEY}`
- **Scrape body (typical):** `{"url": "...", "formats": ["markdown"], "onlyMainContent": true}` (string format names are supported per API docs).
- **Success responses:** include `success: true` and `data` (e.g. `data.markdown` for scrape).
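A minimal sketch of the scrape body and success-response handling described above (the field names follow the API reference; the cURL transport itself is elided, and the sample response is hardcoded for illustration):

```php
<?php
// Build the typical scrape body and read markdown out of a v2 success response.
$payload = json_encode([
    'url'             => 'https://example.com/competitor-post',
    'formats'         => ['markdown'],   // string format names, per the API docs
    'onlyMainContent' => true,
]);

// Shape of a successful response: success flag + data.markdown.
$response = json_decode(
    '{"success": true, "data": {"markdown": "# Heading\n\nBody text"}}',
    true
);

$markdown = ($response['success'] ?? false)
    ? ($response['data']['markdown'] ?? null)
    : null;                              // null signals the caller to fall back / log

echo $markdown; // the markdown we parse for headings, FAQs, word count
```

Checking `success` and the presence of `data.markdown` separately matters: the API can return HTTP 200 with `success: false`, and a success without markdown should still be treated as a miss.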

### Troubleshooting

| Symptom | Likely cause |
|--------|----------------|
| PHP fallback never runs | `FIRECRAWL_API_KEY` unset and no `firecrawl-api-key.php` → `enabled => false` in `firecrawl-config.php`. |
| Always `null` / no markdown | Check PHP `error_log` for lines prefixed `Firecrawl scrape` (HTTP code, `success: false`, or missing markdown). |
| HTTP **401** | Missing/invalid API key. |
| HTTP **402** | Plan / credits (see [billing](https://docs.firecrawl.dev/billing.md)). |
| HTTP **429** | Rate limit; retry with backoff. |
| JSON `success: false` | API error message in `error` / `message`; body may cite engine failures (see [error codes](https://docs.firecrawl.dev/api-reference/introduction)). |
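For the HTTP 429 row, a retry-with-backoff wrapper might look like this (a sketch; the `$request` closure returning `[httpCode, body]` is an assumption, not the helpers' actual signature):

```php
<?php
// Exponential-backoff retry for HTTP 429 (illustrative; the request closure shape is assumed).
function retryWithBackoff(callable $request, int $maxAttempts = 4, int $baseDelayMs = 500): mixed {
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        [$httpCode, $body] = $request();
        if ($httpCode !== 429) {
            return $body;                // success, or a non-retryable error (401/402/...)
        }
        if ($attempt < $maxAttempts) {
            // Double the wait each time: 0.5s, 1s, 2s, ...
            usleep($baseDelayMs * 1000 * (2 ** ($attempt - 1)));
        }
    }
    return null; // still rate-limited after all attempts; caller logs and moves on
}
```

Only 429 is retried here; 401 and 402 are configuration/billing problems that retries cannot fix.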

## Credit Tracking

- Firecrawl credits: Scrape 1/URL; Extract ~1–32 (variable). Production uses Scrape-first.
- **Credit logging:** Default `credit_log_enabled => true` logs to `v2/data/firecrawl-credits-log.json`. Run `php v2/scripts/dev-helpers/firecrawl-credit-summary.php` to summarize.
- Monitor via the [Firecrawl dashboard](https://firecrawl.dev/app/usage).
- **Plans:** Free 500 (one-time), Hobby 3k, Standard 100k, Growth 500k per month. See [Firecrawl billing](https://docs.firecrawl.dev/billing.md).
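The credit log could be appended to along these lines (a sketch; the entry schema shown here is an assumption, not the actual format of `firecrawl-credits-log.json`):

```php
<?php
// Append-one-entry sketch for a JSON credit log (schema is illustrative).
function logCredits(string $file, string $endpoint, string $url, int $credits): void {
    $log = file_exists($file) ? (json_decode(file_get_contents($file), true) ?: []) : [];
    $log[] = [
        'ts'       => date('c'),
        'endpoint' => $endpoint,   // scrape | extract | search | map | batch
        'url'      => $url,
        'credits'  => $credits,
    ];
    file_put_contents($file, json_encode($log, JSON_PRETTY_PRINT));
}

$file = sys_get_temp_dir() . '/firecrawl-credits-log.json';
@unlink($file); // start fresh for this demo
logCredits($file, 'scrape', 'https://example.com/a', 1);
logCredits($file, 'scrape', 'https://example.com/b', 1);

// A summary script can then just sum the credits column.
$total = array_sum(array_column(json_decode(file_get_contents($file), true), 'credits'));
echo $total; // 2
```

Summing a per-call log locally gives an early warning before the dashboard shows you are near the monthly plan cap.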

## Paid Plan Optimization

With a paid Firecrawl plan (Hobby 3k, Standard 100k, Growth 500k/month), you can use:

| Feature | Config / Flag | Purpose |
|---------|---------------|---------|
| **Proactive top N** | `proactive_enabled` + `firecrawl_proactive_top_n` (default: 3) | Always use Firecrawl for top N competitors; skips cURL |
| **Remediate top N** | `remediate_top_n` (default: 7) | Validate/remediate top N sparse competitors per post |
| **Firecrawl Search default** | `use_firecrawl_search_default` (default: false) | Use Firecrawl Search in pipeline by default |
| **maxAge** | `max_age_ms` (default: 172800000 = 2 days) | Cache scrapes; speeds repeat runs |
| **Batch Scrape** | `firecrawl-batch-remediate-sparse.php --input=sparse.json` | Bulk remediate sparse URLs from audit output |
| **Search** | `--use-firecrawl-search` on collect scripts | Supplement SISTRIX with Firecrawl Search (DE geo) |
| **Map** | `firecrawlMapDomain($url, $search, $limit)` | Discover competitor site structure for content gap analysis |
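The Map row relies on the normalization noted in the API reference: v2 `/map` returns `links` as objects, and `firecrawl-map.php` flattens them to URL strings. A sketch of that step (response shape per the v2 docs; the tolerance for plain-string entries is an assumption):

```php
<?php
// v2 /map returns links as objects; normalize to plain URL strings.
$response = json_decode(
    '{"success": true, "links": [{"url": "https://example.com/"}, {"url": "https://example.com/pricing"}]}',
    true
);

$urls = array_values(array_filter(array_map(
    fn ($link) => is_array($link) ? ($link['url'] ?? null) : $link, // tolerate plain strings too
    $response['links'] ?? []
)));

print_r($urls); // flat list of URLs, ready for content-gap analysis
```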

### Credit Costs per Endpoint

| Endpoint | Cost |
|----------|------|
| Scrape | 1 credit per URL |
| Extract | ~1–32 credits (token-based; avoid when possible) |
| Search | ~2 credits per 10 results (+ per-page scrape if formats: markdown) |
| Map | 1 credit per call |
| Batch Scrape | 1 credit per page |
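The per-post budget quoted elsewhere in this doc follows directly from these costs; a sketch of the arithmetic (counts mirror the 3k-plan defaults, and the Search cost assumes one ~2-credit call per 10 results):

```php
<?php
// Per-post credit estimate from the cost table above (counts are the 3k-plan defaults).
function estimatePostCredits(int $proactive = 3, int $remediated = 4, int $searches = 0): int {
    $scrapeCost = 1;                     // 1 credit per Scrape URL
    $searchCost = 2 * $searches;         // ~2 credits per 10-result Search call
    return ($proactive + $remediated) * $scrapeCost + $searchCost;
}

echo estimatePostCredits(3, 0, 0); // 3  (best case: nothing sparse, no Search)
echo "\n";
echo estimatePostCredits(3, 4, 2); // 11 (sparse competitors plus two Search calls)
```

At 5–12 credits per post, a 3k monthly plan comfortably covers a normal publishing cadence plus monthly batch remediation.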

### Remediation Workflow

1. **Validate:** Run `validate-*-competitor-data-completeness.php` (blog/tools/templates)
2. **Auto-fix:** Add `--remediate` to auto-fix sparse via Firecrawl API (Scrape-first, ~1 credit/URL)
3. **Batch:** For bulk remediation: `audit-firecrawl-sparse-competitors.php --output-urls=sparse.json` then `firecrawl-batch-remediate-sparse.php --input=sparse.json [--max-urls=N]`
4. **Pipeline:** Template improvement pipeline supports `--firecrawl-remediate` to pass `--remediate` to the validator
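Step 3's `--max-urls=N` cap can be pictured as a simple slice over the audited URL list (a sketch of the idea only; the batch script's actual argument parsing may differ):

```php
<?php
// How --max-urls=N could cap a sparse-URL batch (illustrative, not the script's code).
$sparse = json_decode('["https://a.example/1", "https://b.example/2", "https://c.example/3"]', true);
$maxUrls = 2; // from --max-urls=2

$batch = array_slice($sparse, 0, $maxUrls);
echo count($batch) . " URLs -> " . count($batch) . " Batch Scrape credits\n"; // 1 credit per page
```

Because Batch Scrape costs 1 credit per page, capping the batch is a direct cap on spend for that run.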

### Firecrawl Search vs Serper vs SISTRIX

| Tool | Use When |
|------|----------|
| **Serper** | Real Google SERP, PAA, featured snippets; actual rankings matter |
| **Firecrawl Search** | Web search + full page content for top results; DE geo; no separate scrape needed |
| **SISTRIX** | Keyword rankings for German market; competitor URLs for analysis |

Use Firecrawl Search when you need full page content for top results without separate scrape calls. Use Serper when you need real PAA/featured snippet data.

## MCP Usage

When creating blog, template, or tool content:

**Trigger:** competitor-analysis.json has sparse FAQs, empty headings, or word_count < 100 for key competitors.

**Options:**
- Run `validate-*-competitor-data-completeness.php --remediate` (Scrape-first, 1 credit/URL)
- Or: "Use firecrawl_scrape with formats: ['markdown'] for [competitor URL]" — **avoid firecrawl_extract** (~1–32 credits, token-based)

**Output:** AI-ready markdown. Document findings in SERP_ANALYSIS.md.

### Blog New Post Workflow

1. **Default (3k plan):** Proactive top 3 + remediate top 7. Pipeline runs `validate-blog-competitor-data-completeness.php --top=7 --remediate`. Use `--no-firecrawl-remediate` to skip.
2. **Firecrawl Search:** Set `use_firecrawl_search_default => true` in config, or pass `--use-firecrawl-search` per run. ~2–5 credits/post.
3. **Credit budget:** Proactive (3) + remediation (0–4) + optional Search (0–5) = 5–12 credits/post typical. See [FIRECRAWL_3K_PLAN_OPTIMIZATION.md](FIRECRAWL_3K_PLAN_OPTIMIZATION.md).

See:
- [SERP_MCP_IMPROVEMENT_GUIDE.md](../../content/blog/SERP_MCP_IMPROVEMENT_GUIDE.md)
- [TOOLS_SERP_MCP_GUIDE.md](../../guides/tools-pages/TOOLS_SERP_MCP_GUIDE.md)
- blog-content-creation-gate.mdc, templates-content-creation-gate.mdc

## Next Steps After Implementation

1. **3k plan:** Defaults in `firecrawl-config.php` enable proactive (top 3), remediate top 7, credit logging. Copy `firecrawl-api-key.php.example` to `firecrawl-api-key.php` and add API key. See [FIRECRAWL_3K_PLAN_OPTIMIZATION.md](FIRECRAWL_3K_PLAN_OPTIMIZATION.md).
2. **Pipeline defaults:** New post pipeline auto-remediates sparse competitors (top N from config). Use `--no-firecrawl-remediate` only when credits are tight.
3. **Batch remediate sparse (monthly):** Run `audit-firecrawl-sparse-competitors.php --output-urls=sparse.json` then `firecrawl-batch-remediate-sparse.php --input=sparse.json` to fix sparse data across blog, tools, templates.
4. **Firecrawl Search:** Set `use_firecrawl_search_default => true` in config for pillar posts, or pass `--use-firecrawl-search` per run.

## Future: Downloads

Download pages currently have no content blocks or competitor analysis. If downloads gain SEO content blocks (like templates), add:

- `collect-download-competitor-analysis.php` with Firecrawl fallback
- SERP workflow doc for downloads
- Firecrawl MCP step in download content creation gate

## References

- [FIRECRAWL_3K_PLAN_OPTIMIZATION.md](FIRECRAWL_3K_PLAN_OPTIMIZATION.md) – 3k plan defaults, batch remediation, monitoring
- [Firecrawl API docs](https://docs.firecrawl.dev/api-reference/introduction)
- [MCP_INTEGRATION.md](../../development/MCP_INTEGRATION.md)
- [mcp-usage.mdc](../../../.cursor/rules/mcp-usage.mdc)
