# blog-data-collection Full Instructions

## Overview

Rules for collecting and managing data from SISTRIX, GA4, and GSC APIs for blog post documentation and SEO analysis.

**Cross-surface map (blog + templates + tools):** [CONTENT_CREATION_DATA_CHECKLIST.md](../../docs/content/CONTENT_CREATION_DATA_CHECKLIST.md) lists per-post orchestrators (`run-new-post-pipeline.php`, `run-post-improvement-pipeline.php`), bulk `run-all-data-collection.php`, and expected artifacts before outline work.

## Keyword selection order (new vs live posts)

1. **Primary** — Slug/title-aligned, German spelling per [PRIMARY_KEYWORD_MANAGEMENT_GUIDE.md](../../docs/content/blog/PRIMARY_KEYWORD_MANAGEMENT_GUIDE.md) / [SISTRIX_ENDPOINTS_AND_REPORTS.md](../../docs/content/blog/SISTRIX_ENDPOINTS_AND_REPORTS.md).
2. **SISTRIX first pass** — `collect-post-keywords-sistrix.php --primary-only` → review `related_keywords` in `keywords-sistrix.json` before locking secondaries.
3. **Merge** — Secondaries from ideas + metrics, PAA/competitor signals; optional `propose-secondary-keywords.php`; document in `KEYWORD_DECISION.md`.
4. **Finalize** — `target-keywords.json` (≤7 terms), then full `collect-post-keywords-sistrix.php` without `--primary-only`.
5. **Live URLs** — Prefer `derive-target-keywords.php` after GSC; if no GSC yet, `--from-sistrix` bridge. See [KEYWORD_RESEARCH_WORKFLOW.md](../../docs/content/blog/KEYWORD_RESEARCH_WORKFLOW.md).

## When to Run Data Collection

### Initial Collection

- When setting up documentation for new blog posts
- After migrating blog content to new structure
- When API credentials are updated

### Regular Updates

- **SISTRIX:** Monthly (competition/metrics are relatively stable)
- **GA4:** Weekly (traffic data changes frequently)
- **GSC:** Weekly (search performance changes frequently)

### Before Manual Review

- Always run data collection before manual review
- Ensures documentation has latest performance data
- Provides data-driven insights for improvement plans

### PAA Collection

- Run `collect-post-paa-questions.php` for new posts (pipeline does this) and before FAQ generation when `faq-research.paa_questions` is empty
- The serp-features API returns only the PAA count; `keyword.questions` returns the actual question text, which goes into `paa-questions.json` and is merged into faq-research
- **Retry:** Script retries 3× (2s, 4s, 8s) for HTTP 0, 429, 5xx. HTTP 0 is a known transient failure; a re-run may succeed.
- **Troubleshooting:** Use `--debug` when 0 questions are returned. Primary keyword fallback: post JSON → keywords-sistrix → target-keywords.json `primary`. On failure, inspect `v2/data/blog/paa-last-error.json`.
- **Topic override:** When SISTRIX returns off-topic PAA (e.g. Gen Z for Generation X), create `data/paa-questions-manual.json` with SERP_ANALYSIS curated list. See [PAA_TOPIC_OVERRIDE_GUIDE.md](docs/content/blog/PAA_TOPIC_OVERRIDE_GUIDE.md). Source hierarchy: paa-questions-manual.json > paa-questions.json.
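
The retry rule above can be sketched as follows. This is a minimal Python illustration, not the PHP script's actual code: `fetch_with_retry` and `is_retryable` are hypothetical names, and it assumes "3×" means three retries after the first attempt.

```python
import time
from typing import Callable, Tuple

RETRY_DELAYS = (2, 4, 8)  # seconds, the schedule described in the Retry bullet


def is_retryable(status: int) -> bool:
    """HTTP 0 (no response), 429 (rate limited) and 5xx count as transient."""
    return status == 0 or status == 429 or 500 <= status <= 599


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch() once, then retry up to three times on transient statuses.

    `fetch` is a hypothetical callable returning (http_status, body).
    """
    status, body = fetch()
    for delay in RETRY_DELAYS:
        if not is_retryable(status):
            break
        sleep(delay)  # wait 2s, then 4s, then 8s
        status, body = fetch()
    return status, body
```

Passing `sleep` as a parameter keeps the schedule testable without waiting. Non-transient statuses such as 404 are returned immediately.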

## API Usage Best Practices

### SISTRIX API

**Credit Management:**

- **Weekly limit: 10,000 credits** (resets Monday) - Primary constraint
- Daily limit: 2,000 credits (secondary constraint, relaxed if weekly credits available)
- Monitor usage: Check `v2/data/blog/sistrix-credits-log.json`
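- Use caching: cached responses reduce duplicate calls (the Best Practices list below gives the TTLs as 30 days for keywords/PAA and 7 days for rankings)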
- **Cross-post keyword batching: Use `collect-all-keywords-cross-post.php`** - Extracts all unique keywords, processes in largest batches (up to 50 keywords), distributes back to posts (~90% reduction in API calls)
- **Optimal batch size: 30 keywords per batch** (tested and verified optimal)
- **POST requests: Automatically used for batches > 20 keywords** (avoids URL length limits)
- **Parallel processing: Use `collect-post-paa-questions-parallel.php`** for PAA collection (5-10 concurrent requests, ~5x faster)
- **No rate limiting delays for batch endpoints** (single API call per batch)
- **Adaptive delays: 0.5s for individual endpoints** (reduced from 1s)
- **Exponential backoff: Automatic retry for 429 errors** (2s, 4s, 8s delays)
- **Cache pre-checking: Use `check-sistrix-cache-status.php`** before collection
- **Credit pre-checking: Validates credits before starting** large collections
- **Resume capability: Checkpoints save progress** for interrupted collections
- **Domain-level data: Collect once, reuse for all posts** (~252 credits one-time)
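
To make the batching rules above concrete, here is a rough Python sketch of how unique keywords could be split into batches of 30, with POST used for batches above 20 keywords. `plan_batches` is a hypothetical helper for illustration only; `collect-all-keywords-cross-post.php` may organize this differently.

```python
from typing import Iterable, List, Tuple

OPTIMAL_BATCH_SIZE = 30  # "optimal" size from the list above
MAX_BATCH_SIZE = 50      # upper bound for batch endpoints
POST_THRESHOLD = 20      # batches larger than this are sent as POST


def plan_batches(
    keywords: Iterable[str],
    batch_size: int = OPTIMAL_BATCH_SIZE,
) -> List[Tuple[str, List[str]]]:
    """Return (http_method, keywords) pairs covering every unique keyword once."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    # De-duplicate across posts while keeping first-seen order.
    unique = list(dict.fromkeys(keywords))
    batches = []
    for start in range(0, len(unique), batch_size):
        chunk = unique[start:start + batch_size]
        method = "POST" if len(chunk) > POST_THRESHOLD else "GET"
        batches.append((method, chunk))
    return batches
```

With 45 unique keywords this yields one POST batch of 30 and one GET batch of 15, so two API calls in place of 45 individual ones.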

**Endpoints Used:**

**Keyword Endpoints:**

- `keyword.seo.metrics`: 5 credits per keyword (volume, difficulty, competition, historical trends) - **Supports batch mode (up to 50 keywords, optimal: 30 keywords per batch)**
- `keyword.seo.serpfeatures`: 1 credit per keyword (SERP features: featured snippets, knowledge panels, PAA)
- `keyword.seo.searchintent`: 1 credit per keyword (search intent classification)
- `keyword.seo.competition`: 1 credit per keyword (competition levels) - **Supports batch mode**
- `keyword.questions`: 1 credit per question (People Also Ask questions - actual questions, not just counts) - **Use parallel processing (`collect-post-paa-questions-parallel.php`) for efficiency**
- `keyword.seo`: 1 credit per keyword (ranking positions for top 10 results). **Tier limits** (config: `v2/config/sistrix-collection-limits.php`): Tier 1 = 8 keywords, Tier 2 = 5 (competitor-analysis); top 15 URLs per keyword.
- `marketplace.keyword.search.ideas`: 1 credit per idea (related keywords, semantic variations, keyword discovery). **Tier limits** (config: `v2/config/sistrix-collection-limits.php`): Tier 1 = 25 ideas, Tier 2 = 8. **Mode:** `include` (default, broad semantic), `same` (all words any order), `exact` (exact match). Use `--mode=same` for stricter long-tail.
- `keyword.domain.seo` (with `kw` parameter): 100 credits per keyword (SERP top 10 - expensive, optional)
- `keyword.domain.seo` (with `domain` parameter): 1 credit per keyword (domain keywords)

**Domain Endpoints (Collect Once, Reuse):**

- `domain.opportunities`: 1 credit per opportunity (keyword opportunities)
- `domain.competitors.seo`: 1 credit per competitor (SEO competitors)
- `domain.ranking.distribution`: 1 credit (position distribution)
- `domain.traffic.estimation`: 1 credit (traffic estimates)
- `domain.ideas`: 1 credit per idea (content ideas)
- `domain.kwcount.seo.top10`: 1 credit (top 10 keyword count)

**Link Endpoints:**

- `links.overview`: 1 credit (backlink overview)
- `links.linktargets`: 1 credit per target (link targets)
- `links.linktexts`: 1 credit per text (anchor texts)
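
The per-keyword prices above allow a rough pre-flight estimate. The sketch below is illustrative only (the scripts' own `--dry-run` remains the documented way to estimate usage). The function names and the `keyword.domain.seo[kw]` label are made up for this example, and only per-keyword endpoints are covered.

```python
from typing import Iterable

# Credits per keyword, copied from the endpoint list above.
CREDITS_PER_KEYWORD = {
    "keyword.seo.metrics": 5,
    "keyword.seo.serpfeatures": 1,
    "keyword.seo.searchintent": 1,
    "keyword.seo.competition": 1,
    "keyword.domain.seo[kw]": 100,  # label invented here for the `kw` variant
}
WEEKLY_LIMIT = 10_000  # primary constraint, resets Monday


def estimate_credits(keyword_count: int, endpoints: Iterable[str]) -> int:
    """Estimate credits for calling each endpoint once per keyword."""
    return keyword_count * sum(CREDITS_PER_KEYWORD[name] for name in endpoints)


def fits_weekly_budget(estimate: int, used_this_week: int = 0,
                       weekly_limit: int = WEEKLY_LIMIT) -> bool:
    """True if the planned run stays within the weekly credit limit."""
    return used_this_week + estimate <= weekly_limit
```

For 8 keywords, metrics plus SERP features, search intent, and competition cost 8 × (5 + 1 + 1 + 1) = 64 credits. The SERP top 10 variant alone costs 8 × 100 = 800, which is why it is treated as optional.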

**Configurable Limits (All Posts) (2026-02-16):**

- Config file: `v2/config/sistrix-collection-limits.php`
- **related_keywords_limit:** 25 (marketplace.keyword.search.ideas) – all posts
- **competitor_keywords_count:** 8 (primary + 7 secondary) – all posts
- **competitor_analysis_top_count:** 15 (top competitor URLs per keyword)
- **keyword_questions_limit:** 0 = disabled; set to 25 to enable PAA via keyword.questions
- **secondary_paa_keywords_limit:** 4 (secondary keywords for PAA collection)
- All posts receive full data. Tier (FAQ_REBUILD_PRIORITY_LIST) is used for prioritization only.
- See [SISTRIX_COLLECTION_IMPROVEMENTS_2026-02.md](docs/content/blog/SISTRIX_COLLECTION_IMPROVEMENTS_2026-02.md) and [SISTRIX_USAGE_AUDIT_REPORT.md](docs/content/blog/SISTRIX_USAGE_AUDIT_REPORT.md) for credit estimates

**Collection cadence:** All posts get full collection. Run weekly when credits allow; stay within 10,000/week.

**New Endpoints (2026-01-15):**

- `keyword.questions`: Extract actual PAA questions (not just counts) - Use `collect-post-paa-questions.php`; set `keyword_questions_limit` in config to enable
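- `marketplace.keyword.search.ideas`: Discover related keywords - Integrated into `collect-post-keywords-sistrix.php`, unified limit for all posts (15 as of 2026-01-15; the Configurable Limits above now set `related_keywords_limit` to 25)
- `keyword.seo`: Get competitor URLs ranking for keywords - Use `collect-post-competitor-analysis.php`, unified limit for all posts (5 as of 2026-01-15; the Configurable Limits above now set `competitor_keywords_count` to 8)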
- Historical trends: Include `history=true` parameter in `keyword.seo.metrics` to get trend data (avoid unless needed - see "Do NOT Add" in SISTRIX_ENDPOINTS)

**Best Practices:**

- **Keyword spelling:** Keywords for SISTRIX/Serper must use proper German spelling (ü, ä, ö, ß). ASCII expansion (ue, ae, oe, ss) returns incorrect/low volume data. Scripts apply `getSearchKeywordForApi()` when needed.
- **Always use cross-post keyword batching** (`collect-all-keywords-cross-post.php`) for maximum efficiency
- **Use optimal batch size: 30 keywords per batch** (tested and verified)
- **Use parallel processing** for non-batch endpoints (PAA, rankings) - 5-10 concurrent requests
- **Pre-check cache status** before collection (`check-sistrix-cache-status.php`)
- **Pre-check credits** before starting large collections (validates against limits)
- **Use resume capability** for long-running collections (checkpoints save progress)
- **Skip history parameter** unless trend analysis needed (default: false, saves time)
- Always check credit usage before large collections (weekly limit is primary)
- Use `--dry-run` to estimate credit usage
- **Collect domain-level data once** (~252 credits), reference in all posts
- **Skip expensive SERP collection** (100 credits/keyword), use GSC data instead
- **Collect SERP features** (1 credit/keyword) for top keywords to identify opportunities
- **Collect search intent** (1 credit/keyword) for all primary keywords
- **Collect competition levels** (1 credit/keyword, batch mode) for all keywords
- **Collect PAA questions** (1 credit/question) using `keyword.questions` endpoint with parallel processing
- **Collect related keywords** using `marketplace.keyword.search.ideas` for semantic variations; use `--mode=include` (default) for broad coverage, `--mode=same` for stricter long-tail
- **Gap-fill:** New posts get competition-levels and search-intent from pipeline. When older posts are missing competition-levels, competitor-analysis, or search-intent: run `audit-keyword-data-completeness.php`, then `run-sistrix-gap-fill.php --data-type=TYPE`. See [SISTRIX_USAGE_AUDIT_REPORT.md](docs/content/blog/SISTRIX_USAGE_AUDIT_REPORT.md) Gap-fill Runbook
- **Collect competitor analysis** using `keyword.seo` with parallel processing for rankings
- **Include historical trends** only when needed (`--with-history` flag, default: false)
- Leverage caching to minimize API calls (30-day TTL for keywords/PAA, 7 days for rankings)
- Monitor weekly credits (resets Monday)

### Google APIs (GA4 & GSC)

**Rate Limiting:**

- 1 second delay between requests (built into scripts)
- Google APIs have their own rate limits
- Scripts handle rate limit errors gracefully

**GSC Site URL Format (CRITICAL):**

- ✅ **Correct:** `https://www.ordio.com/` (URL prefix property - automatically detected)
- ❌ **Wrong:** `sc_domain:ordio.com` (domain property - not configured)
- Script automatically detects correct site property from available properties
- URL format must match exactly: `https://www.ordio.com` + post URL (with trailing slash)
- Error logs saved to: `v2/data/blog/gsc-collection-errors.log`
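
A small Python sketch of the URL rule above (site property plus post path, always ending in a slash). `build_gsc_page_url` is a hypothetical helper and the example path is invented; the collection script's own matching logic is not shown in this document.

```python
SITE_PROPERTY = "https://www.ordio.com/"  # URL prefix property from the list above


def build_gsc_page_url(post_path: str, site_property: str = SITE_PROPERTY) -> str:
    """Join the site property and a site-relative post path with a trailing slash."""
    path = post_path.strip()
    if path.startswith(("http://", "https://")):
        raise ValueError("expected a site-relative path, not a full URL")
    base = site_property.rstrip("/")
    segment = path.strip("/")
    # Exactly one slash between base and path, and exactly one at the end.
    return f"{base}/{segment}/" if segment else f"{base}/"
```

For example, both `/insights/example-post` and `insights/example-post/` normalize to `https://www.ordio.com/insights/example-post/`, which avoids the trailing-slash mismatch that produces zero-click results.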

**GA4 Date Range Mapping:**

- GA4 returns **one row per date range** (not one row with multiple ranges)
- Row 0 = first date range (last_90_days)
- Row 1 = second date range (last_year)
- Each row contains all metrics for that specific date range
- Error logs saved to: `v2/data/blog/ga4-collection-errors.log`
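
The row-to-range mapping described above can be illustrated like this. It is a simplified Python sketch: rows are modeled as plain dicts of metrics rather than real GA4 response objects, and `map_rows_to_ranges` is a hypothetical name.

```python
from typing import Dict, List, Optional, Sequence

# Order must match the order in which the date ranges were requested.
DATE_RANGE_NAMES = ("last_90_days", "last_year")


def map_rows_to_ranges(
    rows: List[dict],
    range_names: Sequence[str] = DATE_RANGE_NAMES,
) -> Dict[str, Optional[dict]]:
    """Assign row i to date range i; ranges without a row stay None."""
    result: Dict[str, Optional[dict]] = {name: None for name in range_names}
    for index, row in enumerate(rows):
        if index >= len(range_names):
            break  # ignore rows beyond the requested ranges
        result[range_names[index]] = row
    return result
```

Leaving a missing range as `None` rather than zero makes it easier to tell "no row returned" apart from "zero traffic" when debugging.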

**Best Practices:**

- Run collections during off-peak hours if possible
- Monitor for rate limit errors in logs
- Use batch processing for large collections
- **Always verify site URL format** before GSC collection
- **Test date range mapping** if GA4 data seems incorrect

## Data Collection Workflow

### Step 1: Verify API Access

```bash
php v2/scripts/blog/test-api-access.php --all
```

**Expected Output:**

- ✅ All APIs show "SUCCESS"
- No authentication errors
- Test queries execute successfully

### Step 2: Run Data Collection

**Option A: Optimized Batch Script (Recommended)**

```bash
# Check cache status first
php v2/scripts/blog/check-sistrix-cache-status.php

# Run optimized collection (cross-post batching + parallel processing)
php v2/scripts/blog/run-sistrix-collection-batch.php \
  --use-cross-post \
  --concurrent=5 \
  --max-keyword-batch=30 \
  --checkpoint-interval=10
```

**Option B: Master Script**

```bash
php v2/scripts/blog/run-all-data-collection.php --all
```

**Option C: Individual Scripts**

```bash
# SISTRIX (cross-post batching - most efficient)
php v2/scripts/blog/collect-all-keywords-cross-post.php --max-batch-size=30

# PAA (parallel processing)
php v2/scripts/blog/collect-post-paa-questions-parallel.php --all --concurrent=5

# GA4
php v2/scripts/blog/collect-post-performance-ga4.php --all

# GSC
php v2/scripts/blog/collect-post-performance-gsc.php --all
```

### Step 3: Validate Data

```bash
php v2/scripts/blog/validate-data-collection.php --all
```

**Expected Output:**

- ✅ All data files exist and are valid
- No missing files
- No invalid JSON

### Step 4: Regenerate Documentation

**Safe Regeneration (Preserves Manual Edits):**

```bash
php v2/scripts/blog/safe-regenerate-documentation.php --all
```

**Direct Regeneration (Overwrites Manual Edits - Use with Caution):**

```bash
php v2/scripts/blog/generate-post-documentation.php --all
```

**Expected Output:**

- All documentation files regenerated
- Real API data used instead of placeholders
- No "N/A" or "{VOLUME}" placeholders in SEO reports
- Manual sections preserved (if using safe regeneration)

**Important:** Always use `safe-regenerate-documentation.php` to preserve manual edits. The script automatically extracts manual sections before regeneration and restores them afterward.

## JSON Encoding for Data Files

**Required:** Scripts that write JSON data files (keywords-sistrix.json, faq-research.json, etc.) must use:
`JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES` so that `post_url` and German text (ü, ä, „) are stored correctly, not escaped.

**Remediation:** If files were corrupted (e.g. by a script that wrote without these flags), run:
`php v2/scripts/blog/remediate-json-encoding.php --backup`

**See:** [docs/development/JSON_ENCODING_STANDARDS.md](../../docs/development/JSON_ENCODING_STANDARDS.md)
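
As a quick way to spot files written without these flags, the following Python sketch looks for `\uXXXX` escapes and escaped slashes in the raw file text. It is a heuristic written for illustration (`find_encoding_issues` is a hypothetical name) and is not what `remediate-json-encoding.php` does internally; a literal backslash followed by `u` inside a string would be a false positive.

```python
import json
import re

UNICODE_ESCAPE = re.compile(r"\\u[0-9a-fA-F]{4}")  # e.g. \u00fc in place of ü
ESCAPED_SLASH = "\\/"                              # e.g. https:\/\/ in post_url


def find_encoding_issues(raw_json: str) -> list:
    """Return which escape styles appear in the raw JSON text.

    Raises ValueError if the text is not valid JSON at all.
    """
    json.loads(raw_json)  # validity check only; the parsed value is not needed
    issues = []
    if UNICODE_ESCAPE.search(raw_json):
        issues.append("escaped_unicode")
    if ESCAPED_SLASH in raw_json:
        issues.append("escaped_slashes")
    return issues
```

An empty list means the file already stores URLs and German characters in readable form.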

## Data File Locations

All data files are stored in:

```
docs/content/blog/posts/{category}/{slug}/data/
├── keywords-sistrix.json      # SISTRIX keyword data
├── performance-ga4.json       # GA4 performance metrics
├── performance-gsc.json        # GSC search performance
├── content-analysis.json       # Content analysis
├── seo-analysis.json           # SEO analysis
├── links-analysis.json         # Internal links analysis
└── related-resources.json      # Related resources mapping
```
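
A minimal Python sketch for checking one post's `data/` directory against the listing above. `missing_data_files` is a hypothetical helper; `validate-data-collection.php` is the supported check and may test more than file existence.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = (
    "keywords-sistrix.json",
    "performance-ga4.json",
    "performance-gsc.json",
    "content-analysis.json",
    "seo-analysis.json",
    "links-analysis.json",
    "related-resources.json",
)


def missing_data_files(data_dir) -> list:
    """Return the expected file names that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]
```

Pass it a path of the form `docs/content/blog/posts/{category}/{slug}/data`; an empty result means all seven files are present.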

## Troubleshooting

### SISTRIX Credit Limit Reached

**Symptoms:** Script stops with "Daily credit limit reached" errors

**Solutions:**

1. Check credit usage: `cat v2/data/blog/sistrix-credits-log.json`
2. Wait for the reset: daily credits reset at midnight; the weekly allowance (the primary constraint) resets on Monday
3. Reduce batch size: Use `--limit=20` instead of `--all`
4. Skip domain position queries (already implemented to save credits)

### Missing Data Files

**Symptoms:** Validation shows missing files

**Solutions:**

1. Run collection script for missing posts
2. Check for API errors in script output
3. Verify post URLs are correct in post JSON files
4. Check file permissions on data directories

### Invalid JSON Files

**Symptoms:** Validation shows invalid JSON errors

**Solutions:**

1. Check JSON syntax: `php -r "json_decode(file_get_contents('file.json')); echo json_last_error_msg(), PHP_EOL;"` (prints `No error` when the file parses)
2. Re-run collection script for affected posts
3. Check for API response format changes

### API Authentication Errors

**Symptoms:** "Failed to initialize Google API client" or "SISTRIX API key not configured"

**Solutions:**

1. Verify credentials files exist:
   - `v2/config/google-api-credentials.json`
   - `docs/seo-strategy-2026/config.json` (for SISTRIX)
2. Test API access: `php v2/scripts/blog/test-api-access.php --all`
3. Check service account permissions in Google Cloud Console
4. Verify SISTRIX collection has not been disabled: check `docs/seo-strategy-2026/config/sistrix-disabled.json`

### GSC Data Shows Zeros

**Symptoms:** All posts show 0 clicks, 0 impressions despite known traffic

**Root Causes:**

1. Wrong site URL format (using `sc_domain:` instead of `https://www.ordio.com/`)
2. URL format mismatch (missing trailing slash, wrong protocol)
3. Silent API errors not being logged

**Solutions:**

1. **Run diagnostic script:**

   ```bash
   php v2/scripts/blog/test-gsc-debug.php --post={slug} --category={category} --verbose
   ```

2. **Check error logs:**

   ```bash
   cat v2/data/blog/gsc-collection-errors.log
   ```

3. **Re-run collection:**

   ```bash
   php v2/scripts/blog/collect-post-performance-gsc.php --all
   ```

**Prevention:**

- Collection script now automatically detects correct site property
- Error logging enabled by default
- URL format matching includes fallback strategies

### GA4 Date Range Data Missing

**Symptoms:** Only `last_90_days` data present, `last_year` shows zeros

**Root Causes:**

1. Hardcoded date range index (only processing first range)
2. Incorrect row mapping (GA4 returns one row per range)

**Solutions:**

1. **Run diagnostic script:**

   ```bash
   php v2/scripts/blog/test-ga4-debug.php --post={slug} --category={category} --verbose
   ```

2. **Re-run collection:**

   ```bash
   php v2/scripts/blog/collect-post-performance-ga4.php --all
   ```

**Prevention:**

- Date range mapping now correctly handles multiple ranges
- Each row mapped to corresponding date range index

## Integration with Documentation Generation

Data collection scripts automatically integrate with documentation generation:

1. **Data Collection:** Scripts save data to `{post}/data/` directories
2. **Documentation Generation:** `generate-post-documentation.php` loads data files
3. **Template Replacement:** Real data replaces placeholders in templates
4. **SEO Reports:** Show actual metrics instead of "N/A" or "{VOLUME}"

### Preservation System

**Dual-Section Structure:**

- **Automated Sections:** Marked with `<!-- BEGIN AUTOMATED -->` and `<!-- END AUTOMATED -->`
- **Manual Sections:** Marked with `<!-- BEGIN MANUAL -->` and `<!-- END MANUAL -->`

**Safe Regeneration:**

- `safe-regenerate-documentation.php` extracts manual sections before regeneration
- Regenerates automated sections with latest data
- Restores manual sections after regeneration
- Ensures human insights are never lost
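
The extract-and-restore idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not the logic of `safe-regenerate-documentation.php`: the function names are hypothetical, and it assumes the regenerated file contains the same number of manual blocks in the same order.

```python
import re

# Matches one manual block including its markers; DOTALL lets it span lines.
MANUAL_BLOCK = re.compile(r"<!-- BEGIN MANUAL -->.*?<!-- END MANUAL -->", re.DOTALL)


def extract_manual_sections(document: str) -> list:
    """Collect every manual block, markers included, in document order."""
    return MANUAL_BLOCK.findall(document)


def restore_manual_sections(regenerated: str, saved: list) -> str:
    """Replace the n-th manual block in the regenerated text with the n-th saved one."""
    remaining = iter(saved)

    def replace(match):
        # Keep the freshly generated block if no saved block is left.
        return next(remaining, match.group(0))

    return MANUAL_BLOCK.sub(replace, regenerated)
```

Using a replacement function rather than a replacement string means backslashes in the saved notes are inserted verbatim.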

**Manual Notes File:**

- `docs/content/blog/posts/{category}/{slug}/MANUAL_NOTES.md` is never overwritten by scripts (per-post file)
- Use for comprehensive manual review notes
- Track implementation progress
- Document expert insights

## Monitoring and Maintenance

### Regular Checks

**Weekly:**

- Check data freshness: `php v2/scripts/blog/validate-data-collection.php --all --stale-days=7`
- Review GA4/GSC data for significant changes
- Check for API errors in logs

**Monthly:**

- Run full SISTRIX collection (if credits available)
- Review credit usage patterns
- Update documentation with latest data

### Data Quality Indicators

**Good Data Quality:**

- All data files exist (99/99 posts)
- All JSON files are valid
- Data is fresh (< 30 days for SISTRIX, < 7 days for GA4/GSC)
- No API errors in collection logs

**Poor Data Quality:**

- Missing data files
- Invalid JSON files
- Stale data (> threshold)
- High error rate in collection
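
The freshness thresholds above translate into a simple age check. The Python sketch below is illustrative (`is_stale` is a hypothetical name; the documented check is `validate-data-collection.php --stale-days=N`) and treats data at exactly the threshold age as stale.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Maximum age in days before data counts as stale, from the indicators above.
MAX_AGE_DAYS = {"sistrix": 30, "ga4": 7, "gsc": 7}


def is_stale(source: str, collected_at: datetime,
             now: Optional[datetime] = None) -> bool:
    """True when the data is at least MAX_AGE_DAYS[source] days old."""
    now = now or datetime.now(timezone.utc)
    return now - collected_at >= timedelta(days=MAX_AGE_DAYS[source])
```

Data collected eight days ago is therefore stale for GA4 and GSC but still fresh for SISTRIX.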

## Script Maintenance

### When Updating Scripts

1. **Test with Sample Posts First:**

   ```bash
   php v2/scripts/blog/collect-post-keywords-sistrix.php --all --limit=3
   ```

2. **Verify Data Format:**

   - Check generated JSON files
   - Verify data structure matches expected format
   - Test documentation generation

3. **Update Documentation:**
   - Update this rule file if workflow changes
   - Update `docs/content/blog/guides/DATA_COLLECTION_GUIDE.md` if API usage changes
   - Document any new endpoints or parameters

### Version Compatibility

- **PHP:** Requires PHP 8.0+ (note that the null coalescing operator itself has been available since PHP 7.0)
- **Google API Client:** Requires `google/apiclient` via Composer
- **SISTRIX API:** Uses REST API (no special dependencies)

## Advanced Collection Scripts

**New Scripts (2026-01-15):**

1. `collect-post-paa-questions.php` - PAA questions extraction (1 credit/question) - **NEW**
2. `collect-post-competitor-analysis.php` - Competitor content analysis (1 credit/keyword + content scraping) - **NEW**
3. `generate-content-brief-from-sistrix.php` - Comprehensive content brief generation - **NEW**
4. `content-writing-assistant.php` - Master workflow script combining all tools - **NEW**
5. `analyze-topical-authority.php` - Keyword clustering and content gap analysis - **NEW**

**Existing Scripts:**

1. `collect-post-serp-features.php` - SERP features (1 credit/keyword)
2. `collect-post-search-intent.php` - Search intent (1 credit/keyword)
3. `collect-post-competition-levels.php` - Competition levels (1 credit/keyword, batch)
4. `collect-competitor-keywords.php` - Competitor keywords (1 credit/keyword)
5. `collect-domain-content-ideas.php` - Content ideas (1 credit/idea)
6. `collect-domain-opportunities.php` - Domain opportunities (1 credit/opportunity)
7. `collect-domain-backlinks.php` - Backlink analysis (1 credit + 1 per target/text)
8. `collect-high-value-serp-data.php` - High-value SERP (100 credits/keyword, selective)

**Master Script:**

```bash
php v2/scripts/blog/run-all-advanced-collection.php [--dry-run] [--skip-phase=N]
```

**Total Estimated Cost:** ~2,550 credits (all phases)

## Advanced Data Integration

### New Data Sources (2026-01-15)

The following advanced data sources are now integrated into documentation:

1. **PAA Questions Data** (`paa-questions.json`) - **NEW**

   - Actual PAA questions (not just counts)
   - Search volume per question
   - Priority scores for FAQ generation
   - Used for specific FAQ recommendations

2. **Competitor Analysis Data** (`competitor-analysis.json`) - **NEW**

   - Competitor URLs ranking for keywords
   - Content structure analysis (headings, word count, FAQs)
   - Competitor FAQs and headings
   - Used for content structure recommendations and gap analysis

3. **Related Keywords Data** (in `keywords-sistrix.json`) - **NEW**

   - Semantic keyword variations
   - Related keywords from keyword discovery
   - Traffic and competition data for related terms
   - Used for keyword clustering and internal linking

4. **Historical Trends Data** (in `keywords-sistrix.json`) - **NEW**

   - Search volume trends over time
   - Competition changes
   - Trend direction (rising, declining, stable)
   - Used for seasonal pattern identification

5. **Search Intent Data** (`search-intent.json`)

   - Primary intent classification (informational, navigational, transactional)
   - Intent distribution percentages
   - Used for content strategy recommendations

6. **SERP Features Data** (`serp-features.json`)

   - Featured snippets presence
   - Knowledge panels
   - People Also Ask (PAA) boxes
   - Used for SERP feature optimization

7. **Competition Levels Data** (`competition-levels.json`)

   - Keyword competition levels (low, medium, high)
   - Quick-win keyword identification
   - Used for keyword prioritization

8. **Domain Opportunities Data** (`domain-opportunities.json`)
   - Domain-level keyword opportunities
   - Current positions
   - Potential gains
   - Used for cross-post optimization

### Data-Driven Recommendations

These data sources enable:

- **Search Intent-Driven Content Strategy:** Tailor content to user intent
- **SERP Feature Optimization:** Target featured snippets, PAA boxes
- **Competition-Based Prioritization:** Focus on quick-win keywords
- **Domain Opportunity Cross-Referencing:** Identify posts close to ranking

See `docs/content/blog/guides/DATA_INTEGRATION_GUIDE.md` for detailed usage.

## References

- **Data Collection Guide:** `docs/content/blog/guides/DATA_COLLECTION_GUIDE.md`
- **Data Integration Guide:** `docs/content/blog/guides/DATA_INTEGRATION_GUIDE.md`
- **Manual Review Workflow:** `docs/content/blog/guides/MANUAL_REVIEW_WORKFLOW.md`
- **API Test Script:** `v2/scripts/blog/test-api-access.php`
- **Master Collection Script:** `v2/scripts/blog/run-all-data-collection.php`
- **Advanced Collection Script:** `v2/scripts/blog/run-all-advanced-collection.php`
- **Safe Regeneration Script:** `v2/scripts/blog/safe-regenerate-documentation.php`
- **Validation Script:** `v2/scripts/blog/validate-data-collection.php`
- **Quality Validation Script:** `v2/scripts/blog/validate-documentation-quality.php`
- **API Data Quality Validation:** `v2/scripts/blog/validate-api-data-quality.php`
- **GSC Debug Script:** `v2/scripts/blog/test-gsc-debug.php`
- **GA4 Debug Script:** `v2/scripts/blog/test-ga4-debug.php`
- **Troubleshooting Guide:** `docs/content/blog/guides/TROUBLESHOOTING_DATA_COLLECTION.md`
- **Data Quality Dashboard:** `docs/content/blog/DATA_QUALITY_DASHBOARD.md`
- **SISTRIX Best Practices:** `docs/seo-strategy-2026/research/SISTRIX_API_BEST_PRACTICES.md`