# Data Collection Scripts Inventory

**Last Updated:** 2026-02-12

Complete inventory of all data collection scripts available for blog post improvement, including usage examples, best practices, and credit management. For SISTRIX endpoint → script → report → decision mapping, see **[SISTRIX_ENDPOINTS_AND_REPORTS.md](SISTRIX_ENDPOINTS_AND_REPORTS.md)**.

## Overview

This inventory documents all data collection scripts used in the blog post improvement process. Each script collects specific data from GA4, GSC, or SISTRIX APIs to inform content strategy and optimization.

**SISTRIX limits:** All posts receive the full data set (15 related keywords, 5 competitor keywords). Tiers (from FAQ_REBUILD_PRIORITY_LIST) are used for prioritization only, not as data limits.

**Canonical keyword-based order (improvement pipeline):** GA4+GSC (parallel) → derive-target-keywords → keywords-sistrix → **PAA** → FAQ research → SERP features → competitor analysis. **PAA must run before FAQ research** so faq-research.json gets actual question text (serp-features returns count only).

## New Post Creation Pipeline

For **new posts** (no existing post to improve), use `run-new-post-pipeline.php` instead of GA4/GSC scripts. New posts have no historical data, so the pipeline skips GA4/GSC and uses only SISTRIX + target-keywords.json.

### run-new-post-pipeline.php

**Purpose:** Orchestrate data collection for new blog posts (no GA4/GSC)

**Location:** `v2/scripts/blog/run-new-post-pipeline.php`

**Usage:**

```bash
# Run full pipeline for new post (default: no flags)
php v2/scripts/blog/run-new-post-pipeline.php --post=slug --category=lexikon

# Skip PAA only when SISTRIX credits low
php v2/scripts/blog/run-new-post-pipeline.php --post=slug --category=lexikon --skip-paa

# Continue if PAA has already failed (recovery only; add manual paa-questions.json later)
php v2/scripts/blog/run-new-post-pipeline.php --post=slug --category=lexikon --allow-paa-failure
```

**Options:** `--skip-paa` (skip PAA step only; use when SISTRIX credits low), `--allow-paa-failure` (recovery only: continue when PAA has failed; do NOT use preemptively – it masks failures).

**Requirements:**

- Post scaffold created via `create-new-blog-post.php`
- `docs/content/blog/posts/{category}/{slug}/data/target-keywords.json` with `primary_keyword`

**Steps (in order):**

1. `collect-post-keywords-sistrix.php` (--keywords override from target-keywords)
2. `collect-post-paa-questions.php` (keyword.questions; outputs paa-questions.json)
3. `collect-faq-research-data.php` (--keywords override; merges PAA from paa-questions.json when serp-features has no question text)
4. `collect-post-serp-features.php`
5. `collect-post-competitor-analysis.php`
6. `analyze-competitor-content-depth.php`
7. `generate-serp-analysis-skeleton.php`
8. `content-depth-report.php`
9. `generate-pre-content-checklist.php`
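The sequential, stop-on-failure behaviour of these steps can be sketched as follows. This is an illustrative Python sketch only: the real orchestrator is `run-new-post-pipeline.php`, and the injectable `runner` parameter is an assumption added here to make the logic testable.

```python
import subprocess

# Step scripts in canonical order (from the list above)
STEPS = [
    "collect-post-keywords-sistrix.php",
    "collect-post-paa-questions.php",
    "collect-faq-research-data.php",
    "collect-post-serp-features.php",
    "collect-post-competitor-analysis.php",
    "analyze-competitor-content-depth.php",
    "generate-serp-analysis-skeleton.php",
    "content-depth-report.php",
    "generate-pre-content-checklist.php",
]

def run_pipeline(post, category, allow_paa_failure=False, runner=subprocess.run):
    """Run steps in order; stop on the first failure, unless the PAA step
    fails and --allow-paa-failure semantics apply. Returns completed steps."""
    done = []
    for script in STEPS:
        result = runner(
            ["php", f"v2/scripts/blog/{script}",
             f"--post={post}", f"--category={category}"],
        )
        if result.returncode != 0:
            if script == "collect-post-paa-questions.php" and allow_paa_failure:
                continue  # recovery mode: proceed, add manual paa-questions.json later
            raise RuntimeError(f"{script} failed; pipeline stopped")
        done.append(script)
    return done
```

Note that `--allow-paa-failure` skips only the PAA step on failure; any other failing step still aborts the run.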

**PAA merge logic:** serp-features (keyword.seo.serpfeatures) returns the PAA count only, not question text. `collect-faq-research-data.php` tries serp-features first, then falls back to paa-questions.json (from keyword.questions). Run PAA collection before FAQ research so the merge has real question text on the first run.

**PAA source hierarchy:** paa-questions-manual.json (if exists; override for off-topic SISTRIX results) → serp-features people_also_ask → paa-questions.json → SISTRIX. Used by collect-faq-research-data.php, validate-content-completeness.php, analyze-competitor-content-depth.php. See [PAA_TOPIC_OVERRIDE_GUIDE.md](PAA_TOPIC_OVERRIDE_GUIDE.md).
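The source hierarchy above is a first-match precedence chain. A minimal Python sketch of the resolution logic (the real consumers are PHP scripts, and the `{"questions": [...]}` file shape used here is an illustrative assumption, not the actual file format):

```python
import json
from pathlib import Path

# Source precedence per the hierarchy above
PAA_SOURCES = [
    "paa-questions-manual.json",  # manual override for off-topic SISTRIX results
    "serp-features.json",         # people_also_ask (often count only, no text)
    "paa-questions.json",         # from keyword.questions
]

def resolve_paa_questions(data_dir):
    """Return (source_name, questions) from the first source that exists
    and actually contains question text; sources with counts but no text
    are skipped, as described above."""
    for name in PAA_SOURCES:
        path = Path(data_dir) / name
        if not path.exists():
            continue
        questions = json.loads(path.read_text()).get("questions", [])
        if questions:
            return name, questions
    return None, []  # final fallback: query SISTRIX directly
```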

**PAA retry and troubleshooting:** `collect-post-paa-questions.php` retries 3× (2s, 4s, 8s backoff) for HTTP 0, 429, 5xx. Primary keyword fallback: post JSON → keywords-sistrix → target-keywords.json `primary`. On failure, writes `v2/data/blog/paa-last-error.json` for debugging. Use `--debug` when 0 questions returned.
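The retry behaviour described above (2s/4s/8s backoff on HTTP 0, 429, 5xx) reduces to a small loop. A Python sketch under those assumptions; the real implementation is PHP, and the `do_request`/`sleep` parameters are hypothetical names used here for testability:

```python
import time

# Status codes the script retries on: HTTP 0 (transport error), 429, 5xx
RETRYABLE = {0, 429} | set(range(500, 600))

def fetch_with_retry(do_request, sleep=time.sleep):
    """Initial attempt plus up to three retries with 2s/4s/8s backoff,
    mirroring the behaviour described above. do_request() returns a
    (status, body) tuple; non-retryable statuses return immediately."""
    status, body = do_request()
    for delay in (2, 4, 8):
        if status not in RETRYABLE:
            break
        sleep(delay)
        status, body = do_request()
    return status, body
```

The real script additionally writes `v2/data/blog/paa-last-error.json` on persistent failure; that side effect is omitted from this sketch.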

**Data Collected:** Same as improvement pipeline except GA4/GSC (not available for new URLs)

**Credit Usage:** ~15–25 SISTRIX credits per new post

**Reference:** [BLOG_POST_IMPROVEMENT_PROCESS.md](BLOG_POST_IMPROVEMENT_PROCESS.md) "New Post Creation"

## Script Categories

### 1. Performance Data Collection (GA4/GSC)

### 2. Keyword Data Collection (SISTRIX)

### 3. SERP Data Collection (SISTRIX)

### 4. Advanced Data Collection (SISTRIX)

### 5. Master Collection Scripts

## Performance Data Collection Scripts

### collect-post-performance-ga4.php

**Purpose:** Collect traffic and engagement metrics from Google Analytics 4

**Location:** `v2/scripts/blog/collect-post-performance-ga4.php`

**Usage:**

```bash
# Single post
php v2/scripts/blog/collect-post-performance-ga4.php --post=slug --category=category

# All posts
php v2/scripts/blog/collect-post-performance-ga4.php --all

# With limit
php v2/scripts/blog/collect-post-performance-ga4.php --all --limit=20
```

**Data Collected:**

- Page views (last 90 days, last year)
- Sessions (last 90 days, last year)
- Bounce rate (last 90 days, last year)
- Average engagement time (last 90 days, last year)

**Output:** `docs/content/blog/posts/{category}/{slug}/data/performance-ga4.json`

**Rate Limiting:** 1 second delay between requests

**Dependencies:**

- Google API credentials: `v2/config/google-api-credentials.json`
- GA4 Property ID: `275821028`
- Composer dependencies (Google API client)

**Best Practices:**

- Collect weekly for active posts
- Use for identifying underperforming content
- Compare metrics to identify improvement opportunities

### collect-post-performance-gsc.php

**Purpose:** Collect search performance data from Google Search Console

**Location:** `v2/scripts/blog/collect-post-performance-gsc.php`

**Usage:**

```bash
# Single post
php v2/scripts/blog/collect-post-performance-gsc.php --post=slug --category=category

# All posts
php v2/scripts/blog/collect-post-performance-gsc.php --all

# With limit
php v2/scripts/blog/collect-post-performance-gsc.php --all --limit=20
```

**Data Collected:**

- Clicks (last 90 days, last year)
- Impressions (last 90 days, last year)
- CTR (last 90 days, last year)
- Average position (last 90 days, last year)
- Top queries (last 90 days) - up to 25 queries

**Output:** `docs/content/blog/posts/{category}/{slug}/data/performance-gsc.json`

**Rate Limiting:** 1 second delay between requests

**Dependencies:**

- Google API credentials: `v2/config/google-api-credentials.json`
- GSC Site URL: `https://www.ordio.com/`
- Composer dependencies (Google API client)

**Best Practices:**

- Collect weekly for active posts
- Use top queries for FAQ generation
- Identify high-impression, low-click queries (optimization opportunities)
- Monitor position trends
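Spotting high-impression, low-click queries is a simple filter over the collected query list. A Python sketch, assuming the query-record shape shown (the actual structure of performance-gsc.json may differ, and the thresholds here are illustrative, not from the script):

```python
def optimization_candidates(queries, min_impressions=500, max_ctr=0.01):
    """Queries with many impressions but few clicks: the post is visible
    in search, but the title/snippet is not converting. Sorted so the
    biggest opportunities come first."""
    return sorted(
        (q for q in queries
         if q["impressions"] >= min_impressions and q["ctr"] <= max_ctr),
        key=lambda q: q["impressions"],
        reverse=True,
    )
```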

## Keyword Data Collection Scripts

**Script roles:**

- **collect-post-keywords-sistrix.php:** Per-post collection (metrics + related keywords). Use for single post or new posts.
- **collect-related-keywords.php:** Gap-fill for posts with empty `related_keywords`. Use when keywords-sistrix exists but related_keywords is empty.
- **collect-all-keywords-cross-post.php:** Batch-efficient collection across all posts. Use for bulk refresh (batches keywords by unique primary, unified limit for all posts).

### collect-post-keywords-sistrix.php

**Purpose:** Collect keyword metrics, search volume, and competition data from SISTRIX

**Location:** `v2/scripts/blog/collect-post-keywords-sistrix.php`

**Usage:**

```bash
# Single post
php v2/scripts/blog/collect-post-keywords-sistrix.php --post=slug --category=category

# Category
php v2/scripts/blog/collect-post-keywords-sistrix.php --category=ratgeber

# All posts (with limit)
php v2/scripts/blog/collect-post-keywords-sistrix.php --all --limit=20

# Dry run (no API calls)
php v2/scripts/blog/collect-post-keywords-sistrix.php --all --dry-run

# Keyword expansion mode (marketplace.keyword.search.ideas)
php v2/scripts/blog/collect-post-keywords-sistrix.php --post=slug --category=cat --mode=same
```

**Keyword Expansion (--mode):** `include` (default, broad semantic), `same` (all words any order), `exact` (exact match). See [SISTRIX_ENDPOINTS_AND_REPORTS.md](SISTRIX_ENDPOINTS_AND_REPORTS.md).

**Data Collected:**

- Keyword search volume
- Keyword difficulty/competition
- Estimated clicks
- Desktop/mobile distribution
- CPC (if available)

**Output:** `docs/content/blog/posts/{category}/{slug}/data/keywords-sistrix.json`

**Credit Usage:** ~5 credits per keyword (keyword.seo.metrics endpoint)

**Batch Processing:**

- Processes keywords in batches of 10 per API call
- Same credit cost (5 credits per keyword)
- Significantly faster (1 API call vs 10 individual calls)
- Reduces API overhead by 90%
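The batching above amounts to simple chunking: the same per-keyword credit cost, but one API call per group of 10. A Python sketch (the real script is PHP):

```python
def batch(keywords, size=10):
    """Split keywords into batches of 10, matching the script's batch
    size; each batch becomes a single API call instead of ten."""
    return [keywords[i:i + size] for i in range(0, len(keywords), size)]
```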

**Dependencies:**

- SISTRIX API key: `docs/seo-strategy-2026/config.json`
- PHP 8.0+
- cURL extension

**Best Practices:**

- Use caching to minimize API calls (7-day cache)
- Monitor daily credit usage
- Process in batches to stay within limits
- Extract keywords from slug, title, and meta keywords

### collect-post-competition-levels.php

**Purpose:** Collect competition levels for keywords to prioritize optimization efforts

**Location:** `v2/scripts/blog/collect-post-competition-levels.php`

**Usage:**

```bash
# Single post
php v2/scripts/blog/collect-post-competition-levels.php --post=slug --category=category

# All posts
php v2/scripts/blog/collect-post-competition-levels.php --all

# With limit
php v2/scripts/blog/collect-post-competition-levels.php --all --limit=100
```

**Data Collected:**

- Competition level for each keyword
- Quick-win opportunities (low competition)

**Output:** Updates `keywords-sistrix.json` with `competition_level`; creates `competition-levels.json` (for aggregate-prioritization-data)

**Credit Usage:** 1 credit per keyword (batch mode)

**Best Practices:**

- Collect for all keywords
- Prioritize low-competition keywords (< 30)
- Use for quick-win identification

## SERP Data Collection Scripts

### collect-post-serp-features.php

**Purpose:** Collect SERP feature data (featured snippets, PAA, knowledge panels) for optimization opportunities

**Location:** `v2/scripts/blog/collect-post-serp-features.php`

**Usage:**

```bash
# Single post
php v2/scripts/blog/collect-post-serp-features.php --post=slug --category=category

# All posts (with limit)
php v2/scripts/blog/collect-post-serp-features.php --all --limit=50

# Dry run
php v2/scripts/blog/collect-post-serp-features.php --all --dry-run
```

**Data Collected:**

- Featured snippet opportunities
- Knowledge panel eligibility
- People Also Ask (PAA) questions
- Related searches
- SERP feature competition

**Output:** `docs/content/blog/posts/{category}/{slug}/data/serp-features.json`

**Credit Usage:** 1 credit per keyword

**Best Practices:**

- Collect for top 50 keywords (volume > 500, position < 10)
- Use PAA questions for FAQ generation
- Identify featured snippet opportunities
- Optimize for AEO/GEO

### collect-post-serp-data.php

**Purpose:** Collect top 10 SERP results for keywords (expensive, use selectively)

**Location:** `v2/scripts/blog/collect-post-serp-data.php`

**Usage:**

```bash
# Collect SERP for top 20 primary keywords
php v2/scripts/blog/collect-post-serp-data.php --limit=20

# Collect for specific post
php v2/scripts/blog/collect-post-serp-data.php --post=slug --category=category

# Dry run
php v2/scripts/blog/collect-post-serp-data.php --limit=20 --dry-run
```

**Data Collected:**

- Top 10 ranking domains per keyword
- Domain URLs and titles
- Ranking positions

**Output:** `docs/content/blog/posts/{category}/{slug}/data/serp-results.json`

**Credit Usage:** 100 credits per keyword (expensive!)

**Strategy:**

- **Recommended:** Skip SERP collection, use GSC data instead (free, comprehensive)
- **Alternative:** Collect for top 20 high-value keywords only (2,000 credits)
- **Use Case:** Manual competitive analysis for specific keywords

**Best Practices:**

- Use only for high-value keywords
- Prefer manual SERP analysis (browser-based)
- Use GSC data for most analysis needs

### collect-high-value-serp-data.php

**Purpose:** Collect SERP data for highest-value keywords only

**Location:** `v2/scripts/blog/collect-high-value-serp-data.php`

**Usage:**

```bash
# Collect for top 10 high-value keywords
php v2/scripts/blog/collect-high-value-serp-data.php --limit=10
```

**Data Collected:**

- Top 10 ranking domains
- Domain URLs and titles
- Ranking positions

**Credit Usage:** 100 credits per keyword

**Best Practices:**

- Use only for top 10 keywords (volume > 2000, position 1-5)
- Very expensive - use selectively
- Prefer manual SERP analysis

## Advanced Data Collection Scripts

### collect-post-search-intent.php

**Purpose:** Classify search intent for keywords to align content strategy

**Location:** `v2/scripts/blog/collect-post-search-intent.php`

**Usage:**

```bash
# Single post
php v2/scripts/blog/collect-post-search-intent.php --post=slug --category=category

# All posts
php v2/scripts/blog/collect-post-search-intent.php --all

# With limit
php v2/scripts/blog/collect-post-search-intent.php --all --limit=50
```

**Data Collected:**

- Search intent classification (informational, navigational, transactional)
- Intent alignment with current content

**Output:** `docs/content/blog/posts/{category}/{slug}/data/search-intent.json`

**Credit Usage:** 1 credit per keyword

**Best Practices:**

- Collect for all primary keywords
- Use to align content structure with search intent
- Identify intent mismatches (optimization opportunities)

### collect-post-competitor-analysis.php

**Purpose:** Collect competitor URLs ranking for target keywords and analyze their content structure

**Location:** `v2/scripts/blog/collect-post-competitor-analysis.php`

**Usage:**

```bash
# Single post
php v2/scripts/blog/collect-post-competitor-analysis.php --post=slug --category=category

# With top N competitors
php v2/scripts/blog/collect-post-competitor-analysis.php --post=slug --category=category --top=10

# All posts
php v2/scripts/blog/collect-post-competitor-analysis.php --all
```

**Data Collected:**

- Top ranking competitor URLs
- Competitor content structure
- Competitor headings
- Competitor FAQs

**Output:** `docs/content/blog/posts/{category}/{slug}/data/competitor-analysis.json`

**Credit Usage:** Varies (depends on SERP data collection)

**Best Practices:**

- Use for competitive analysis
- Analyze top 10 competitors
- Use for content gap identification

### collect-faq-research-data.php

**Purpose:** Collect comprehensive FAQ research data (PAA questions, GSC queries, keywords)

**Location:** `v2/scripts/blog/collect-faq-research-data.php`

**Usage:**

```bash
# Single post
php v2/scripts/blog/collect-faq-research-data.php --post=slug --category=category

# All posts
php v2/scripts/blog/collect-faq-research-data.php --all
```

**Data Collected:**

- SISTRIX PAA questions
- GSC top queries
- Target keywords
- LSI keywords
- Search intent data

**Output:** `docs/content/blog/posts/{category}/{slug}/data/faq-research-data.json`

**Best Practices:**

- Use before generating FAQs
- Prioritize PAA questions
- Use GSC queries for FAQ generation

## Domain-Level Collection Scripts

**Recommended cadence:** Run at least **monthly** (or bi-weekly) so [CONTENT_BACKLOG.md](CONTENT_BACKLOG.md), [competitive analysis report](reports/competitive-analysis-YYYY-Q.md), and prioritization stay current. Scripts: `collect-domain-opportunities.php`, `collect-domain-content-ideas.php`, `collect-competitor-keywords.php`. Run `collect-domain-seo-overview.php` **weekly** (~16 cr). See [MONITORING_RUNBOOK.md](MONITORING_RUNBOOK.md) for schedule details.

### collect-domain-level-sistrix.php

**Purpose:** Collect domain-level SISTRIX data (one-time collection, reusable)

**Location:** `v2/scripts/blog/collect-domain-level-sistrix.php`

**Usage:**

```bash
# Collect domain-level data (one-time)
php v2/scripts/blog/collect-domain-level-sistrix.php

# Dry run
php v2/scripts/blog/collect-domain-level-sistrix.php --dry-run
```

**Data Collected:**

- Domain opportunities (keywords where domain could rank)
- Domain competitors (SEO competitors)
- Ranking distribution (position distribution)
- Traffic estimation (domain traffic estimates)
- Domain keywords (top keywords domain ranks for)

**Output:** `docs/content/blog/domain-level-data/sistrix-domain-data.json`

**Total Cost:** ~252 credits (one-time collection)

**Best Practices:**

- Collect once, reuse for all posts
- Update monthly or quarterly
- Reference in all post documentation
- Use for competitive analysis and opportunity identification

### collect-domain-opportunities.php

**Purpose:** Identify keyword opportunities where domain could rank better

**Location:** `v2/scripts/blog/collect-domain-opportunities.php`

**Usage:**

```bash
# Collect domain opportunities
php v2/scripts/blog/collect-domain-opportunities.php --limit=100
```

**Data Collected:**

- Keyword opportunities
- Current position
- Potential gain
- Difficulty level

**Output:** `docs/content/blog/domain-level-data/domain-opportunities.json`

**Credit Usage:** 1 credit per opportunity returned

**Best Practices:**

- Use for quick-win identification
- Prioritize high-gain, low-difficulty opportunities
- **Update monthly or bi-weekly** (with collect-domain-content-ideas and collect-competitor-keywords) so CONTENT_BACKLOG and competitive analysis have fresh data

## Lexikon Inventory Scripts

**Purpose:** Track competitor HR Lexikon terms, Ordio coverage, and SISTRIX performance for content gap prioritization. See [lexikon-inventory/README.md](lexikon-inventory/README.md) for full workflow.

### fetch-sitemap-terms.py

**Purpose:** Fetch competitor lexikon terms from sitemaps; complements scraping for verification and missed-term discovery.

**Location:** `scripts/blog/lexikon-inventory/fetch-sitemap-terms.py`

**Usage:**

```bash
# Full run (all sources with sitemap_url in config)
python3 scripts/blog/lexikon-inventory/fetch-sitemap-terms.py

# Single source
python3 scripts/blog/lexikon-inventory/fetch-sitemap-terms.py --source=hrworks

# Dry run
python3 scripts/blog/lexikon-inventory/fetch-sitemap-terms.py --dry-run
```

**Output:** `docs/content/blog/lexikon-inventory/sitemap/{source_id}-terms.json`

**Credit Usage:** None (free; fetches public sitemaps).

### validate-lexikon-inventory.py

**Purpose:** Sanity checks for config, term files, and merged.json; warns when a sitemap contains significantly more terms than the scrape (a sign of an incomplete scrape).

**Location:** `scripts/blog/lexikon-inventory/validate-lexikon-inventory.py`

**Usage:**

```bash
python3 scripts/blog/lexikon-inventory/validate-lexikon-inventory.py
```

### collect-competitor-lexikon-top-pages.php

**Purpose:** Collect SISTRIX domain.urls for competitor lexikon domains (top-ranking URLs per domain)

**Location:** `v2/scripts/blog/collect-competitor-lexikon-top-pages.php`

**Usage:**

```bash
# Full run (all sources from config)
php v2/scripts/blog/collect-competitor-lexikon-top-pages.php

# Dry run (estimate credits only)
php v2/scripts/blog/collect-competitor-lexikon-top-pages.php --dry-run

# Single domain
php v2/scripts/blog/collect-competitor-lexikon-top-pages.php --domain=personio

# Limit URLs per domain
php v2/scripts/blog/collect-competitor-lexikon-top-pages.php --limit=50
```

**Output:** `docs/content/blog/lexikon-inventory/sistrix/{domain}-top-pages.json`

**Credit Usage:** ~1 credit per URL returned; ~900 credits for 9 domains × 100 URLs. **Run quarterly** to manage credits.

**Related scripts:** `scripts/blog/lexikon-inventory/scrape-competitor-lexikon.py`, `fetch-sitemap-terms.py`, `normalize-and-match-terms.py`, `generate-lexikon-inventory-report.py`, `validate-lexikon-inventory.py`

## Master Collection Scripts

### run-sistrix-gap-fill.php

**Purpose:** Fill missing SISTRIX data (competition-levels, competitor-analysis, search-intent) for posts that lack it

**Location:** `v2/scripts/blog/run-sistrix-gap-fill.php`

**Usage:**

```bash
# Fill all data types
php v2/scripts/blog/run-sistrix-gap-fill.php --data-type=all

# Fill specific type
php v2/scripts/blog/run-sistrix-gap-fill.php --data-type=competition-levels
php v2/scripts/blog/run-sistrix-gap-fill.php --data-type=competitor-analysis
php v2/scripts/blog/run-sistrix-gap-fill.php --data-type=search-intent

# With limit and tier filter
php v2/scripts/blog/run-sistrix-gap-fill.php --data-type=all --limit=20 --tier=1
php v2/scripts/blog/run-sistrix-gap-fill.php --data-type=all --dry-run
```

**When to use:** Monthly audit, post-migration, or after adding new posts. Run `audit-keyword-data-completeness.php` first to identify gaps.

**See:** [SISTRIX_USAGE_AUDIT_REPORT.md](SISTRIX_USAGE_AUDIT_REPORT.md) Gap-fill Runbook

### audit-keyword-data-completeness.php

**Purpose:** Audit SISTRIX data completeness per post; outputs missing files, expansion candidates (0–2 related keywords), credit estimate for gap-fill

**Location:** `v2/scripts/blog/audit-keyword-data-completeness.php`

**Usage:**

```bash
# Stdout report
php v2/scripts/blog/audit-keyword-data-completeness.php

# Write to file
php v2/scripts/blog/audit-keyword-data-completeness.php --output=docs/content/blog/KEYWORD_DATA_AUDIT.md

# JSON output
php v2/scripts/blog/audit-keyword-data-completeness.php --json
```

**When to use:** Before running gap-fill; monthly audit; after adding new posts.

### audit-competitive-depth-usage.py

**Purpose:** Audit competitive-depth consistency across posts; reports posts where the CONTENT_OUTLINE target is missing or below 80% of the recommendation

**Location:** `v2/scripts/blog/audit-competitive-depth-usage.py`

**Usage:**

```bash
# Stdout report
python3 v2/scripts/blog/audit-competitive-depth-usage.py

# Write to file
python3 v2/scripts/blog/audit-competitive-depth-usage.py --output=docs/content/blog/COMPETITIVE_DEPTH_AUDIT.md
```

**When to use:** After running analyze-competitor-content-depth on multiple posts; before content creation batch.

### run-all-data-collection.php

**Purpose:** Orchestrate all data collections with error handling and progress reporting

**Location:** `v2/scripts/blog/run-all-data-collection.php`

**Usage:**

```bash
# Run all collections for all posts
php v2/scripts/blog/run-all-data-collection.php --all

# Run specific collections
php v2/scripts/blog/run-all-data-collection.php --sistrix --ga4

# With limit
php v2/scripts/blog/run-all-data-collection.php --all --limit=20

# Dry run
php v2/scripts/blog/run-all-data-collection.php --all --dry-run
```

**Features:**

- Sequential execution with rate limiting
- Error handling and reporting
- Credit usage tracking (SISTRIX)
- Progress reporting
- Summary statistics

**Collections Included:**

- GA4 performance data
- GSC search performance data
- SISTRIX keyword data

**Best Practices:**

- Use for batch collection
- Monitor credit usage
- Check error logs
- Review summary statistics

### run-all-advanced-collection.php

**Purpose:** Run all advanced SISTRIX data collections

**Location:** `v2/scripts/blog/run-all-advanced-collection.php`

**Usage:**

```bash
# Run all advanced collections
php v2/scripts/blog/run-all-advanced-collection.php

# Dry run
php v2/scripts/blog/run-all-advanced-collection.php --dry-run

# Skip specific phase
php v2/scripts/blog/run-all-advanced-collection.php --skip-phase=3
```

**Collections Included:**

- SERP features
- Search intent
- Competition levels
- Competitor keywords
- Content ideas
- Domain opportunities
- Backlink analysis

**Best Practices:**

- Use for comprehensive data collection
- Monitor credit usage carefully
- Run monthly or quarterly
- Review collected data

## Validation Scripts

### validate-data-collection.php

**Purpose:** Validate data files exist, are valid JSON, and check data freshness

**Location:** `v2/scripts/blog/validate-data-collection.php`

**Usage:**

```bash
# Validate all posts
php v2/scripts/blog/validate-data-collection.php --all

# Check for stale data (>30 days)
php v2/scripts/blog/validate-data-collection.php --all --stale-days=30

# Single post
php v2/scripts/blog/validate-data-collection.php --post=slug --category=category
```

**Checks:**

- File existence
- JSON validity
- Data freshness (configurable threshold)
- Missing files report
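These three checks are straightforward to express in code. A Python sketch of the per-file logic under the assumptions above (the real script is PHP; the returned issue strings are illustrative):

```python
import json
import time
from pathlib import Path

def validate_data_file(path, stale_days=30, now=None):
    """Mirror the checks above: existence, JSON validity, freshness
    against a configurable threshold. Returns a list of issue strings;
    an empty list means the file passes."""
    issues = []
    p = Path(path)
    if not p.exists():
        return ["missing"]
    try:
        json.loads(p.read_text())
    except ValueError:
        issues.append("invalid-json")
    age_days = ((now or time.time()) - p.stat().st_mtime) / 86400
    if age_days > stale_days:
        issues.append(f"stale ({age_days:.0f} days)")
    return issues
```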

**Best Practices:**

- Run before starting improvement process
- Check data freshness regularly
- Fix missing or stale data

### test-api-access.php

**Purpose:** Test API access for GA4, GSC, and SISTRIX

**Location:** `v2/scripts/blog/test-api-access.php`

**Usage:**

```bash
# Test all APIs
php v2/scripts/blog/test-api-access.php --all

# Test specific API
php v2/scripts/blog/test-api-access.php --ga4
php v2/scripts/blog/test-api-access.php --gsc
php v2/scripts/blog/test-api-access.php --sistrix
```

**Best Practices:**

- Run before data collection
- Verify credentials are working
- Troubleshoot API issues

## Credit Management

### Credit Limits

- **Weekly limit:** 10,000 credits (resets Monday)
- **Daily limit:** 2,000 credits (secondary constraint)
- **Credit tracking:** `v2/data/blog/sistrix-credits-log.json`
- **Cache:** `v2/data/blog/sistrix-cache/` (7-day TTL)
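Before a batch run, it is worth checking both budgets against the log. A Python sketch of that check; the entry shape (`{"date": ..., "credits": ...}`) is an assumed example, so verify against the actual format of sistrix-credits-log.json:

```python
WEEKLY_LIMIT = 10_000  # resets Monday
DAILY_LIMIT = 2_000    # secondary constraint

def remaining_credits(log_entries, today, week_dates):
    """Remaining daily and weekly budget given logged spend.
    `today` is a YYYY-MM-DD string; `week_dates` is the set of dates
    in the current Monday-reset week."""
    spent_today = sum(e["credits"] for e in log_entries if e["date"] == today)
    spent_week = sum(e["credits"] for e in log_entries if e["date"] in week_dates)
    return DAILY_LIMIT - spent_today, WEEKLY_LIMIT - spent_week
```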

### Credit Usage by Script

- `collect-post-keywords-sistrix.php`: ~5 credits per keyword
- `collect-post-serp-features.php`: 1 credit per keyword
- `collect-post-search-intent.php`: 1 credit per keyword
- `collect-post-competition-levels.php`: 1 credit per keyword
- `collect-post-serp-data.php`: 100 credits per keyword (expensive!)

### Estimated Usage per Post

- **Basic collection:** ~30-35 credits (6-7 keywords × 5 credits)
- **Full collection:** ~50-60 credits (includes SERP features, intent, competition)

### Credit Optimization Strategies

1. **Use Caching:** 7-day cache reduces duplicate API calls
2. **Batch Processing:** Process posts in batches (e.g., 20-30 per day)
3. **Monitor Usage:** Check `v2/data/blog/sistrix-credits-log.json` regularly
4. **Spread Collection:** Distribute across multiple days if needed
5. **Skip Optional Data:** Skip SERP data if credits are limited
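Strategy 1 (the 7-day cache) works as a keyed TTL lookup. A minimal Python sketch of the idea; the on-disk layout and key scheme shown here are assumptions for illustration, not the actual format of `v2/data/blog/sistrix-cache/`:

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_TTL = 7 * 86400  # 7-day TTL, matching the cache described above

def cached_call(cache_dir, endpoint, params, fetch, now=time.time):
    """Return a fresh cached response if one exists, else call fetch()
    and store the result. Cache hits cost zero credits."""
    key = hashlib.sha256(
        json.dumps([endpoint, params], sort_keys=True).encode()
    ).hexdigest()
    path = Path(cache_dir) / f"{key}.json"
    if path.exists() and now() - path.stat().st_mtime < CACHE_TTL:
        return json.loads(path.read_text())  # cache hit
    data = fetch(endpoint, params)
    path.write_text(json.dumps(data))
    return data
```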

## Quick Reference

### Complete Collection for One Post

```bash
POST_SLUG="your-post-slug"
CATEGORY="ratgeber"

# Collect all data
php v2/scripts/blog/collect-post-performance-ga4.php --post=$POST_SLUG --category=$CATEGORY
php v2/scripts/blog/collect-post-performance-gsc.php --post=$POST_SLUG --category=$CATEGORY
php v2/scripts/blog/collect-post-keywords-sistrix.php --post=$POST_SLUG --category=$CATEGORY
php v2/scripts/blog/collect-post-serp-features.php --post=$POST_SLUG --category=$CATEGORY
php v2/scripts/blog/collect-post-search-intent.php --post=$POST_SLUG --category=$CATEGORY
php v2/scripts/blog/collect-post-competition-levels.php --post=$POST_SLUG --category=$CATEGORY

# Validate
php v2/scripts/blog/validate-data-collection.php --post=$POST_SLUG --category=$CATEGORY
```

**Estimated Time:** 15-30 minutes  
**Estimated Credits:** 30-50 credits

## Related Documentation

- [Blog Post Improvement Process](BLOG_POST_IMPROVEMENT_PROCESS.md) - Complete improvement workflow
- [Improvement Data Collection Guide](IMPROVEMENT_DATA_COLLECTION_GUIDE.md) - Focused collection guide
- [Data Collection Guide](guides/DATA_COLLECTION_GUIDE.md) - Comprehensive collection guide
