# Blog Data Freshness and Validation Summary

**Last Updated:** 2026-02-08  
**Generated:** 2026-02-08

Summary of the `check-data-freshness.php`, `validate-data-collection.php`, and `validate-api-data-quality.php` runs, kept for audit reference.

## Freshness Check (7-day threshold)

- **Command:** `php v2/scripts/blog/check-data-freshness.php --all --max-age=7`
- **Total files checked:** 588
- **Fresh:** 0
- **Stale:** 497
- **Missing:** 91

No checked file is within the 7-day threshold: 497 are stale and 91 are missing. To refresh Tier 1 only:  
`php v2/scripts/blog/check-data-freshness.php --tier=1 --max-age=7 --auto-refresh`
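The fresh/stale/missing classification above can be sketched as follows. This is a minimal illustration, not the code of `check-data-freshness.php`. It assumes that a file's age is taken from its modification time (`filemtime`), and that "stale" means older than `--max-age` days. The function names `classifyFreshness` and `summarizeFreshness` are hypothetical. The real script may instead read a timestamp stored inside each data file.

```php
<?php
/**
 * Hypothetical sketch: classify one data file as 'fresh', 'stale', or 'missing'.
 * Assumption: age is derived from the file's modification time, not its contents.
 */
function classifyFreshness(string $path, int $maxAgeDays, ?int $now = null): string
{
    if (!is_file($path)) {
        return 'missing';
    }
    $now = $now ?? time();
    $mtime = filemtime($path);
    if ($mtime === false) {
        // Unreadable metadata is treated like a missing file in this sketch.
        return 'missing';
    }
    $ageDays = ($now - $mtime) / 86400;
    return $ageDays <= $maxAgeDays ? 'fresh' : 'stale';
}

/**
 * Hypothetical sketch: tally classifications into the same three buckets
 * reported in the summary (Fresh / Stale / Missing).
 */
function summarizeFreshness(array $paths, int $maxAgeDays, ?int $now = null): array
{
    $counts = ['fresh' => 0, 'stale' => 0, 'missing' => 0];
    foreach ($paths as $path) {
        $counts[classifyFreshness($path, $maxAgeDays, $now)]++;
    }
    return $counts;
}
```

With `--max-age=7`, a file modified 10 days ago counts as stale, and a path that does not exist counts as missing. The three counts always add up to the number of files checked, which matches the report (0 + 497 + 91 = 588).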

## Data Collection Validation (30-day stale threshold)

- **Command:** `php v2/scripts/blog/validate-data-collection.php --all --stale-days=30`
- **SISTRIX:** 98/108 files found, 98 valid, 0 stale (>30d), 10 missing
- **GA4:** 98/108 files found, 98 valid, 0 stale (>30d), 10 missing
- **GSC:** 98/108 files found, 98 valid, 0 stale (>30d), 10 missing

The missing files belong to placeholder slugs (e.g., `{erschwerniszulage}`) and a few specific posts (`gastronomie-mindestlohn`, `zeiterfassung-excel-*`).
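To act on recommendation 3, the placeholder slugs can be separated from real post slugs in the list of missing files. The sketch below is hypothetical. It assumes that placeholders are exactly the slugs wrapped in single braces, a convention inferred from the `{erschwerniszulage}` example rather than confirmed from the validation scripts. The helper name `splitMissingSlugs` is illustrative.

```php
<?php
/**
 * Hypothetical helper: split missing slugs into template placeholders
 * (assumed to be wrapped in braces, e.g. "{erschwerniszulage}") and real
 * post slugs that may still need data collection.
 */
function splitMissingSlugs(array $missingSlugs): array
{
    $placeholders = [];
    $posts = [];
    foreach ($missingSlugs as $slug) {
        // A slug that is entirely "{...}" is treated as a placeholder.
        if (preg_match('/^\{[^{}]+\}$/', $slug) === 1) {
            $placeholders[] = $slug;
        } else {
            $posts[] = $slug;
        }
    }
    return ['placeholders' => $placeholders, 'posts' => $posts];
}
```

Only the `posts` bucket would need follow-up, and only for posts that are live.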

## API Data Quality Validation

- **Command:** `php v2/scripts/blog/validate-api-data-quality.php --all`
- **Processed:** 108 posts
- **Issues:** 196 stale data, 20 missing files, 0 zero-GSC/non-zero-GA4 mismatches

Full report: [DATA_QUALITY_VALIDATION_REPORT.md](DATA_QUALITY_VALIDATION_REPORT.md)

## Recommendations

1. Run weekly-priority-refresh (or the GA4/GSC collection) to refresh performance data, then re-run the freshness check.
2. Run the Tier 1 auto-refresh when API credits allow: `check-data-freshness.php --tier=1 --max-age=7 --auto-refresh`.
3. Address missing data for non-placeholder slugs (gastronomie-mindestlohn, zeiterfassung-excel-*) if those posts are live.
4. **Tier 1 missing serp-features.json:** ~~13 Tier 1 posts lack `serp-features.json`.~~ **Resolved (2026-02-08):** All 13 filled via `collect-post-serp-features.php --post=slug --category=category --limit=15` per post. Script now supports per-post mode.
