# Troubleshooting Data Collection Guide

**Last Updated:** 2026-01-11

## Overview

This guide helps diagnose and fix common data collection issues with Google Search Console (GSC), Google Analytics 4 (GA4), and SISTRIX.

## Common Issues

### Issue 1: GSC Data Shows Zeros Despite Known Traffic

**Symptoms:**
- All posts show 0 clicks, 0 impressions in GSC data
- GA4 shows traffic but GSC shows zeros
- Documentation shows placeholder zeros

**Root Causes:**
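1. **Wrong Site URL Format** - The request uses the domain property `sc_domain:ordio.com` instead of the URL-prefix property `https://www.ordio.com/`
2. **URL Format Mismatch** - The page filter must match the URL exactly as GSC records it, including the trailing slash
3. **Silent API Errors** - Exceptions are caught but not logged, so a failed request looks the same as zero traffic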

**Solution:**

1. **Verify Site Property Format:**
   ```bash
   php v2/scripts/blog/test-gsc-debug.php --post=zuschlage-berechnen-rechner --category=ratgeber --verbose
   ```
   This will show the correct site URL format and test URL variations.

2. **Check Error Logs:**
   ```bash
   cat v2/data/blog/gsc-collection-errors.log
   ```

3. **Re-run Collection:**
   ```bash
   php v2/scripts/blog/collect-post-performance-gsc.php --all
   ```

**Prevention:**
- The collection script now automatically detects the correct site property
- Error logging is enabled by default
- URL format matching includes fallback strategies
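
Error logging matters here because a swallowed exception and a genuinely empty result both surface as zeros. The following Python sketch illustrates the distinction. It is not the collection script itself (which is PHP); `fetch` is a hypothetical stand-in for the real API request, and the logger name is an assumption.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("gsc_collection")  # hypothetical logger name


def collect_metrics(fetch: Callable[[str], dict], url: str) -> Optional[dict]:
    """Return the metrics for url, or None when the API call failed.

    `fetch` is a hypothetical callable standing in for the real API request.
    """
    try:
        return fetch(url)
    except Exception:
        # Record the failure with its traceback instead of
        # silently falling back to a zero-filled record.
        logger.exception("GSC request failed for %s", url)
        return None
```

Returning `None` rather than zeros lets downstream code tell "the request failed" apart from "the post had no clicks".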

### Issue 2: GA4 Date Range Data Missing

**Symptoms:**
- Only `last_90_days` data present, `last_year` shows zeros
- Date ranges not mapped correctly

**Root Causes:**
1. **Hardcoded Date Range Index** - The script processed only the first date range
2. **Incorrect Row Mapping** - GA4 returns one row per date range, not a single row containing all ranges

**Solution:**

1. **Verify Date Range Mapping:**
   ```bash
   php v2/scripts/blog/test-ga4-debug.php --post=zuschlage-berechnen-rechner --category=ratgeber --verbose
   ```
   This will show how GA4 returns data for multiple date ranges.

2. **Check Collected Data:**
   ```bash
   jq '.metrics' docs/content/blog/posts/ratgeber/zuschlage-berechnen-rechner/data/performance-ga4.json
   ```

3. **Re-run Collection:**
   ```bash
   php v2/scripts/blog/collect-post-performance-ga4.php --all
   ```

**Prevention:**
- Date range mapping now correctly handles multiple ranges
- Each row is mapped to its corresponding date range index
- Error logging added for debugging

### Issue 3: API Access Errors

**Symptoms:**
- "Failed to initialize Google API client"
- "Credentials file not found"
- "Permission denied" errors

**Solution:**

1. **Verify Credentials:**
   ```bash
   php v2/scripts/blog/test-api-access.php --all
   ```

2. **Check Credentials File:**
   ```bash
   ls -la v2/config/google-api-credentials.json
   ```

3. **Verify Service Account Permissions:**
   - GSC: The service account must be a user on the property with "Full" or "Restricted" permission
   - GA4: The service account must have the "Viewer" role or higher on the GA4 property
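
Before checking permissions, it can help to confirm that the credentials file is readable and complete. The sketch below is a minimal Python check, assuming the file is a standard Google service account key (which carries `type`, `client_email`, and `private_key` fields). The function name `check_credentials` is hypothetical and not part of the existing scripts.

```python
import json
from pathlib import Path

# Fields found in a Google service account key file.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_credentials(path: str) -> list[str]:
    """Return the problems found; an empty list means the file looks usable."""
    file = Path(path)
    if not file.is_file():
        return [f"Credentials file not found: {path}"]
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"Credentials file is not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["Credentials file must contain a JSON object"]
    # Report every required field that is absent or empty.
    return [f"Missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
```

Calling `check_credentials("v2/config/google-api-credentials.json")` returns an empty list when the file passes. The `client_email` value in that file is the account that needs the GSC and GA4 permissions listed above.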

### Issue 4: Data Not Appearing in Documentation

**Symptoms:**
- Data files exist with correct data
- Documentation still shows zeros or placeholders

**Solution:**

1. **Verify Data Files:**
   ```bash
   jq '.metrics.last_90_days' docs/content/blog/posts/{category}/{slug}/data/performance-gsc.json
   jq '.metrics.last_90_days' docs/content/blog/posts/{category}/{slug}/data/performance-ga4.json
   ```

2. **Regenerate Documentation:**
   ```bash
   php v2/scripts/blog/generate-automated-reports.php --post={slug} --category={category}
   php v2/scripts/blog/generate-post-documentation.php --post={slug} --category={category}
   ```

3. **Check Data Flow:**
   - Verify JSON files are being loaded correctly
   - Check placeholder replacement logic
   - Ensure data structure matches expected format
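
The structure check in step 3 can be sketched in Python as follows. The only layout assumed is the `metrics.<date range>` nesting used by the `jq` commands in this guide. The function name and the `clicks` key in the usage example are hypothetical, and the format the documentation generator actually expects may be stricter.

```python
import json


def find_structure_problems(raw_json: str, ranges=("last_90_days",)) -> list[str]:
    """Check that a performance JSON document has a metrics object per date range."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    metrics = data.get("metrics") if isinstance(data, dict) else None
    if not isinstance(metrics, dict):
        return ["missing 'metrics' object"]

    problems = []
    for name in ranges:
        values = metrics.get(name)
        if not isinstance(values, dict):
            problems.append(f"missing date range: {name}")
        elif not values:
            problems.append(f"empty date range: {name}")
    return problems
```

For a GA4 file, pass `ranges=("last_90_days", "last_year")` so that a missing `last_year` block is reported explicitly.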

## Diagnostic Tools

### GSC Debug Script

```bash
php v2/scripts/blog/test-gsc-debug.php --post={slug} --category={category} --verbose
```

**What it does:**
- Tests site property access
- Tests different URL format variations
- Tests unfiltered queries to find actual URL format
- Tests different date ranges
- Logs all API responses and errors

**Output:**
- Console output with test results
- Detailed log file: `v2/data/blog/gsc-debug.log`

### GA4 Debug Script

```bash
php v2/scripts/blog/test-ga4-debug.php --post={slug} --category={category} --verbose
```

**What it does:**
- Tests multiple date ranges in single request
- Verifies correct mapping of results to date ranges
- Shows response structure and row mapping

**Output:**
- Console output with date range analysis
- Detailed log file: `v2/data/blog/ga4-debug.log`

### Data Quality Validation

```bash
php v2/scripts/blog/validate-api-data-quality.php --all
```

**What it does:**
- Checks for posts with zero GSC but non-zero GA4
- Flags posts with suspicious zeros
- Validates data freshness
- Generates validation report

**Output:**
- Validation report: `docs/content/blog/DATA_QUALITY_VALIDATION_REPORT.md`
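
The "zero GSC but non-zero GA4" check comes down to a simple comparison per post. The Python sketch below shows the idea only. The metric keys (`clicks`, `impressions`, `sessions`) and the shape of the `posts` mapping are assumptions for illustration, not the validator's actual data model.

```python
def is_suspicious(gsc: dict, ga4: dict) -> bool:
    """True when GSC reports no activity although GA4 recorded traffic."""
    gsc_total = gsc.get("clicks", 0) + gsc.get("impressions", 0)
    return gsc_total == 0 and ga4.get("sessions", 0) > 0


def flag_posts(posts: dict) -> list[str]:
    """Return the slugs whose GSC zeros contradict their GA4 traffic."""
    return sorted(
        slug for slug, data in posts.items()
        if is_suspicious(data["gsc"], data["ga4"])
    )
```

A post with zeros in both sources is not flagged, since that combination is consistent with a page that simply has no traffic.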

## Error Log Locations

| Source | Log File | Directory |
|----------|----------|----------|
| **GSC** | `gsc-collection-errors.log` | `v2/data/blog/` |
| **GA4** | `ga4-collection-errors.log` | `v2/data/blog/` |
| **GSC Debug** | `gsc-debug.log` | `v2/data/blog/` |
| **GA4 Debug** | `ga4-debug.log` | `v2/data/blog/` |

## API-Specific Troubleshooting

### Google Search Console

**Site URL Format:**
- ✅ Correct: `https://www.ordio.com/` (URL prefix property)
- ❌ Wrong: `sc_domain:ordio.com` (domain property - not configured)

**URL Matching:**
- ✅ Correct: `https://www.ordio.com/insights/ratgeber/post-slug/`
- ❌ Wrong: `https://www.ordio.com/insights/ratgeber/post-slug` (no trailing slash)
- ❌ Wrong: `http://www.ordio.com/insights/ratgeber/post-slug/` (http instead of https)
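
When a lookup returns no rows, one way to rule out a format mismatch is to try each plausible variation of the URL in turn. The Python sketch below generates those candidates from the rules above, with the expected form (https, trailing slash) first. The function name and the candidate order are assumptions, and the fallback strategy in the collection script may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_urls(url: str) -> list[str]:
    """Return URL variations to try, most likely match first."""
    parts = urlsplit(url)
    path = parts.path or "/"
    with_slash = path if path.endswith("/") else path + "/"
    without_slash = with_slash.rstrip("/") or "/"

    candidates = []
    for scheme in ("https", "http"):
        for variant in (with_slash, without_slash):
            candidate = urlunsplit((scheme, parts.netloc, variant, "", ""))
            if candidate not in candidates:  # the root path yields duplicates
                candidates.append(candidate)
    return candidates
```

Query strings and fragments are dropped on purpose, since the examples above match on scheme, host, and path only.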

**Common Errors:**
- `400 Bad Request` - Usually means wrong site URL format or invalid filter
- `403 Forbidden` - Permission issue, check service account access
- `No rows returned` - URL format mismatch or no data for that URL

### Google Analytics 4

**Date Range Mapping:**
- GA4 returns **one row per date range**
- Row 0 = first date range (`last_90_days`)
- Row 1 = second date range (`last_year`)
- Each row contains all metrics for that date range
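
When a request carries several date ranges, the GA4 Data API adds a `dateRange` dimension to each row, with default labels such as `date_range_0` and `date_range_1`. Reading that label is more robust than relying on row position, because a range without data may be missing from the response. The Python sketch below assumes a response shaped like the API's JSON, default labels (no custom range names), and no other dimensions in the request. `sessions` in the usage example is only a sample metric, and the function name is hypothetical.

```python
RANGE_NAMES = ("last_90_days", "last_year")  # same order as dateRanges in the request


def map_rows_to_ranges(response: dict, range_names=RANGE_NAMES) -> dict:
    """Map each response row to its date range by label, not by row position."""
    dimensions = [h["name"] for h in response.get("dimensionHeaders", [])]
    metrics = [h["name"] for h in response.get("metricHeaders", [])]
    if "dateRange" not in dimensions:
        raise ValueError("response has no dateRange dimension")

    # Start from zeros so a range that returned no row is still present.
    result = {name: {m: 0.0 for m in metrics} for name in range_names}

    column = dimensions.index("dateRange")
    for row in response.get("rows", []):
        label = row["dimensionValues"][column]["value"]  # e.g. "date_range_1"
        index = int(label.rsplit("_", 1)[1])  # assumes default labels
        result[range_names[index]] = {
            m: float(v["value"]) for m, v in zip(metrics, row["metricValues"])
        }
    return result
```

With this mapping, a response that contains only a `date_range_1` row fills `last_year` and leaves `last_90_days` at zero, instead of assigning the single row to the wrong range.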

**Common Errors:**
- `400 Bad Request` - Invalid property ID or date range
- `403 Forbidden` - Permission issue, check service account role
- Missing `last_year` data - Date range mapping issue (now fixed)

### SISTRIX

**Credit Management:**
- Weekly limit: 10,000 credits (resets Monday)
- Daily limit: 2,000 credits (secondary constraint)
- Monitor usage: `v2/data/blog/sistrix-credits-log.json`

**Common Errors:**
- `401 Unauthorized` - Invalid API key
- `429 Too Many Requests` - Rate limit exceeded
- `Credit limit reached` - Weekly limit exceeded
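
Because two limits apply at once, the credits available on a given day are the smaller of the weekly and daily remainders. The Python sketch below works through that arithmetic using the limits stated above. The entry format (`date` as an ISO string, `credits` as an integer) is an assumption for illustration and may not match the structure of `sistrix-credits-log.json`.

```python
from datetime import date, timedelta

WEEKLY_LIMIT = 10_000  # resets Monday
DAILY_LIMIT = 2_000


def remaining_credits(entries: list[dict], today: date) -> int:
    """Credits still available today under both the weekly and the daily limit."""
    week_start = today - timedelta(days=today.weekday())  # Monday of this week
    used_week = 0
    used_today = 0
    for entry in entries:
        day = date.fromisoformat(entry["date"])
        if week_start <= day <= today:
            used_week += entry["credits"]
        if day == today:
            used_today += entry["credits"]
    return max(0, min(WEEKLY_LIMIT - used_week, DAILY_LIMIT - used_today))
```

For example, with 1,500 credits used on Monday and 500 used so far on Wednesday, the weekly remainder is 8,000 and the daily remainder is 1,500, so 1,500 credits are available for the rest of Wednesday.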

## Quick Fixes

### Re-collect GSC Data for All Posts

```bash
php v2/scripts/blog/collect-post-performance-gsc.php --all
```

### Re-collect GA4 Data for All Posts

```bash
php v2/scripts/blog/collect-post-performance-ga4.php --all
```

### Validate Data Quality

```bash
php v2/scripts/blog/validate-api-data-quality.php --all
```

### Regenerate All Documentation

```bash
php v2/scripts/blog/generate-automated-reports.php --all
php v2/scripts/blog/generate-post-documentation.php --all
```

## Prevention Best Practices

1. **Run Validation Weekly:**
   ```bash
   php v2/scripts/blog/validate-api-data-quality.php --all
   ```

2. **Monitor Error Logs:**
   - Check `v2/data/blog/gsc-collection-errors.log` regularly
   - Check `v2/data/blog/ga4-collection-errors.log` regularly

3. **Test Before Full Collection:**
   - Pass `--limit=5` to the collection script to test on a small sample of posts first
   - Verify the sample results before running with `--all`

4. **Keep Credentials Updated:**
   - Verify API access monthly: `php v2/scripts/blog/test-api-access.php --all`
   - Rotate credentials if needed

## Getting Help

If issues persist:

1. **Check Error Logs:**
   ```bash
   tail -50 v2/data/blog/gsc-collection-errors.log
   tail -50 v2/data/blog/ga4-collection-errors.log
   ```

2. **Run Diagnostic Scripts:**
   ```bash
   php v2/scripts/blog/test-gsc-debug.php --post={slug} --category={category} --verbose
   php v2/scripts/blog/test-ga4-debug.php --post={slug} --category={category} --verbose
   ```

3. **Review Documentation:**
   - [Data Collection Guide](DATA_COLLECTION_GUIDE.md)
   - [Data Quality Dashboard](DATA_QUALITY_DASHBOARD.md)

4. **Check API Status:**
   - GSC: https://status.search.google.com/
   - GA4: https://status.cloud.google.com/
   - SISTRIX: Check API dashboard for status

## Related Documentation

- [Data Collection Guide](DATA_COLLECTION_GUIDE.md)
- [Data Quality Dashboard](DATA_QUALITY_DASHBOARD.md)
- [Validation Report](DATA_QUALITY_VALIDATION_REPORT.md)
