# SISTRIX Cache Management Guide

**Last Updated:** 2026-01-15

## Overview

The SISTRIX cache (`v2/data/blog/sistrix-cache/`) stores API responses to:

- **Save API credits** (avoid redundant calls)
- **Speed up collection** (skip API calls for cached data)
- **Reduce the risk of API rate limiting** (fewer requests)

## Cache Statistics

**Current Status:**

- Total files: ~1,352
- Total size: ~2.08 MB
- Cache expiration: 30 days (keywords/PAA), 7 days (rankings)

## Do We Need These Files?

**✅ YES - Keep the cache:**

1. **Credit Savings:** Cache prevents redundant API calls, saving credits
2. **Performance:** Cached data loads almost instantly, while an API call takes seconds
3. **Small Size:** Only 2 MB total - negligible disk space
4. **Already Ignored:** Added to `.gitignore` - won't clutter Git

**When Cache is Used:**

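- Before making an API call, scripts check the cache first
- If a cache entry exists and is still valid (under 30 days old for keywords/PAA, under 7 days for rankings), the cached data is used
- If the entry is expired or missing, the script makes the API call and caches the result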
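
The lookup flow above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's PHP implementation: the function name `get_or_fetch`, the `fetch` callable, and the one-file-per-key `<key>.json` layout are assumptions. Only the `data` and `cached_at` fields are taken from the example cache file shown in this guide.

```python
import json
import time
from pathlib import Path

DAY = 86400  # seconds per day


def get_or_fetch(cache_dir, key, max_age_days, fetch, now=None):
    """Return cached data if it is still valid, otherwise fetch and cache it.

    `fetch` is a hypothetical zero-argument callable that performs the API call.
    The `<key>.json` file name is an assumption about the cache layout.
    """
    now = time.time() if now is None else now
    path = Path(cache_dir) / f"{key}.json"

    if path.exists():
        entry = json.loads(path.read_text(encoding="utf-8"))
        if now - entry["cached_at"] < max_age_days * DAY:
            return entry["data"]  # cache hit: no API call, no credits spent

    # Cache miss or expired entry: call the API and store the fresh result.
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps({"data": data, "cached_at": int(now)}), encoding="utf-8"
    )
    return data
```

Passing `now` explicitly is only there to make the expiry behaviour easy to test; real callers would leave it out.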

## Cache Expiration

**Expiration Times:**

- **Keywords:** 30 days (relatively stable data)
- **PAA Questions:** 30 days (questions change slowly)
- **Rankings:** 7 days (positions change more frequently)

**Why 30 Days?**

- Keyword metrics (volume, difficulty, competition) are relatively stable
- PAA questions don't change frequently
- Longer cache = more credit savings
- Rankings cached shorter (7 days) as positions change more often
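
As a rough illustration of how the two expiration windows behave, the sketch below checks one timestamp against each of them. It is written in Python for illustration only; the type labels `keyword`, `paa`, and `ranking` and the function `is_expired` are hypothetical names, not identifiers from the project's scripts.

```python
DAY = 86400  # seconds per day

# Expiration times from this guide, in days. The dictionary keys are
# hypothetical labels chosen for this sketch.
MAX_AGE_DAYS = {"keyword": 30, "paa": 30, "ranking": 7}


def is_expired(cached_at, data_type, now):
    """Return True once an entry is at least as old as its expiration time."""
    return now - cached_at >= MAX_AGE_DAYS[data_type] * DAY


# Ten days after caching, keyword and PAA data are still valid,
# but ranking data has already expired.
cached_at = 1768500289
ten_days_later = cached_at + 10 * DAY
print(is_expired(cached_at, "keyword", ten_days_later))  # False
print(is_expired(cached_at, "ranking", ten_days_later))  # True
```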

## Cleanup Options

### Option 1: Manual Cleanup (Recommended)

**Run cleanup script periodically:**

```bash
# Dry run (see what would be deleted)
php v2/scripts/blog/cleanup-sistrix-cache.php --dry-run

# Delete expired files (30+ days old)
php v2/scripts/blog/cleanup-sistrix-cache.php

# Delete all cache files (use with caution)
php v2/scripts/blog/cleanup-sistrix-cache.php --all
```

**When to Run:**

- Monthly: Clean up expired files
- Before a major collection: Remove expired entries so the cache holds only valid data
- If disk space is an issue: Delete all and rebuild

### Option 2: Automatic Cleanup

**Add to cron (optional):**

```bash
# Run cleanup weekly (Sundays at 2 AM)
0 2 * * 0 cd /path/to/project && php v2/scripts/blog/cleanup-sistrix-cache.php
```

**Or integrate into collection script:**

- Add cleanup step before/after collection runs
- Only delete files older than expiration time
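
A cleanup step of this kind might look roughly like the following. This is a Python sketch for illustration, not the behaviour of `cleanup-sistrix-cache.php`: the function `cleanup_expired`, the `*.json` file pattern, and the single `max_age_days` threshold are assumptions. It reads the `cached_at` field shown in the example cache file and defaults to a dry run.

```python
import json
import time
from pathlib import Path


def cleanup_expired(cache_dir, max_age_days=30, dry_run=True, now=None):
    """Return the expired cache files; delete them unless dry_run is True."""
    now = time.time() if now is None else now
    expired = []
    for path in sorted(Path(cache_dir).glob("*.json")):  # assumed file pattern
        try:
            entry = json.loads(path.read_text(encoding="utf-8"))
            age = now - entry["cached_at"]
        except (OSError, ValueError, KeyError, TypeError):
            continue  # leave unreadable or malformed files untouched
        if age >= max_age_days * 86400:
            expired.append(path)
            if not dry_run:
                path.unlink()
    return expired
```

Defaulting to `dry_run=True` mirrors the "dry run first" habit recommended for the cleanup script, so nothing is deleted unless the caller asks for it.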

### Option 3: Keep Current Setup

**Current approach is fine:**

- Cache files are small (2 MB total)
- All files are recent (< 30 days)
- Scripts automatically skip expired cache
- No manual intervention needed

## Cache File Structure

**Example cache file:**

```json
{
    "keyword": "vertrauensarbeitszeit",
    "data": {
        "keyword": "vertrauensarbeitszeit",
        "volume": 3450,
        "difficulty": 50,
        "competition": 50,
        ...
    },
    "cached_at": 1768500289
}
```

**Cache Key Format:**

- Keywords: `md5("sistrix_keyword_{keyword}_ordio.com_de")`
- PAA: `md5("sistrix_paa_{keyword}_de_de")`
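
The key formats above can be reproduced outside PHP, for example to look up a cache entry by hand. The Python sketch below assumes the keyword is inserted into the string exactly as given (no lowercasing or trimming) and encoded as UTF-8; the function names are hypothetical. PHP's `md5()` returns a 32-character lowercase hex string, which is what `hexdigest()` produces here.

```python
import hashlib


def keyword_cache_key(keyword):
    """MD5 key for keyword data, following the format listed in this guide."""
    raw = f"sistrix_keyword_{keyword}_ordio.com_de"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def paa_cache_key(keyword):
    """MD5 key for PAA data, following the format listed in this guide."""
    raw = f"sistrix_paa_{keyword}_de_de"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


# The same keyword maps to different keys for keyword data and PAA data.
print(keyword_cache_key("vertrauensarbeitszeit"))
print(paa_cache_key("vertrauensarbeitszeit"))
```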

## Best Practices

### ✅ DO:

1. **Keep cache files** - They save credits and improve performance
2. **Run cleanup monthly** - Remove expired files so stale entries do not accumulate
3. **Monitor cache size** - Check if it grows unexpectedly
4. **Use cache status script** - Check cache hit rate before collection

### ❌ DON'T:

1. **Delete cache manually** - Use cleanup script instead
2. **Commit cache to Git** - Already in `.gitignore`
3. **Disable caching** - Would waste API credits
4. **Set expiration too short** - Would reduce credit savings

## Cache Status Check

**Check cache status before collection:**

```bash
php v2/scripts/blog/check-sistrix-cache-status.php
```

**Output shows:**

- Cache hit rate for each post
- Estimated credits needed (cache-aware)
- Which posts need fresh data

## Recommendations

**For Current Setup:**

1. ✅ **Keep cache as-is** - Small size, high value
2. ✅ **Run cleanup monthly** - Remove expired files
3. ✅ **Monitor cache size** - Should stay under 10 MB
4. ✅ **Use cache status** - Before major collections
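
One simple way to keep an eye on the 10 MB guideline is to count the files and sum their sizes. The Python sketch below is illustrative only; `cache_stats` is a hypothetical helper, and treating 10 MB as 10 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

LIMIT_BYTES = 10 * 1024 * 1024  # 10 MB guideline, assuming 1 MB = 1024 * 1024 bytes


def cache_stats(cache_dir):
    """Return (file_count, total_bytes) for all files under cache_dir."""
    root = Path(cache_dir)
    if not root.is_dir():
        return 0, 0
    files = [p for p in root.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    count, size = cache_stats("v2/data/blog/sistrix-cache")
    print(f"{count} files, {size / (1024 * 1024):.2f} MB")
    if size > LIMIT_BYTES:
        print("Cache exceeds 10 MB; consider running the cleanup script.")
```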

**If Cache Grows Too Large:**

1. Run cleanup: `php v2/scripts/blog/cleanup-sistrix-cache.php`
2. Check for duplicate keywords (shouldn't happen)
3. Consider reducing expiration time (not recommended)
4. Delete all and rebuild: `php v2/scripts/blog/cleanup-sistrix-cache.php --all`

## Summary

**Current Status:** ✅ **Cache is healthy and useful**

- Small size (2 MB)
- All files recent (< 30 days)
- Saves API credits
- Improves performance
- Already ignored in Git

**Action Items:**

- ✅ Cache already in `.gitignore`
- ✅ Cleanup script available
- ⚠️ Run cleanup monthly (optional)
- ⚠️ Monitor cache size (optional)

**Conclusion:** Keep the cache. It's small, useful, and saves credits. Run cleanup monthly to remove expired files.
