# Umlaut Duplicate Prevention

**Last Updated:** 2026-03-14

## Problem

When blog posts are migrated from umlaut slugs (e.g., `spätschicht`) to ASCII slugs (e.g., `spaetschicht`), both JSON files may temporarily exist on production servers. This causes duplicate posts to appear on index pages (`/insights/`) because `load_all_blog_posts()` and `load_all_blog_post_summaries()` scan all JSON files in the posts directory.

## Solution

### 1. Filtering in Post Loaders

Both `load_blog_posts_by_category()` and `load_blog_post_summaries_by_category()` now skip umlaut slugs that have an ASCII redirect mapping. The mapping is loaded only for the `lexikon` category, so other categories are unaffected:

```php
// Load umlaut redirect mappings to filter out umlaut slugs
$umlaut_redirects = [];
if ($cat === 'lexikon') {
    $redirects_config = @include __DIR__ . '/blog-umlaut-redirects.php';
    if (is_array($redirects_config)) {
        $umlaut_redirects = $redirects_config;
    }
}

foreach ($files as $file) {
    $slug = basename($file, '.json');
    
    // Skip umlaut slugs if they have an ASCII redirect mapping
    if (isset($umlaut_redirects[$slug])) {
        continue; // Skip umlaut file, ASCII version will be loaded instead
    }
    
    $post_data = load_blog_post($cat, $slug);
    // ...
}
```
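
The filter can be tried in isolation with the minimal sketch below. It assumes that `blog-umlaut-redirects.php` returns an array keyed by umlaut slug with the ASCII slug as value, which is inferred from how `$umlaut_redirects` is indexed above. The helper name `filter_umlaut_slugs()` and the sample slugs are hypothetical and not part of the codebase.

```php
<?php
// Assumed shape of the redirect config: umlaut slug => ASCII slug.
$umlaut_redirects = [
    'spätschicht' => 'spaetschicht',
];

/**
 * Hypothetical helper: drop every slug that appears as a key in the
 * redirect map, mirroring the isset() check in the loader loop.
 */
function filter_umlaut_slugs(array $slugs, array $redirects): array
{
    return array_values(array_filter(
        $slugs,
        fn(string $slug): bool => !isset($redirects[$slug])
    ));
}

// Both files exist during migration; only the ASCII slug should survive.
$slugs = ['spätschicht', 'spaetschicht', 'nachtschicht'];
$visible = filter_umlaut_slugs($slugs, $umlaut_redirects);
```

Because the check is keyed on the redirect map rather than on the presence of umlaut characters, a slug with an umlaut but no mapping would still be loaded.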

### 2. URL Deduplication (Safety Measure)

After loading, both functions also deduplicate posts by URL. Umlaut slugs are normalized to their ASCII equivalents before comparison, so `/insights/lexikon/spätschicht/` and `/insights/lexikon/spaetschicht/` are detected as duplicates. When both versions are present, the ASCII version is kept:

```php
// Normalize URL: convert umlaut slugs to ASCII for comparison
$normalized_url = rtrim(strtolower($url), '/');
if (preg_match('#/insights/lexikon/([^/]+)/?$#', $normalized_url, $matches)) {
    $slug = $matches[1];
    // If slug is in redirect config (umlaut), convert to ASCII
    if (isset($umlaut_redirects[$slug])) {
        $ascii_slug = $umlaut_redirects[$slug];
        // Swap only the trailing slug segment (the URL was right-trimmed above)
        $normalized_url = substr($normalized_url, 0, -strlen($slug)) . $ascii_slug;
    }
}

// Prefer ASCII version if duplicate found
if (isset($seen_urls[$normalized_url])) {
    if ($is_ascii && !$seen_urls[$normalized_url]['is_ascii']) {
        // Replace umlaut version with ASCII version
        $deduplicated_posts[$index] = $post;
    } else {
        // Skip duplicate
        continue;
    }
}
```
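
The excerpt above leaves out how `$seen_urls`, `$index`, and `$is_ascii` are maintained. The following self-contained sketch shows one way the pieces could fit together. The function name `dedupe_posts_by_url()`, the bookkeeping of the index, and the rule "a post counts as ASCII when its slug is not a key in the redirect map" are all assumptions made for illustration, not the actual implementation.

```php
<?php
/**
 * Hypothetical helper: deduplicate posts by normalized URL, preferring
 * the ASCII version. $redirects is assumed to map umlaut slug => ASCII slug.
 */
function dedupe_posts_by_url(array $posts, array $redirects): array
{
    $deduplicated = [];
    $seen = []; // normalized URL => ['index' => int, 'is_ascii' => bool]

    foreach ($posts as $post) {
        $url = rtrim(strtolower($post['url'] ?? ''), '/');
        $is_ascii = true; // assumption: ASCII unless the slug is a redirect key

        // Replace only the trailing slug segment if it is a known umlaut slug.
        if (preg_match('#^(.*/insights/lexikon/)([^/]+)$#', $url, $m)
            && isset($redirects[$m[2]])) {
            $url = $m[1] . $redirects[$m[2]];
            $is_ascii = false;
        }

        if (!isset($seen[$url])) {
            // First time this URL is seen: remember where the post was stored.
            $seen[$url] = ['index' => count($deduplicated), 'is_ascii' => $is_ascii];
            $deduplicated[] = $post;
        } elseif ($is_ascii && !$seen[$url]['is_ascii']) {
            // Replace the earlier umlaut entry in place, keeping its position.
            $deduplicated[$seen[$url]['index']] = $post;
            $seen[$url]['is_ascii'] = true;
        }
        // Otherwise this is a duplicate of an entry already preferred; skip it.
    }

    return $deduplicated;
}

$posts = [
    ['url' => '/insights/lexikon/spätschicht/'],
    ['url' => '/insights/lexikon/spaetschicht/'],
    ['url' => '/insights/lexikon/nachtschicht/'],
];
$result = dedupe_posts_by_url($posts, ['spätschicht' => 'spaetschicht']);
```

Storing the index alongside the `is_ascii` flag lets the ASCII post take over the slot of the umlaut post, so the original ordering of the list is preserved.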

### 3. Cleanup Script

A cleanup script (`v2/scripts/blog/cleanup-umlaut-post-files.php`) removes umlaut JSON files that have been migrated to ASCII:

```bash
# Dry run (check what would be deleted)
php v2/scripts/blog/cleanup-umlaut-post-files.php --dry-run

# Actually delete umlaut files
php v2/scripts/blog/cleanup-umlaut-post-files.php
```

**Note:** The script deletes an umlaut file only if the corresponding ASCII file exists, so posts that have not been migrated yet are never removed.
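
The safety rule from the note can be illustrated with a short sketch. This is not the content of `cleanup-umlaut-post-files.php`, which is not reproduced here. The helper `find_removable_umlaut_files()`, the `$dry_run` variable, and the temporary demo directory are hypothetical, and the redirect map is again assumed to be umlaut slug => ASCII slug.

```php
<?php
/**
 * Hypothetical helper: list umlaut JSON files that are safe to delete,
 * i.e. whose ASCII counterpart already exists in the same directory.
 */
function find_removable_umlaut_files(string $posts_dir, array $redirects): array
{
    $removable = [];
    foreach ($redirects as $umlaut_slug => $ascii_slug) {
        $umlaut_file = "{$posts_dir}/{$umlaut_slug}.json";
        $ascii_file  = "{$posts_dir}/{$ascii_slug}.json";
        if (is_file($umlaut_file) && is_file($ascii_file)) {
            $removable[] = $umlaut_file;
        }
    }
    return $removable;
}

// Demo against a throwaway directory rather than real post files.
$dir = sys_get_temp_dir() . '/umlaut-cleanup-demo-' . uniqid();
mkdir($dir);
file_put_contents("{$dir}/spätschicht.json", '{}');
file_put_contents("{$dir}/spaetschicht.json", '{}');
file_put_contents("{$dir}/frühschicht.json", '{}'); // ASCII file missing

$redirects = ['spätschicht' => 'spaetschicht', 'frühschicht' => 'fruehschicht'];
$dry_run = true;
$removable = find_removable_umlaut_files($dir, $redirects);

foreach ($removable as $file) {
    echo ($dry_run ? '[dry-run] would delete: ' : 'deleting: ') . basename($file) . PHP_EOL;
    if (!$dry_run) {
        unlink($file);
    }
}

// Remove the demo directory again.
array_map('unlink', glob("{$dir}/*.json"));
rmdir($dir);
```

In this demo only `spätschicht.json` is reported, because `frühschicht.json` has no ASCII counterpart yet.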

## Files Modified

- `v2/config/blog-template-helpers.php`:
  - `load_blog_posts_by_category()` - Added umlaut filtering and URL deduplication
  - `load_blog_post_summaries_by_category()` - Added umlaut filtering and URL deduplication

## Testing

After deployment, verify:

1. **No duplicates on index page** (prints `OK` when every post URL is unique):
   ```bash
   php -r "require 'v2/config/blog-template-helpers.php'; \$posts = load_all_blog_post_summaries(); \$urls = array_map(fn(\$p) => \$p['url'] ?? '', \$posts); \$dupes = array_filter(array_count_values(\$urls), fn(\$c) => \$c > 1); echo empty(\$dupes) ? 'OK' : 'DUPLICATES FOUND';"
   ```

2. **Only ASCII URLs present** (no output means no umlaut URLs were found):
   ```bash
   php -r "require 'v2/config/blog-template-helpers.php'; \$posts = load_all_blog_post_summaries(); foreach (\$posts as \$p) { if (preg_match('/[äöüß]/u', \$p['url'] ?? '')) { echo 'UMLAUT URL: ' . \$p['url'] . PHP_EOL; } }"
   ```

3. **Cleanup script identifies umlaut files:**
   ```bash
   php v2/scripts/blog/cleanup-umlaut-post-files.php --dry-run
   ```

## Related

- `docs/systems/landing-page-redirects/LEXIKON_UMLAUT_REDIRECTS.md` - Redirect configuration
- `docs/systems/landing-page-redirects/SLUG_REDIRECT_DUPLICATE_PREVENTION.md` - Similar pattern for slug renames (year-specific → evergreen)
- `v2/scripts/blog/migrate-slug-to-ascii.php` - Migration script
- `v2/scripts/blog/cleanup-umlaut-post-files.php` - Cleanup script
