# Blog Posts Spam URL Cleanup Summary

**Last Updated:** 2026-01-09

## Overview

This document summarizes the spam URL cleanup process for blog posts: removing UTM parameters and `data-mil` tracking attributes from links, rewriting localhost URLs, and regenerating missing text content.

## Cleanup Process

### Scripts Created

1. **`scripts/blog/audit-spam-urls.php`**

   - Scans all blog posts for spam URLs
   - Identifies UTM parameters, `data-mil` attributes, and localhost URLs
   - Generates a detailed report

2. **`scripts/blog/clean-spam-urls.php`**

   - Removes UTM parameters from URLs
   - Removes `data-mil` tracking attributes
   - Rewrites localhost URLs as relative paths
   - Regenerates text content if missing

3. **`scripts/blog/regenerate-text-content.php`**

   - Regenerates text content from HTML for posts with missing/incomplete text
   - Updates word counts
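
To make the audit step concrete, here is a minimal Python sketch of the kind of scan described for `audit-spam-urls.php`. It is not the script's actual code: the regular expressions, the `audit_html` function name, the report keys, and the set of hosts treated as "localhost" are all assumptions made for illustration.

```python
import re
from html import unescape
from urllib.parse import parse_qsl, urlsplit

# Hypothetical patterns; the real audit script may match differently.
HREF_RE = re.compile(r'href="([^"]*)"', re.IGNORECASE)
DATA_MIL_RE = re.compile(r'\sdata-mil="[^"]*"', re.IGNORECASE)
LOCAL_HOSTS = {"localhost", "127.0.0.1"}  # assumed definition of "localhost URL"


def audit_html(markup: str) -> dict:
    """Report UTM-tagged links, data-mil attributes, and localhost links."""
    report = {"utm_urls": [], "data_mil_count": 0, "localhost_urls": []}
    for url in HREF_RE.findall(markup):
        # Attribute values are HTML-escaped, so decode &amp; before parsing.
        parts = urlsplit(unescape(url))
        params = parse_qsl(parts.query, keep_blank_values=True)
        if any(key.startswith("utm_") for key, _ in params):
            report["utm_urls"].append(url)
        if parts.hostname in LOCAL_HOSTS:
            report["localhost_urls"].append(url)
    report["data_mil_count"] = len(DATA_MIL_RE.findall(markup))
    return report
```

A scan like this only reads the content, which is why it is safe to run before any cleanup.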

### What Was Cleaned

- **UTM Parameters**: Removed `utm_campaign`, `utm_source`, `utm_medium`, `utm_term`, `utm_content` from URLs
- **Tracking Attributes**: Removed `data-mil` attributes from anchor tags
- **Localhost URLs**: Rewrote localhost URLs as relative paths
- **Text Content**: Regenerated missing text content and word counts
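
The link transformations listed above can be sketched in Python as follows. This is an illustrative approximation rather than the logic in `clean-spam-urls.php`: the function names, the regular expressions, and the hosts treated as localhost are assumptions. The UTM parameter names are the five listed above.

```python
import re
from html import escape, unescape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

UTM_KEYS = {"utm_campaign", "utm_source", "utm_medium", "utm_term", "utm_content"}
LOCAL_HOSTS = {"localhost", "127.0.0.1"}  # assumed definition of "localhost URL"
HREF_RE = re.compile(r'(href=")([^"]*)(")', re.IGNORECASE)
DATA_MIL_RE = re.compile(r'\sdata-mil="[^"]*"', re.IGNORECASE)


def clean_url(url: str) -> str:
    """Drop UTM parameters and turn localhost URLs into relative paths."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    has_utm = any(key in UTM_KEYS for key, _ in params)
    is_local = parts.hostname in LOCAL_HOSTS
    if not has_utm and not is_local:
        return url  # leave untouched URLs byte-for-byte identical
    # Note: urlencode re-encodes the remaining parameters.
    query = urlencode([(k, v) for k, v in params if k not in UTM_KEYS])
    if is_local:
        # Keep only path, query, and fragment so the link becomes relative.
        return urlunsplit(("", "", parts.path or "/", query, parts.fragment))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def clean_html(markup: str) -> str:
    """Clean every double-quoted href and remove data-mil attributes."""
    def fix(match: re.Match) -> str:
        raw = unescape(match.group(2))
        cleaned = clean_url(raw)
        if cleaned == raw:
            return match.group(0)
        return match.group(1) + escape(cleaned, quote=True) + match.group(3)

    return DATA_MIL_RE.sub("", HREF_RE.sub(fix, markup))
```

Under these assumptions, `clean_url("http://localhost:8000/blog/post?utm_campaign=a")` returns `/blog/post`. A regex-based pass like this only handles double-quoted attributes; a production script would more likely use a real HTML parser.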

### Safety Features

- Content length validation to prevent accidental content removal
- Dry-run mode for testing before applying changes
- Backup validation before updates
- Selective post processing support
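
As a rough illustration of how content length validation and dry-run mode can work together, consider the Python sketch below. The `guarded_update` function, the `CleanupResult` type, and the `min_ratio` threshold of 0.8 are hypothetical; the source does not state which checks or thresholds the actual scripts use.

```python
from dataclasses import dataclass


@dataclass
class CleanupResult:
    content: str   # content that should be stored after this step
    changed: bool  # True if the cleanup produced an acceptable change
    written: bool  # True only when the change would actually be saved
    reason: str


def guarded_update(original: str, cleaned: str, *, dry_run: bool = True,
                   min_ratio: float = 0.8) -> CleanupResult:
    """Accept a cleaned version only if it did not shrink suspiciously."""
    if cleaned == original:
        return CleanupResult(original, False, False, "no changes needed")
    # Hypothetical guard: stripping tracking data should remove a little
    # text, not a large share of the post.
    if len(cleaned) < len(original) * min_ratio:
        return CleanupResult(original, False, False, "rejected: content shrank too much")
    if dry_run:
        return CleanupResult(original, True, False, "dry run: change not written")
    return CleanupResult(cleaned, True, True, "updated")
```

Defaulting `dry_run` to `True` in this sketch means a caller has to opt in explicitly before anything is written.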

## Usage

### Audit Spam URLs

```bash
php scripts/blog/audit-spam-urls.php
```

### Clean Spam URLs (Dry Run)

```bash
php scripts/blog/clean-spam-urls.php --dry-run
```

### Clean Spam URLs (Apply Changes)

```bash
php scripts/blog/clean-spam-urls.php
```

### Clean Specific Post

```bash
php scripts/blog/clean-spam-urls.php --post=post-slug
```

### Regenerate Text Content

```bash
php scripts/blog/regenerate-text-content.php
```
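
For readers unfamiliar with what regenerating text content involves, the following Python sketch derives plain text and a word count from a post's HTML. It is an assumption-laden illustration, not the PHP script's implementation: the `html_to_text` and `word_count` names, the choice to skip `script` and `style` elements, and the whitespace-based word count are all hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style elements."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Join chunks with spaces, then collapse all runs of whitespace.
        # A word split by inline markup would be counted as two words.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(markup: str) -> str:
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())
```

For example, `html_to_text("<p>Hello <b>world</b></p>")` yields `Hello world`, which `word_count` reports as 2.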

## Results

- All UTM parameters removed from blog post URLs
- All `data-mil` tracking attributes removed
- Text content regenerated for posts with missing content
- Word counts updated
- No content loss during cleanup

## Best Practices

1. **Always run audit first** to see what will be cleaned
2. **Use dry-run mode** to test before applying changes
3. **Verify content integrity** after cleanup
4. **Test sample posts in a browser** to ensure links work correctly
5. **Keep backups** before running cleanup scripts

## Related Documentation

- `docs/content/blog/LINK_CLEANUP_REPORT.md` - Broken links cleanup report
- `docs/content/blog/SPAM_URL_CLEANUP_REPORT.md` - Detailed spam URL audit report
