# Anchor Text and UTM Parameter Fix

**Last Updated:** 2026-01-14

## Summary

This fix addresses anchor text quality issues and cleans up internal links. It repairs links with problematic anchor text (anchors starting with "und", incomplete compounds), removes the duplicate words that anchor text expansion can leave behind, strips UTM parameters from internal links to public pages, and removes landing page (`/lp/*`) links from blog posts.

## Problems Fixed

### 1. Anchor Text Starting with "und" ✅

- **Issue**: Links like "und produzierendes Gewerbe" violate SEO best practices
- **Solution**: Automatically expand anchor text to include preceding context
- **Example**: "und produzierendes Gewerbe" → "Industrie und produzierendes Gewerbe"
- **Result**: 2+ problematic links fixed across posts

### 2. Incomplete Compound Links ✅

- **Issue**: Links like "Pflege- und" are incomplete and confusing
- **Solution**: Automatically expand to include completing word from context
- **Example**: "Pflege- und" → "Pflege- und Gesundheitswesen"
- **Result**: Incomplete compounds fixed automatically

### 3. UTM Parameters on Internal Links ✅

- **Issue**: Internal links to public pages carried UTM parameters, which makes them look spammy
- **Solution**: Remove ALL UTM parameters from internal links to `/branchen/*`, `/tools/*`, `/insights/*`
- **Exception**: Keep UTM parameters on landing pages (`/lp/*`) for tracking
- **Result**: 436 UTM parameters removed from 70 posts

### 4. Duplicate Words from Anchor Text Expansion ✅

- **Issue**: When anchor text is expanded (e.g., "und produzierendes Gewerbe" → "Industrie und produzierendes Gewerbe"), the original word ("Industrie") remains before the link, causing duplicates like "Industrie Industrie und produzierendes Gewerbe"
- **Solution**: Automatically detect and remove duplicate words that appear before or after links when anchor text is expanded
- **Example**: "Industrie <a>Industrie und produzierendes Gewerbe</a>" → "<a>Industrie und produzierendes Gewerbe</a>"
- **Result**: Duplicate words automatically removed during anchor text expansion
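The duplicate removal can be illustrated with a small regex-based sketch. It is not the code from `link_utils.php`. The function name `removeDuplicateWordsAroundLinksSketch` is hypothetical, and the sketch assumes that a "duplicate" is a word directly before a link that equals the first word of the anchor, or a word directly after a link that equals the last word of the anchor.

```php
<?php
// Hypothetical helper for illustration only, not the project's implementation.
function removeDuplicateWordsAroundLinksSketch(string $html): string
{
    $word = '[\p{L}\-]+';     // a word: letters and hyphens
    $wordChar = '[\p{L}\-]';  // used in lookarounds to enforce whole-word matches

    // "Industrie <a ...>Industrie und ..." becomes "<a ...>Industrie und ..."
    $before = '~(?<!' . $wordChar . ')(' . $word . ')\s+(<a\b[^>]*>)\1(?!' . $wordChar . ')~u';
    $html = preg_replace($before, '$2$1', $html) ?? $html;

    // "... Gesundheitswesen</a> Gesundheitswesen" becomes "... Gesundheitswesen</a>"
    $after = '~(?<!' . $wordChar . ')(' . $word . ')</a>\s+\1(?!' . $wordChar . ')~u';
    return preg_replace($after, '$1</a>', $html) ?? $html;
}
```

Note that such a pattern would also collapse legitimate repetitions (for example an article repeated on purpose), so a real implementation would likely only run it for links whose anchor text was just expanded.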

### 5. Landing Page Links (`/lp/*`) ✅

- **Issue**: Landing page links (`/lp/*`) appeared in blog post content, which is inappropriate for evergreen content
- **Solution**: Remove all `/lp/*` links entirely from blog posts, keeping anchor text as plain text
- **Reason**: Landing pages are for specific campaigns and should not appear in evergreen blog content
- **Result**: All `/lp/*` links removed from blog posts
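Unwrapping a landing page link while keeping its anchor text could look roughly like the following sketch. The function name `unwrapLandingPageLinksSketch` is hypothetical, and the sketch assumes relative `/lp/...` URLs in double-quoted `href` attributes; the actual script may use a different approach (for example a DOM parser).

```php
<?php
// Hypothetical helper for illustration only.
// Assumes hrefs are double-quoted and landing page URLs are relative ("/lp/...").
function unwrapLandingPageLinksSketch(string $html): string
{
    // Match the whole <a href="/lp/...">...</a> element and keep only its inner content.
    $pattern = '~<a\b[^>]*\bhref="/lp/[^"]*"[^>]*>(.*?)</a>~is';
    return preg_replace($pattern, '$1', $html) ?? $html;
}
```

Links to public pages such as `/tools/*` are left untouched because the pattern only matches `href` values that start with `/lp/`.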

## Implementation Details

### New Functions

**`expandAnchorTextWithContext($html, $anchorText, $url)`** (`link_utils.php`):

- Expands anchor text by searching surrounding HTML context
- Handles two cases:
  1. Anchor starting with "und": Searches backward for preceding word/phrase
  2. Incomplete compound: Searches forward for completing word
- Returns array with:
  - `expanded`: The expanded anchor text
  - `remove_before`: Word to remove before the link (if duplicate detected)
  - `remove_after`: Word to remove after the link (if duplicate detected)
- Returns `null` if expansion not possible
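The two cases and the return shape described above can be sketched as follows. This is a simplified illustration, not the code in `link_utils.php`: the function name `expandAnchorSketch` is hypothetical, and the sketch assumes double-quoted `href` attributes, an exact match of the anchor text, and that exactly one adjacent word is pulled into the anchor (which then becomes the duplicate to remove).

```php
<?php
// Hypothetical sketch of the described behaviour, under the assumptions stated above.
function expandAnchorSketch(string $html, string $anchorText, string $url): ?array
{
    // Locate the link by its href and its exact anchor text.
    $pattern = '~<a\b[^>]*href="' . preg_quote($url, '~') . '"[^>]*>'
        . preg_quote($anchorText, '~') . '</a>~u';
    if (!preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    // Offsets are byte offsets, which matches substr()/strlen().
    $start  = $m[0][1];
    $before = substr($html, 0, $start);
    $after  = substr($html, $start + strlen($m[0][0]));

    // Case 1: anchor starts with "und", so search backward for the preceding word.
    if (preg_match('~^und\s~iu', $anchorText)) {
        if (preg_match('~([\p{L}\-]+)\s+$~u', $before, $w)) {
            return [
                'expanded'      => $w[1] . ' ' . $anchorText,
                'remove_before' => $w[1], // the word now also sits inside the anchor
                'remove_after'  => null,
            ];
        }
        return null;
    }

    // Case 2: incomplete compound ("Pflege- und"), so search forward for the completing word.
    if (preg_match('~-\s+und$~iu', $anchorText)) {
        if (preg_match('~^\s+([\p{L}\-]+)~u', $after, $w)) {
            return [
                'expanded'      => $anchorText . ' ' . $w[1],
                'remove_before' => null,
                'remove_after'  => $w[1],
            ];
        }
        return null;
    }

    return null; // anchor text does not match either problem pattern
}
```

A caller would replace the anchor text with `expanded` and delete the word named in `remove_before` or `remove_after` from the surrounding text.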

**`removeUtmFromInternalLink($url)`** (`link_utils.php`):

- Checks whether the URL is an internal link to a public page
- If so, removes ALL UTM parameters (`utm_*`)
- Preserves UTM parameters on landing pages (`/lp/*`)
- Returns cleaned URL
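A minimal sketch of this logic is shown below. It is not the actual function: `stripUtmSketch` and the `$siteHost` default (`example.com`) are hypothetical, "public page" is assumed to mean a path starting with `/branchen/`, `/tools/`, or `/insights/`, and the URL is assumed to be HTML-decoded already (plain `&`, not `&amp;`).

```php
<?php
// Hypothetical sketch, not the implementation in link_utils.php.
function stripUtmSketch(string $url, string $siteHost = 'example.com'): string
{
    // Split off fragment and query by hand so the rest of the URL is preserved as is.
    $fragmentParts = explode('#', $url, 2);
    $queryParts    = explode('?', $fragmentParts[0], 2);
    if (count($queryParts) < 2) {
        return $url; // no query string, nothing to strip
    }
    [$base, $query] = $queryParts;

    $host = parse_url($base, PHP_URL_HOST);
    $path = parse_url($base, PHP_URL_PATH);
    if (!is_string($path)) {
        return $url;
    }
    // Internal means: relative URL, or absolute URL on our own host (assumed).
    if (is_string($host) && strcasecmp($host, $siteHost) !== 0) {
        return $url;
    }

    // Only public pages are cleaned; /lp/* and everything else is returned unchanged.
    $isPublic = false;
    foreach (['/branchen/', '/tools/', '/insights/'] as $prefix) {
        if (strncmp($path, $prefix, strlen($prefix)) === 0) {
            $isPublic = true;
            break;
        }
    }
    if (!$isPublic) {
        return $url;
    }

    // Keep every query parameter whose name does not start with "utm_".
    $kept = [];
    foreach (explode('&', $query) as $pair) {
        $name = explode('=', $pair, 2)[0];
        if ($pair !== '' && strncasecmp($name, 'utm_', 4) !== 0) {
            $kept[] = $pair;
        }
    }

    $result = $base;
    if ($kept !== []) {
        $result .= '?' . implode('&', $kept);
    }
    if (isset($fragmentParts[1])) {
        $result .= '#' . $fragmentParts[1];
    }
    return $result;
}
```

Splitting the string manually instead of rebuilding it from `parse_url()` components keeps non-UTM parameters, their order, and the fragment exactly as they were.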

### Updated Scripts

**`fix-anchor-text-stop-words.php`**:

- Now expands context for links starting with "und" before trimming
- Removes links that cannot be fixed (instead of skipping)
- Applies UTM parameter removal

**`fix-anchor-text-and-utm.php`**:

- Comprehensive script combining anchor text and UTM fixes
- Processes all blog posts
- Creates backups before modifications
- Generates detailed change log
- Options: `--all`, `--post=slug --category=category`, `--dry-run`, `--backup`

**`fix-all-internal-linking-issues.php`** (NEW):

- Comprehensive script fixing three of the issues above:
  1. Duplicate words from anchor text expansion
  2. UTM parameters on internal links
  3. Landing page links (`/lp/*`)
- Processes all blog posts
- Creates backups before modifications
- Generates detailed change log
- Options: `--all`, `--post=slug --category=category`, `--dry-run`, `--backup`

**`sanitizeHtmlOutput()`** (`blog-template-helpers.php`):

- Applies UTM removal during content sanitization
- Uses `removeUtmFromInternalLink()` if available
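The "if available" guard might look like the sketch below. The wrapper name `sanitizeLinksSketch` is hypothetical and only covers the link-cleaning step, not the rest of `sanitizeHtmlOutput()`; it assumes double-quoted `href` attributes and relies on `function_exists()` so the content is returned unchanged when `link_utils.php` has not been loaded.

```php
<?php
// Hypothetical sketch of the link-cleaning step inside a sanitizer.
function sanitizeLinksSketch(string $html): string
{
    // Degrade gracefully when the helper from link_utils.php is not loaded.
    if (!function_exists('removeUtmFromInternalLink')) {
        return $html;
    }

    $pattern = '~(<a\b[^>]*\bhref=")([^"]*)(")~i';
    return preg_replace_callback($pattern, function (array $m): string {
        // Attribute values are HTML-encoded (&amp;), so decode before cleaning and re-encode after.
        $url = html_entity_decode($m[2], ENT_QUOTES | ENT_HTML5);
        $clean = removeUtmFromInternalLink($url);
        return $m[1] . htmlspecialchars($clean, ENT_QUOTES) . $m[3];
    }, $html) ?? $html;
}
```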

**`cleanUrl()`** functions (`add-links-to-json.php`, `fix-malformed-links.php`):

- Updated to use `removeUtmFromInternalLink()` function
- Consistent UTM removal logic across all scripts

## Usage

### Fix Single Post

```bash
php v2/scripts/blog/fix-anchor-text-and-utm.php --post=tarifvertraege --category=lexikon --backup
```

### Fix All Posts

```bash
php v2/scripts/blog/fix-anchor-text-and-utm.php --all --backup
```

Or use the comprehensive fix script:

```bash
php v2/scripts/blog/fix-all-internal-linking-issues.php --all --backup
```

### Dry Run (Preview Changes)

```bash
php v2/scripts/blog/fix-anchor-text-and-utm.php --all --dry-run
```

Or:

```bash
php v2/scripts/blog/fix-all-internal-linking-issues.php --all --dry-run
```

## Results

**Run Date**: 2026-01-14

- **Posts Processed**: 99
- **Posts Modified**: 70
- **Links Fixed**: 880
- **Links Removed**: 0
- **UTM Parameters Removed**: 436

## Best Practices

### Anchor Text

- ✅ **DO**: Use descriptive, keyword-rich anchor text
- ✅ **DO**: Include complete phrases (e.g., "Industrie und produzierendes Gewerbe")
- ❌ **DON'T**: Start anchor text with stop words (e.g., "und", "oder")
- ❌ **DON'T**: Use incomplete compounds (e.g., "Pflege- und")
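These two rules can be checked mechanically. The following is a minimal sketch, assuming that only "und" and "oder" count as stop words and that an incomplete compound is an anchor ending in a hyphenated word followed by "und" or "oder"; the function name `anchorTextIssuesSketch` and the issue labels are hypothetical.

```php
<?php
// Hypothetical validator for illustration; the stop-word list is an assumption.
function anchorTextIssuesSketch(string $anchor): array
{
    $issues = [];
    $text = trim($anchor);

    // Anchor begins with a stop word as a whole word ("und ...", not "Undichte ...").
    if (preg_match('~^(und|oder)\b~iu', $text)) {
        $issues[] = 'starts_with_stop_word';
    }
    // Anchor ends with a dangling compound such as "Pflege- und".
    if (preg_match('~-\s+(und|oder)$~iu', $text)) {
        $issues[] = 'incomplete_compound';
    }
    return $issues;
}
```

An empty array means the anchor text passes both checks.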

### UTM Parameters

- ✅ **DO**: Use UTM parameters on landing pages (`/lp/*`) for tracking
- ✅ **DO**: Remove UTM parameters from internal links to public pages
- ❌ **DON'T**: Add UTM parameters to internal links (`/branchen/*`, `/tools/*`, `/insights/*`)

### Landing Page Links

- ❌ **DON'T**: Use `/lp/*` links in blog post content
- ✅ **DO**: Use public pages (`/branchen/*`, `/tools/*`, `/insights/*`) instead
- **Reason**: Landing pages are for specific campaigns, not evergreen content

### Duplicate Word Prevention

- ✅ **DO**: Let the system automatically expand anchor text when needed
- ✅ **DO**: Trust the system to remove duplicate words automatically
- ❌ **DON'T**: Manually add words before/after links that match the anchor text

## Prevention

Links added through `add-links-to-json.php` and other scripts are now processed automatically:

- Anchor text quality is validated
- UTM parameters are removed from internal links to public pages
- Context is expanded for problematic anchor texts

## Related Documentation

- `docs/content/blog/LINKING_QUALITY_FIX_FINAL.md` - Overall linking quality improvements
- `.cursor/rules/blog-templates.mdc` - Blog template patterns and best practices
- `v2/scripts/blog/link_utils.php` - Link utility functions
