# Malformed and Spammy Links Audit & Fix Summary

**Last Updated:** 2026-01-10

## Summary

An audit of all blog posts for malformed and spammy links found 77 problematic links (1 malformed, 76 spammy) across 38 posts. All of them have been removed, with the anchor text kept as plain text.

## Issues Found

### 1. Malformed Links (Multiple URLs Concatenated) ⚠️
- **Found:** 1 link
- **Issue:** Link with multiple URLs concatenated (e.g., `https://www.ordio.com/dokumentenmanagement?utm_...https://www.ordio.com/digitale-personalakte?utm_...`)
- **Location:** `urlaubsanspruch-von-minijobbern.json`
- **Status:** ✅ Fixed (removed)
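
One simple way to detect this pattern is to count how many times a URL scheme appears inside a single `href`. The sketch below is a minimal illustration in Python, not the logic of `audit-malformed-links.php`; the function name and sample URLs are hypothetical.

```python
import re

# Matches the start of an absolute URL. More than one match inside a
# single href suggests that several URLs were pasted together.
SCHEME_RE = re.compile(r"https?://", re.IGNORECASE)


def is_concatenated_url(href: str) -> bool:
    """Return True if href appears to contain more than one absolute URL."""
    return len(SCHEME_RE.findall(href)) > 1


# Illustrative inputs (hypothetical query strings)
single = "https://www.ordio.com/dokumentenmanagement?utm_source=organicsearch"
joined = single + "https://www.ordio.com/digitale-personalakte"

print(is_concatenated_url(single))  # False
print(is_concatenated_url(joined))  # True
```

A URL that legitimately carries another unencoded URL as a query value (for example a redirect target) would also be flagged, so hits from a check like this are best reviewed before removal.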

### 2. Spammy Links (Excessive UTM Parameters) ⚠️
- **Found:** 76 links
- **Issue:** Links carrying more than 3 UTM parameters (typically all 5: `utm_campaign`, `utm_source`, `utm_medium`, `utm_term`, `utm_content`), often with `utm_term` and `utm_content` left empty
- **Examples:**
  - `https://www.ordio.com/lp?utm_campaign=inbound&utm_source=organicsearch&utm_medium=lexikon&utm_term=&utm_content=`
  - `https://www.ordio.com/dokumentenmanagement?utm_campaign=inbound&utm_source=organicsearch&utm_medium=lexikon&utm_term=&utm_content=`
- **Status:** ✅ Fixed (all 76 links removed)
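
The ">3 UTM parameters" rule can be expressed as a short check. The following is a sketch using Python's standard `urllib.parse`, assuming the `href` has already been HTML-unescaped (so `&amp;` is `&`); the function names and the `MAX_UTM_PARAMS` constant are hypothetical and not taken from the audit script.

```python
from urllib.parse import parse_qsl, urlsplit

# Threshold stated in this audit: more than 3 UTM parameters is spammy.
MAX_UTM_PARAMS = 3


def count_utm_params(url: str) -> int:
    """Count query parameters whose name starts with 'utm_'."""
    query = urlsplit(url).query
    # keep_blank_values=True so empty parameters such as "utm_term=" count too
    pairs = parse_qsl(query, keep_blank_values=True)
    return sum(1 for key, _ in pairs if key.lower().startswith("utm_"))


def is_spammy(url: str) -> bool:
    """Return True if the URL exceeds the UTM parameter threshold."""
    return count_utm_params(url) > MAX_UTM_PARAMS


example = (
    "https://www.ordio.com/lp?utm_campaign=inbound&utm_source=organicsearch"
    "&utm_medium=lexikon&utm_term=&utm_content="
)
print(count_utm_params(example))  # 5
print(is_spammy(example))         # True
```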

### 3. Irrelevant Links (Anchor Text Doesn't Match URL)
- **Found:** 0 links
- **Status:** ✅ No issues found

## Fix Implementation

### Scripts Created

1. **`audit-malformed-links.php`**
   - Scans all blog posts for malformed and spammy links
   - Identifies multiple concatenated URLs
   - Detects excessive UTM parameters (>3)
   - Checks for irrelevant anchor text
   - Generates comprehensive audit report

2. **`fix-malformed-links.php`**
   - Removes malformed links (multiple URLs concatenated)
   - Removes spammy links (excessive UTM parameters)
   - Can clean URLs in place by stripping excess UTM parameters (keeps at most 2); no links were cleaned this way in this run
   - Updates both HTML content and `internal_links` array
   - Generates fix report

### Fix Results

- **Posts Processed:** 38
- **Links Fixed:** 0 (no URLs were cleaned in place; every flagged link was removed instead)
- **Links Removed:** 76 spammy links, plus the 1 malformed link
- **Total Issues Resolved:** 77 (76 spammy + 1 malformed)

## Link Patterns Removed

### Pattern 1: Multiple URLs Concatenated
```
https://www.ordio.com/dokumentenmanagement?utm_...https://www.ordio.com/digitale-personalakte?utm_...
```
**Action:** Link markup removed entirely; the anchor text was kept as plain text

### Pattern 2: Excessive UTM Parameters
```
https://www.ordio.com/lp?utm_campaign=inbound&utm_source=organicsearch&utm_medium=lexikon&utm_term=&utm_content=
```
**Action:** Link markup removed entirely; the anchor text was kept as plain text
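
Removing a link while keeping its anchor text amounts to replacing the `<a>` element with its inner content. The sketch below shows the idea in Python with a regular expression; it is an illustration only, not the code in `fix-malformed-links.php`. It assumes double-quoted `href` attributes and no nested anchors, and `unwrap_links` and `should_remove` are hypothetical names.

```python
import re
from typing import Callable

# Captures the href value (group 1) and the inner content (group 2) of an
# anchor. Assumes double-quoted href attributes and no nested <a> tags.
ANCHOR_RE = re.compile(
    r'<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def unwrap_links(html: str, should_remove: Callable[[str], bool]) -> str:
    """Replace flagged <a> elements with their inner content."""

    def replace(match: "re.Match[str]") -> str:
        href, inner = match.group(1), match.group(2)
        # Keep the anchor text, drop the surrounding link markup
        return inner if should_remove(href) else match.group(0)

    return ANCHOR_RE.sub(replace, html)


html = (
    '<p>See <a href="https://www.ordio.com/lp?utm_campaign=a&utm_source=b'
    '&utm_medium=c&utm_term=&utm_content=">our page</a> and '
    '<a href="https://www.ordio.com/blog">the blog</a>.</p>'
)
# Simplified removal rule for the demo: more than 3 "utm_" occurrences
cleaned = unwrap_links(html, lambda href: href.count("utm_") > 3)
print(cleaned)
# <p>See our page and <a href="https://www.ordio.com/blog">the blog</a>.</p>
```

For arbitrary HTML, a real parser is more robust than a regular expression; the regex form is used here only to keep the example short.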

## Prevention

### Extraction Script (`extract-content.py`)
- Currently extracts links as-is from WordPress
- **Recommendation:** Add UTM parameter cleaning during extraction
- **Recommendation:** Add validation for malformed URLs

### Sanitization (`sanitizeHtmlOutput()`)
- Currently sanitizes HTML but doesn't clean UTM parameters
- **Recommendation:** Add UTM parameter cleaning (keep max 2: `utm_source`, `utm_medium`)
- **Recommendation:** Add validation for malformed URLs
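
The recommended UTM cleaning could look roughly like the following. This is a sketch in Python rather than a patch for `sanitizeHtmlOutput()`. It assumes that only `utm_source` and `utm_medium` may be kept and that empty values are dropped (the second point is an assumption, not stated in this document); `clean_utm` and `ALLOWED_UTM` are hypothetical names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# UTM parameters allowed to survive cleaning (per the recommendation above)
ALLOWED_UTM = {"utm_source", "utm_medium"}


def clean_utm(url: str) -> str:
    """Drop all UTM parameters except non-empty allowed ones."""
    parts = urlsplit(url)
    kept = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        is_utm = key.lower().startswith("utm_")
        # Non-UTM parameters pass through; allowed UTM ones need a value
        if not is_utm or (key.lower() in ALLOWED_UTM and value):
            kept.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = (
    "https://www.ordio.com/lp?utm_campaign=inbound&utm_source=organicsearch"
    "&utm_medium=lexikon&utm_term=&utm_content="
)
print(clean_utm(url))
# https://www.ordio.com/lp?utm_source=organicsearch&utm_medium=lexikon
```

Note that `urlencode` re-encodes the remaining values, so the percent-encoding of surviving parameters may differ slightly from the input.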

### Link Insertion (`add-links-to-json.php`)
- Already validates anchor text quality
- **Recommendation:** Add URL validation to prevent malformed URLs
- **Recommendation:** Add UTM parameter limit (max 2)
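
A pre-insertion URL check might collect all problems for a candidate link rather than stopping at the first one. The sketch below is a Python illustration, not part of `add-links-to-json.php`. It assumes links are stored as absolute `http(s)` URLs; if relative URLs are acceptable in `internal_links`, the scheme and host checks would need to be relaxed. `validate_link_url` is a hypothetical name.

```python
from typing import List
from urllib.parse import urlsplit


def validate_link_url(url: str) -> List[str]:
    """Return a list of problems found in url (empty list means valid)."""
    problems = []
    if any(ch.isspace() for ch in url):
        problems.append("contains whitespace")
    lowered = url.lower()
    if lowered.count("http://") + lowered.count("https://") > 1:
        problems.append("multiple URLs concatenated")
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError for some malformed hosts
        problems.append("unparseable URL")
        return problems
    if parts.scheme not in ("http", "https"):
        problems.append("unsupported or missing scheme")
    if not parts.netloc:
        problems.append("missing host")
    return problems


print(validate_link_url("https://www.ordio.com/dokumentenmanagement"))  # []
print(validate_link_url("https://www.ordio.com/ahttps://www.ordio.com/b"))
# ['multiple URLs concatenated']
```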

## Files Modified

1. **38 blog post JSON files** - Links removed from HTML and `internal_links` arrays
2. **Audit report:** `docs/content/blog/MALFORMED_LINKS_AUDIT.md`
3. **Fix report:** `docs/content/blog/MALFORMED_LINKS_FIXED.md`

## Validation

After fix:
- ✅ 0 malformed links (multiple URLs concatenated)
- ✅ 0 spammy links (excessive UTM parameters)
- ✅ 0 broken encoding issues
- ✅ 0 empty anchor text links

## Recommendations

1. **Update Extraction Script:** Add UTM parameter cleaning during WordPress extraction
2. **Update Sanitization:** Add UTM parameter limit (max 2) in `sanitizeHtmlOutput()`
3. **Add Validation:** Validate URLs before adding to `internal_links` array
4. **Regular Audits:** Run `audit-malformed-links.php` monthly to catch new issues
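5. **Link Quality Standards:** Document acceptable UTM parameter usage (max 2: `utm_source`, `utm_medium`) and align the audit threshold with it; the audit currently flags only links with more than 3 UTM parameters, so a link with exactly 3 would pass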

## Related Documentation

- [Internal Linking Guide](./INTERNAL_LINKING_GUIDE.md)
- [Anchor Text Quality Guide](./ANCHOR_TEXT_QUALITY_GUIDE.md)
- [Linking Quality Fix Complete](./LINKING_QUALITY_FIX_COMPLETE.md)
