# Orphaned Anchor Text Fix

**Last Updated:** 2026-01-10

This document describes the fix for orphaned anchor text: link text that remains in content after its link has been removed.

## Important Note: Regex Bug Fixed

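During initial implementation, a Python regex bug caused literal "$1" artifacts to appear in content. Python's `re.sub` does not treat `$1` in a replacement string as a backreference, so it was written into the content as plain text. This was fixed by: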
- Using `r'\1'` instead of `r'$1'` for Python regex backreferences
- Re-extracting all posts to ensure clean content
- Re-applying links cleanly

See [Re-extraction and Re-linking Complete](./RE_EXTRACTION_AND_LINKING_COMPLETE.md) for full details.

## Problem

When problematic links were removed from blog post content, the `fix-problematic-links.php` script replaced each link's HTML with just its anchor text. This left orphaned text such as "mehr zum Dienstplan" ("more on the shift schedule"), which makes no sense in context once the link is gone.

**Example:**
```html
<!-- Before fix -->
<p><strong>Ein Beispiel aus der Praxis:</strong> mehr zum Dienstplan</p>

<!-- After fix -->
<p><strong>Ein Beispiel aus der Praxis:</strong></p>
```

## Solution

The link removal logic now decides, for each removed link, whether the anchor text should be removed entirely or kept as natural text.

### Detection Patterns

The system identifies orphaned anchor text using these patterns:

1. **"mehr zum/zur" patterns**: Always removed when found
   - Pattern: `/\b(mehr zum|mehr zur)\b/ui`
   - Context: After section titles, colons, or as standalone text

2. **Single words at end of sentences**: Removed when awkward
   - Pattern: Single word after sentence-ending punctuation
   - Common examples: "Checkliste", "Dienstplan", "Zeiterfassung"

3. **Text after colons**: Removed when it's awkward standalone text
   - Pattern: Text after `<strong>Title:</strong>` that starts with "mehr zum/zur"

4. **Section title patterns**: Removed after "Ein Beispiel aus der Praxis" type headings
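
As an illustration only, the four patterns above could be combined roughly as follows. This is a sketch, not the code in `fix-problematic-links.php`: the function name `looksLikeOrphanedAnchorText`, its two-argument signature, and the idea that the second argument holds the HTML immediately before the removed link are all assumptions made for the example.

```php
<?php
// Hypothetical sketch of the detection patterns; NOT the real
// shouldRemoveOrphanedAnchorText() implementation.
// Assumption: $precedingHtml is the HTML directly before the removed link.
function looksLikeOrphanedAnchorText(string $anchorText, string $precedingHtml): bool
{
    $anchor = trim($anchorText);
    $before = rtrim($precedingHtml);

    // Pattern 1: "mehr zum/zur" is always treated as orphaned.
    // This also covers pattern 3 (text after "<strong>Title:</strong>"
    // that starts with "mehr zum/zur").
    if (preg_match('/\b(mehr zum|mehr zur)\b/ui', $anchor) === 1) {
        return true;
    }

    // A "single word" here means letters and hyphens only (assumption).
    $isSingleWord = preg_match('/^\p{L}[\p{L}\-]*$/u', $anchor) === 1;

    // Pattern 2: a single word directly after sentence-ending punctuation.
    if ($isSingleWord && preg_match('/[.!?]$/u', $before) === 1) {
        return true;
    }

    // Pattern 4: anything directly after an "Ein Beispiel aus der Praxis"
    // title. Only this one title is checked; a real list of titles would
    // have to come from the actual script.
    $titlePattern = '/Ein Beispiel aus der Praxis:?\s*(?:<\/strong>)?\s*$/ui';
    if (preg_match($titlePattern, $before) === 1) {
        return true;
    }

    // Everything else is kept as natural text.
    return false;
}

var_dump(looksLikeOrphanedAnchorText('mehr zum Dienstplan', '<p><strong>Titel:</strong>')); // bool(true)
var_dump(looksLikeOrphanedAnchorText('digitale Zeiterfassung', 'Viele Betriebe nutzen eine')); // bool(false)
```

Checking the always-remove pattern first keeps the more context-dependent rules from ever overriding it.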

### Implementation

**Function:** `shouldRemoveOrphanedAnchorText($anchorText, $context, $htmlContent)`

Located in: `v2/scripts/blog/fix-problematic-links.php`

**Logic:**
```php
if (shouldRemoveOrphanedAnchorText($anchorText, $context, $htmlContent)) {
    // Remove link AND anchor text entirely
    $htmlContent = str_replace($toRemove['html'], '', $htmlContent);
    // Collapse runs of whitespace (including line breaks) left behind by the removal
    $htmlContent = preg_replace('/\s+/', ' ', $htmlContent);
} else {
    // Keep anchor text (natural text)
    $htmlContent = str_replace($toRemove['html'], $anchorText, $htmlContent);
}
```
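
The `\s+` replacement in the snippet above collapses whitespace across the whole document but does not touch punctuation. A cleanup step along the following lines might cover the remaining cases. `cleanUpAfterRemoval` is a hypothetical helper, and the specific rules (which closing tags, which punctuation marks) are assumptions rather than the script's actual behavior.

```php
<?php
// Hypothetical cleanup helper; the rules below are illustrative assumptions.
function cleanUpAfterRemoval(string $html): string
{
    // Collapse runs of spaces and tabs, leaving line breaks untouched.
    $html = preg_replace('/[ \t]{2,}/', ' ', $html) ?? $html;

    // Drop whitespace left directly before a closing </p> or </li>.
    $html = preg_replace('/\s+(<\/(?:p|li)>)/i', '$1', $html) ?? $html;

    // Drop spaces left directly before sentence punctuation.
    $html = preg_replace('/ +([.,;!?])/', '$1', $html) ?? $html;

    return $html;
}

echo cleanUpAfterRemoval('<p><strong>Ein Beispiel aus der Praxis:</strong> </p>');
// prints: <p><strong>Ein Beispiel aus der Praxis:</strong></p>
```

Restricting the first rule to spaces and tabs is a deliberate choice in this sketch: it avoids flattening line breaks in content where they may matter.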

## Tools Created

### 1. `identify-orphaned-text.php`

Scans blog posts to identify orphaned anchor text patterns.

**Usage:**
```bash
# Check specific post
php v2/scripts/blog/identify-orphaned-text.php inside-ordio/product-updates-q4-2024

# Check all posts
php v2/scripts/blog/identify-orphaned-text.php
```

**Output:** Lists posts with orphaned text and the patterns found.

### 2. `remove-orphaned-text.php`

Removes orphaned anchor text patterns from existing posts.

**Usage:**
```bash
# Fix specific post
php v2/scripts/blog/remove-orphaned-text.php inside-ordio/product-updates-q4-2024

# Fix all posts
php v2/scripts/blog/remove-orphaned-text.php
```

**Features:**
- Removes "mehr zum/zur" patterns
- Removes single words at end of paragraphs
- Cleans up punctuation and spacing
- Updates JSON files automatically
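
In existing posts the link markup is already gone, so removal has to work on plain text. The following is a deliberately conservative sketch: it assumes the orphaned phrase is "mehr zum/zur" plus exactly one word sitting directly before `</p>`. `stripTrailingMehrZum` is a hypothetical name, and the real script may match more broadly.

```php
<?php
// Hypothetical sketch; not the logic of remove-orphaned-text.php.
function stripTrailingMehrZum(string $html): string
{
    // Match "mehr zum <word>" or "mehr zur <word>" (plus surrounding
    // whitespace) only when it is the last thing before </p>.
    $pattern = '/\s*\bmehr zu[mr]\s+[\p{L}\-]+\s*(?=<\/p>)/ui';

    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace($pattern, '', $html) ?? $html;
}

echo stripTrailingMehrZum(
    '<p><strong>Ein Beispiel aus der Praxis:</strong> mehr zum Dienstplan</p>'
);
// prints: <p><strong>Ein Beispiel aus der Praxis:</strong></p>
```

A "mehr zum" that is part of a longer sentence, such as `<p>Lesen Sie mehr zum Thema Planung in unserem Guide.</p>`, is left unchanged by this pattern because it is not directly followed by `</p>`.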

### 3. Enhanced `fix-problematic-links.php`

Now automatically removes orphaned text when fixing problematic links.

**Enhancements:**
- Added `shouldRemoveOrphanedAnchorText()` function
- Updated link removal logic to check for orphaned text
- Cleans up spacing and punctuation after removal

## Results

**Before Fix:**
- "mehr zum Dienstplan" appeared as plain text after section titles
- "Checkliste" appeared at end of sentences
- Awkward orphaned text throughout content

**After Fix:**
- All orphaned anchor text removed
- Content flows naturally
- No awkward standalone text
- Natural links remain intact

## Testing

1. **Sample Post Test:**
   ```bash
   php v2/scripts/blog/fix-problematic-links.php inside-ordio/product-updates-q4-2024
   php v2/scripts/blog/identify-orphaned-text.php inside-ordio/product-updates-q4-2024
   ```

2. **Validation:**
   ```bash
   php v2/scripts/blog/validate-link-quality.php inside-ordio/product-updates-q4-2024
   ```

3. **All Posts:**
   ```bash
   php v2/scripts/blog/remove-orphaned-text.php
   php v2/scripts/blog/identify-orphaned-text.php
   ```

## Best Practices

1. **Always check context**: Determine if anchor text is natural or orphaned
2. **Remove awkward text**: Patterns like "mehr zum/zur" should always be removed
3. **Keep natural text**: Full sentences or natural phrases should remain
4. **Clean up punctuation**: Remove trailing colons and extra spaces after removal
5. **Validate after fixes**: Always verify content flows naturally

## Related Documentation

- [Internal Linking Guide](./INTERNAL_LINKING_GUIDE.md)
- [Context-Aware Linking Implementation](./CONTEXT_AWARE_LINKING_IMPLEMENTATION.md)
- [Word Boundary Guidelines](./WORD_BOUNDARY_GUIDELINES.md)
