# German Word Boundary Guidelines for Internal Linking

**Last Updated:** 2026-01-10

## Overview

German often forms compound words and plurals by extending a base word, so a linkable keyword frequently appears inside a longer word. If a linking script ignores word boundaries, it links only that fragment. This document explains how word boundary matching works and how to avoid such partial-word links.

## The Problem

### Examples of Partial-Word Links (Incorrect)

1. **Plural Forms:**

   - ❌ Linking "Checkliste" inside "Checklisten" (plural)
   - ❌ Linking "Minijob" inside "Minijobs" (plural)

2. **Compound Words:**

   - ❌ Linking "Schichtplanungs" inside "Schichtplanungsfunktionen"
   - ❌ Linking "Arbeitszeit" inside "Arbeitszeitmodell"
   - ❌ Linking "Zeiterfassung" inside "Zeiterfassungssysteme"

3. **German Characters:**
   - ❌ Linking "Pflege" inside "Pflegebedürftiger" (compound containing an umlaut; boundary checks must treat `ä`, `ö`, `ü`, and `ß` as letters)
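
A plain substring test reproduces every case above: each keyword is "found" inside the longer word even though it does not stand as a word on its own. A minimal Python illustration using only the pairs from this section (`naive_hits` is a hypothetical helper, not part of the linking scripts):

```python
# Keyword / surrounding-word pairs from the examples above
PAIRS = [
    ("Checkliste", "Checklisten"),
    ("Minijob", "Minijobs"),
    ("Schichtplanungs", "Schichtplanungsfunktionen"),
    ("Arbeitszeit", "Arbeitszeitmodell"),
    ("Zeiterfassung", "Zeiterfassungssysteme"),
    ("Pflege", "Pflegebedürftiger"),
]


def naive_hits(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Pairs where plain substring search reports the keyword inside the word."""
    return [(keyword, word) for keyword, word in pairs if keyword in word]


for keyword, word in naive_hits(PAIRS):
    print(f"'{keyword}' found inside '{word}'")
```

All six pairs are reported, which is why substring search alone is not enough to place internal links.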

### Why This Matters

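- **SEO Impact:** Search engines use anchor text to understand the linked page, and a word fragment describes that page less accurately
- **User Experience:** Links that cover only part of a word look unprofessional
- **Accessibility:** Screen readers may announce the linked fragment separately from the rest of the word, and link lists show only the fragment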
- **Content Quality:** Breaks the natural flow of reading

## Solution: German-Aware Word Boundaries

### Technical Implementation

All linking scripts use German-aware word boundary patterns:

**Python (`link_utils.py`):**

```python
# "keyword" is a placeholder for the term to link
pattern = r'(?<![a-zA-ZäöüÄÖÜß])keyword(?![a-zA-ZäöüÄÖÜß])'
```

**PHP (`link_utils.php`):**

```php
// "keyword" is a placeholder; \p{L} = any Unicode letter, u = UTF-8 mode, i = case-insensitive
$pattern = '/(?<![\p{L}])keyword(?![\p{L}])/ui';
```
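
Both snippets use `keyword` as a placeholder. The sketch below shows one way to fill it in from Python, assuming the term should be matched literally (hence `re.escape`) and case-insensitively, as the PHP pattern's `i` flag suggests; `build_pattern` is a hypothetical name, not necessarily what `link_utils.py` exposes:

```python
import re

# German letters treated as word characters (same class as the Python snippet above)
GERMAN_LETTERS = "a-zA-ZäöüÄÖÜß"


def build_pattern(keyword: str) -> re.Pattern:
    """Compile a pattern that matches `keyword` only as a complete word."""
    return re.compile(
        rf"(?<![{GERMAN_LETTERS}])"    # no letter directly before
        + re.escape(keyword)           # keyword matched literally
        + rf"(?![{GERMAN_LETTERS}])",  # no letter directly after
        re.IGNORECASE,                 # assumption: case-insensitive, like PHP's /i
    )


pattern = build_pattern("Checkliste")
print(bool(pattern.search("Die Checkliste hilft")))    # True
print(bool(pattern.search("Die Checklisten helfen")))  # False
```

Without `re.escape`, a keyword containing characters such as `.` or `+` would be interpreted as regex syntax instead of being matched literally.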

### How It Works

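1. **Negative Lookbehind:** `(?<![a-zA-ZäöüÄÖÜß])` - Ensures the keyword is NOT preceded by a letter
2. **Keyword Match:** The keyword itself
3. **Negative Lookahead:** `(?![a-zA-ZäöüÄÖÜß])` - Ensures the keyword is NOT followed by a letter

This ensures keywords are only matched as complete words, not as substrings of longer words. Spaces, punctuation (including hyphens), digits, and the start or end of the text all count as boundaries. The PHP pattern expresses "letter" as `\p{L}`, which covers any Unicode letter, and adds the `u` (UTF-8) and `i` (case-insensitive) flags.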
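
The umlauts and `ß` in the class are what make the pattern German-aware: with an ASCII-only class, a keyword directly followed by `ä` would still match. The sketch below compares three letter definitions on an illustrative sentence (the compound "Arbeitszeitänderung" is an example, not taken from audit data). The third, `[^\W\d_]`, is a common idiom for "any Unicode letter" in Python's built-in `re` module, which does not support `\p{L}`; it roughly corresponds to the PHP class:

```python
import re

# Three ways to describe "a letter" in the lookbehind/lookahead
ASCII_LETTER = r"[a-zA-Z]"            # ASCII only: umlauts count as boundaries
GERMAN_LETTER = r"[a-zA-ZäöüÄÖÜß]"    # class from the documented Python pattern
UNICODE_LETTER = r"[^\W\d_]"          # Unicode word chars minus digits and underscore


def matches(keyword: str, text: str, letter: str) -> bool:
    """True if `keyword` occurs in `text` with no `letter` directly before or after it."""
    pattern = f"(?<!{letter}){re.escape(keyword)}(?!{letter})"
    return re.search(pattern, text) is not None


text = "Die Arbeitszeitänderung gilt ab Montag."  # illustrative sentence

for name, letter in [("ASCII", ASCII_LETTER), ("German", GERMAN_LETTER), ("Unicode", UNICODE_LETTER)]:
    print(name, matches("Arbeitszeit", text, letter))
# ASCII True
# German False
# Unicode False
```

Only the ASCII variant links the fragment. `[^\W\d_]` additionally treats letters outside the German set (for example `é`) as part of the word, which the explicit German class does not.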

## Best Practices

### ✅ Correct Linking

1. **Complete Words:**

   - ✅ "Checkliste" as standalone word
   - ✅ "Schichtplanung" as standalone word
   - ✅ "Zeiterfassung" as standalone word

2. **Context-Aware:**

   - ✅ "Die **Checkliste** hilft..." (complete word)
   - ✅ "**Schichtplanung** ist wichtig..." (complete word)

3. **Proper Boundaries:**
   - ✅ Link appears with spaces or punctuation before/after
   - ✅ Link is not part of a larger compound word
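
For a link that is already placed, the same rule can be checked directly on the characters around the linked span. A minimal sketch, assuming the plain text and the link's start/end offsets are available; `is_whole_word_span` is a hypothetical helper, and `str.isalpha()` stands in for "is a letter" (it accepts umlauts, `ß`, and other Unicode letters):

```python
def is_whole_word_span(text: str, start: int, end: int) -> bool:
    """True if text[start:end] is not glued to a letter on either side."""
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    # An empty string or any non-letter (space, punctuation, digit) is a valid boundary
    return not before.isalpha() and not after.isalpha()


sentence = "Die Checklisten helfen bei der Planung."
start = sentence.index("Checkliste")
print(is_whole_word_span(sentence, start, start + len("Checkliste")))  # False: 'n' follows
```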

### ❌ Incorrect Linking

1. **Partial Words:**

   - ❌ "Checklisten" with only "Checkliste" linked
   - ❌ "Schichtplanungsfunktionen" with only "Schichtplanungs" linked

2. **Compound Words:**
   - ❌ "Arbeitszeitmodell" with only "Arbeitszeit" linked
   - ❌ "Zeiterfassungssysteme" with only "Zeiterfassung" linked

## Automated Tools

### Audit Script

**Script:** `v2/scripts/blog/audit-partial-word-links.py`

**Usage:**

```bash
python3 v2/scripts/blog/audit-partial-word-links.py
```

**Output:**

- Identifies all partial-word links in blog posts
- Generates report with context and issue details
- Saves to `docs/data/blog-partial-word-links-audit.json`
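
The audit script's implementation is not shown here. As a rough sketch of how partial-word links can be detected, assuming post content is HTML with simple, non-nested `<a>` elements, a link is suspicious when a letter touches the anchor element directly on either side. The function name and the report fields (`link_text`, `context`) are hypothetical and do not describe the actual format of `blog-partial-word-links-audit.json`:

```python
import json
import re

# An <a ...>...</a> element; assumes simple, non-nested anchors
LINK_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def find_partial_word_links(html: str) -> list[dict]:
    """Report links whose <a> element is directly touched by a letter on either side."""
    issues = []
    for m in LINK_RE.finditer(html):
        before = html[m.start() - 1] if m.start() > 0 else ""
        after = html[m.end()] if m.end() < len(html) else ""
        if before.isalpha() or after.isalpha():
            issues.append({
                "link_text": m.group(1),
                # a little surrounding markup to help a reviewer locate the link
                "context": html[max(0, m.start() - 30):m.end() + 30],
            })
    return issues


html = ('Die <a href="/checkliste">Checkliste</a>n helfen. '
        'Die <a href="/zeiterfassung">Zeiterfassung</a> ist Pflicht.')
print(json.dumps(find_partial_word_links(html), ensure_ascii=False, indent=2))
```

This reports one issue for the `Checkliste` link, because the `n` of "Checklisten" follows the closing `</a>` directly. Links wrapped in further inline tags (such as `<strong>`) are not caught by this simple check.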

### Fix Script

**Script:** `v2/scripts/blog/fix-partial-word-links.py`

**Usage:**

```bash
python3 v2/scripts/blog/fix-partial-word-links.py
```

**Process:**

1. Reads audit results
2. Removes problematic links
3. Finds full word context
4. Re-adds links with proper word boundaries
5. Updates blog post JSON files
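
Steps 2 to 4 can be sketched for a single keyword and target URL, assuming simple HTML in which the keyword does not occur inside tags or attributes. `relink_whole_word` is a hypothetical helper, not the fix script's actual code; reading the audit results (step 1) and writing the JSON files (step 5) are left out:

```python
import re

LETTERS = "a-zA-ZäöüÄÖÜß"  # letter class from the documented Python pattern


def relink_whole_word(html: str, keyword: str, url: str) -> str:
    """Unwrap links to `url`, then re-link the first whole-word `keyword`.

    Assumes simple HTML in which the keyword does not occur inside tags or attributes.
    """
    # Step 2: remove existing links to this URL but keep their visible text
    link_re = re.compile(
        rf'<a\b[^>]*\bhref="{re.escape(url)}"[^>]*>(.*?)</a>',
        re.IGNORECASE | re.DOTALL,
    )
    html = link_re.sub(r"\1", html)

    # Steps 3-4: re-add the link only where the keyword is a complete word;
    # if it only occurs inside longer words, the text stays unlinked
    word_re = re.compile(rf"(?<![{LETTERS}]){re.escape(keyword)}(?![{LETTERS}])")
    return word_re.sub(lambda m: f'<a href="{url}">{m.group(0)}</a>', html, count=1)


before = ('Die <a href="/checkliste">Checkliste</a>n helfen. '
          'Nutzen Sie die Checkliste vor jeder Schicht.')
print(relink_whole_word(before, "Checkliste", "/checkliste"))
```

The link is removed from the partial word "Checklisten" and re-added on the next standalone "Checkliste". If no standalone occurrence exists, the text is left unlinked, which corresponds to the "skip linking" option described later in this guide.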

## Manual Review Process

### When Adding Links Manually

1. **Check Word Boundaries:**

   - Ensure keyword is a complete word
   - Check if it's part of a compound word
   - Verify plural forms are handled correctly

2. **Test in Context:**

   - Read the sentence with the link
   - Ensure it reads naturally
   - Verify no partial-word issues

3. **Use Tools:**
   - Run audit script after adding links
   - Review audit report for issues
   - Fix any problems immediately

## Common Patterns to Watch For

### Compound Words

German frequently uses compound words. Be careful with:

- **-funktionen** (functions): "Schichtplanungsfunktionen", "Zeiterfassungsfunktionen"
- **-systeme** (systems): "Zeiterfassungssysteme", "Planungssysteme"
- **-tools** (tools): "Schichtplanungstools", "Zeiterfassungstools"
- **-software** (software): "Schichtplanungssoftware", "Zeiterfassungssoftware"
- **-pflicht** (obligation): "Arbeitszeiterfassungspflicht"

### Plural Forms

Watch for plural endings:

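- **-n / -en**: "Checklisten", "Schichten", "Funktionen"
- **-e**: "Systeme"
- **-s**: "Minijobs", "Tools"
- **-n** (dative plural): "Mitarbeitern", "Planern" (nouns ending in -er, such as "Mitarbeiter" and "Planer", keep the same form in the nominative plural)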
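
The watch lists above also make useful regression cases. A minimal sketch that runs the documented Python boundary pattern against the compound and plural examples; the base keyword paired with each word is an assumption made for the test:

```python
import re

LETTERS = "a-zA-ZäöüÄÖÜß"  # letter class from the documented Python pattern

# (keyword, longer word) pairs based on the compound and plural lists above
CASES = [
    ("Schichtplanung", "Schichtplanungsfunktionen"),
    ("Zeiterfassung", "Zeiterfassungssysteme"),
    ("Planung", "Planungssysteme"),
    ("Schichtplanung", "Schichtplanungstools"),
    ("Zeiterfassung", "Zeiterfassungssoftware"),
    ("Arbeitszeiterfassung", "Arbeitszeiterfassungspflicht"),
    ("Checkliste", "Checklisten"),
    ("Schicht", "Schichten"),
    ("Funktion", "Funktionen"),
    ("Minijob", "Minijobs"),
    ("Tool", "Tools"),
    ("System", "Systeme"),
]


def whole_word_hits(keyword: str, text: str) -> list[str]:
    """All occurrences of `keyword` in `text` that stand as complete words."""
    pattern = rf"(?<![{LETTERS}]){re.escape(keyword)}(?![{LETTERS}])"
    return re.findall(pattern, text)


# Any pair collected here would be a partial-word link
failures = [(kw, word) for kw, word in CASES if whole_word_hits(kw, word)]
print(failures or "all compounds and plurals skipped")
```

An empty failure list means every compound and plural was skipped; a positive control such as "Die Schichtplanung ist wichtig." should still produce a match.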

### Solution

When encountering compound words or plurals:

1. Link the complete word if appropriate
2. Or skip linking if the keyword is only part of a larger word
3. Use German-aware word boundary matching in scripts

## Maintenance

### Regular Audits

- **Monthly:** Run audit script to check for new issues
- **After Content Updates:** Check for partial-word links
- **Before Publishing:** Verify all links respect word boundaries

### Fix Process

1. Run audit script
2. Review problematic links
3. Run fix script (or fix manually)
4. Verify fixes with re-audit
5. Document any edge cases

## Related Documentation

- [Internal Linking Guide](./INTERNAL_LINKING_GUIDE.md)
- [Link Quality Standards](guides/INTERNAL_LINKING_GUIDE.md#link-quality-standards)
- [Maintenance Tools Guide](./MAINTENANCE_TOOLS_GUIDE.md)
