# Link Validation Guide

**Last Updated:** 2026-03-16

Comprehensive guide for validating internal links in blog posts to prevent broken links and ensure link quality.

## Overview

This guide covers:

1. **Pre-save validation** - Validate links before saving blog post content
2. **Post-save validation** - Audit existing posts for broken links
3. **Prevention measures** - Best practices to avoid broken links
4. **Monitoring** - Regular audits and alerts for broken links

## Quick Start

### Before Saving Blog Post Content

```bash
# Validate HTML file before updating post
php v2/scripts/blog/validate-internal-links-exist.php --html=path/to/content.html --strict

# If broken links found, fix them before saving
```

### After Updating Blog Post

```bash
# Validate updated post
php v2/scripts/blog/validate-internal-links-exist.php --post=slug --category=lexikon

# Run comprehensive audit
php v2/scripts/blog/audit-all-internal-links.php --output=docs/content/blog/INTERNAL_LINKS_AUDIT_REPORT.md
```

## Validation Scripts

### validate-internal-links-exist.php

**Purpose:** Validate that all internal links in blog post HTML content point to existing pages.

**Usage:**

```bash
# Validate HTML file
php v2/scripts/blog/validate-internal-links-exist.php --html=path/to/content.html

# Validate existing post
php v2/scripts/blog/validate-internal-links-exist.php --post=slug --category=lexikon

# Read from stdin
echo "<a href='/insights/lexikon/test/'>Test</a>" | php v2/scripts/blog/validate-internal-links-exist.php --stdin

# Strict mode (exit with error code if broken links found)
php v2/scripts/blog/validate-internal-links-exist.php --html=path/to/content.html --strict
```

**Exit Codes:**

- `0` = All links valid
- `1` = Broken links found (only if `--strict`)
- `2` = Error parsing input

**What It Validates:**

- Blog posts (`/insights/{category}/{slug}/`)
- Product pages (`/schichtplan`, `/arbeitszeiterfassung`, `/payroll`)
- Tool pages (`/tools/{slug}`)
- Template pages (`/vorlagen/{slug}`)
- Comparison pages (`/compare/{slug}`, `/alternativen/{slug}`)
- Pillar pages (`/insights/dienstplan`, `/insights/zeiterfassung`)

**Features:**

- Checks umlaut redirects (e.g., `kapazitätsplanung` → `kapazitaetsplanung`)
- Detects incorrect `/templates/` URLs (suggests `/vorlagen/`)
- Validates against actual file structure and `.htaccess` routes

### audit-all-internal-links.php

**Purpose:** Comprehensive audit of all internal links across all blog posts.

**Usage:**

```bash
php v2/scripts/blog/audit-all-internal-links.php --output=docs/content/blog/INTERNAL_LINKS_AUDIT_REPORT.md
```

**Output:**

- Total posts audited
- Total links found
- Broken links list (with reasons)
- Incorrect URLs (with suggestions)
- Link type distribution

## Prevention Measures

### 1. Always Validate Before Saving

**CRITICAL:** Never save blog post content without validating internal links first.

**Workflow:**

```bash
# 1. Edit content HTML
# 2. Validate before saving
php v2/scripts/blog/validate-internal-links-exist.php --html=path/to/content.html --strict

# 3. If validation passes, save with update-post-content.php
php v2/scripts/blog/update-post-content.php --post=slug --category=lexikon --html=path/to/content.html
```

### 2. Use Canonical URLs Only

**Always use canonical URLs in internal links:**

- ✅ `/insights/lexikon/onboarding/` (canonical)
- ❌ `/onboarding` (redirects to canonical)

**Check `.htaccess` for redirects:**

```bash
grep -i "RewriteRule.*onboarding" .htaccess
```

### 3. Verify Post Exists Before Linking

**Before adding a link to a blog post:**

```bash
# Check if post exists
ls v2/data/blog/posts/lexikon/arbeitszeit.json

# Or use validation script
php v2/scripts/blog/validate-internal-links-exist.php --html=path/to/content.html
```

### 4. Handle Umlaut URLs Correctly

**Blog posts use ASCII slugs (canonical):**

- ✅ `/insights/lexikon/kapazitaetsplanung/` (ASCII canonical)
- ❌ `/insights/lexikon/kapazitätsplanung/` (umlaut - redirects)

**Check umlaut redirects:**

```bash
cat v2/config/blog-umlaut-redirects.php
```

### 5. Use Correct URL Patterns

**Template pages:**

- ✅ `/vorlagen/{slug}`
- ❌ `/templates/{slug}` (does not exist)

**Tool pages:**

- ✅ `/tools/{slug}`
- ✅ Check `.htaccess` for custom routes

**Comparison pages:**

- ✅ `/compare/{slug}`
- ✅ `/alternativen/{slug}` (also valid)

## Integration with update-post-content.php

**Recommended:** Integrate link validation into `update-post-content.php` workflow.

**Current workflow:**

```bash
php v2/scripts/blog/update-post-content.php --post=slug --category=lexikon --html=path/to/content.html
```

**Recommended workflow (manual):**

```bash
# 1. Validate links first
php v2/scripts/blog/validate-internal-links-exist.php --html=path/to/content.html --strict

# 2. If validation passes, update content
php v2/scripts/blog/update-post-content.php --post=slug --category=lexikon --html=path/to/content.html
```

**Future enhancement:** Add `--validate-links` flag to `update-post-content.php` to automatically validate before saving.

## Monitoring

### Regular Audits

**Run comprehensive audit monthly:**

```bash
php v2/scripts/blog/audit-all-internal-links.php --output=docs/content/blog/INTERNAL_LINKS_AUDIT_REPORT_$(date +%Y%m%d).md
```

**Review audit report:**

- Check broken links section
- Fix broken links immediately
- Update documentation if patterns emerge

### Link Health Monitoring

**Create monitoring script** (see `create-monitoring` todo):

- Run daily/weekly audits
- Send alerts for new broken links
- Track broken link trends over time

## Common Issues and Solutions

### Issue: Blog Post Not Found

**Error:** `Blog post not found: lexikon/schichtplan`

**Solution:**

1. Check if post exists: `ls v2/data/blog/posts/lexikon/schichtplan.json`
2. If post doesn't exist, find replacement:
   - Check similar posts: `ls v2/data/blog/posts/lexikon/ | grep schicht`
   - Use product page if appropriate: `/schichtplan`
   - Use ratgeber post if available: `/insights/ratgeber/{slug}/`

### Issue: Incorrect URL Pattern

**Error:** `Uses /templates/ instead of /vorlagen/`

**Solution:**

- Replace `/templates/{slug}` with `/vorlagen/{slug}`
- Verify template exists: `ls v2/pages/template_{slug}.php`

### Issue: Umlaut URL

**Error:** `Uses umlaut URL, should use ASCII canonical`

**Solution:**

- Replace umlaut URL with ASCII canonical
- Example: `/insights/lexikon/kapazitätsplanung/` → `/insights/lexikon/kapazitaetsplanung/`

### Issue: Product Page Link

**Error:** `Tool page not found: /tools/schichtplan`

**Solution:**

- Check if it's a product page: `/schichtplan`
- Or check if it's a blog post: `/insights/lexikon/schichtplan/`

## Validation Checklist

**Before saving blog post content:**

- [ ] Run `validate-internal-links-exist.php --strict` on HTML content
- [ ] Fix all broken links before saving
- [ ] Verify canonical URLs are used (check `.htaccess` for redirects)
- [ ] Check umlaut URLs use ASCII canonical slugs
- [ ] Verify template URLs use `/vorlagen/` not `/templates/`
- [ ] Test links in browser after saving

**After updating blog post:**

- [ ] Run `validate-internal-links-exist.php` on updated post
- [ ] Run `audit-all-internal-links.php` to check for regressions
- [ ] Browser test links to verify they work

**Monthly maintenance:**

- [ ] Run comprehensive audit (`audit-all-internal-links.php`)
- [ ] Review broken links report
- [ ] Fix all broken links
- [ ] Update documentation if patterns emerge

## Related Documentation

- [INTERNAL_LINKING_PROCESS.md](INTERNAL_LINKING_PROCESS.md) - Internal linking process overview
- [BLOG_CONTENT_EDIT_WORKFLOW.md](BLOG_CONTENT_EDIT_WORKFLOW.md) - Blog content editing workflow
- [BROKEN_LINKS_ANALYSIS_20260316.md](BROKEN_LINKS_ANALYSIS_20260316.md) - Analysis of broken links from CSV audit
- [INTERNAL_LINKS_AUDIT_REPORT_20260316.md](INTERNAL_LINKS_AUDIT_REPORT_20260316.md) - Latest comprehensive audit report

## Script Reference

### validate-internal-links-exist.php

**Location:** `v2/scripts/blog/validate-internal-links-exist.php`

**Options:**

- `--html=path` - Path to HTML file to validate
- `--post=slug` - Post slug (loads from JSON and validates content.html)
- `--category=cat` - Post category (required with --post)
- `--stdin` - Read HTML from stdin
- `--strict` - Exit with error code if broken links found

**Examples:**

```bash
# Validate HTML file
php v2/scripts/blog/validate-internal-links-exist.php --html=content-draft.html --strict

# Validate existing post
php v2/scripts/blog/validate-internal-links-exist.php --post=arbeitszeit --category=lexikon

# Validate from stdin
cat content.html | php v2/scripts/blog/validate-internal-links-exist.php --stdin
```

### audit-all-internal-links.php

**Location:** `v2/scripts/blog/audit-all-internal-links.php`

**Options:**

- `--output=path` - Output path for audit report (default: `docs/content/blog/INTERNAL_LINKS_AUDIT_REPORT.md`)

**Examples:**

```bash
# Generate audit report
php v2/scripts/blog/audit-all-internal-links.php --output=docs/content/blog/INTERNAL_LINKS_AUDIT_REPORT_20260316.md
```
