# Anchor Text Formatting Standards

**Last Updated:** 2026-01-13

## Overview

Anchor text in blog posts must not contain leading or trailing whitespace inside the `<a>` tag. This ensures consistent display, better accessibility, and a professional appearance.

## Standards

### Required Formatting

- **No leading whitespace inside the anchor**: Anchor text must not start with whitespace inside the tag
- **No trailing whitespace inside the anchor**: Anchor text must not end with whitespace inside the tag
- **Word separation outside the anchor**: A space that separates anchor text from a neighboring word belongs outside the tag, so leading/trailing spaces are moved out rather than dropped
- **Internal spacing preserved**: Spaces between words within anchor text are kept

### Examples: Spacing Around Anchors

**❌ Incorrect (trailing space inside):**

```html
<a href="/test">Zeiterfassung </a>wird dieser Prozess
```

**❌ Incorrect (no space, stuck together):**

```html
<a href="/test">Zeiterfassung</a>wird dieser Prozess
```

**✅ Correct (space outside anchor tag):**

```html
<a href="/test">Zeiterfassung</a> wird dieser Prozess
```

### Examples: Whitespace Inside Anchors

**❌ Incorrect:**

```html
<a href="/test"> Schichtplanung </a>
<a href="/test">Schichtplanung </a>
<a href="/test"> Schichtplanung</a>
```

**✅ Correct:**

```html
<a href="/test">Schichtplanung</a>
<a href="/test">Zeiterfassung und Schichtplanung</a>
```

## Automatic Normalization

The blog content processing system automatically normalizes anchor text:

1. **PostContent.php**: Normalizes anchor text during DOM processing
2. **sanitizeHtmlOutput()**: Normalizes anchor text during HTML sanitization

Both code paths trim leading and trailing whitespace from anchor text and move it outside the tag where it is needed for word separation.

### Normalization Logic

The normalization process handles three scenarios:

1. **Anchors stuck to following words**: `<a>text</a>word` → `<a>text</a> word` (adds space)
2. **Anchors with trailing spaces inside**: `<a>text </a>word` → `<a>text</a> word` (moves space outside)
3. **Anchors with leading spaces inside**: `word<a> text</a>` → `word <a>text</a>` (moves space outside)

Two implementations apply these rules:

- **DOM-based normalization** (PostContent.php): Uses XPath to find all anchor tags and moves spaces outside
- **Regex-based normalization** (sanitizeHtmlOutput): Uses regex to find and move spaces outside anchor tags
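The scenarios above can be sketched with a regular expression. This is a minimal Python illustration, not the PHP code in `sanitizeHtmlOutput()`; the function name `normalize_anchor_spacing` is hypothetical, and the pattern only covers anchors whose body contains no nested tags.

```python
import re

# Optional outside whitespace, opening <a ...>, tag-free text, closing </a>,
# optional outside whitespace. Anchors with nested tags are not matched.
_ANCHOR = re.compile(r'(\s*)(<a\b[^>]*>)([^<]*)(</a>)(\s*)', re.IGNORECASE)


def normalize_anchor_spacing(html: str) -> str:
    """Move leading/trailing whitespace out of simple anchors (illustrative sketch)."""

    def _move(match: re.Match) -> str:
        before, open_tag, text, close_tag, after = match.groups()
        stripped = text.strip()
        if not stripped:
            # Assumption: empty or whitespace-only anchors are left untouched.
            return match.group(0)
        # Re-insert a single space outside the tag only if none is there already.
        if not before and text[:1].isspace():
            before = " "
        if not after and text[-1:].isspace():
            after = " "
        return f"{before}{open_tag}{stripped}{close_tag}{after}"

    html = _ANCHOR.sub(_move, html)
    # Scenario 1: a word character directly after </a> gets a separating space.
    return re.sub(r'(</a>)(?=\w)', r'\1 ', html, flags=re.IGNORECASE)
```

For instance, `normalize_anchor_spacing('<a href="/test">Zeiterfassung </a>wird dieser Prozess')` yields the "Correct" form shown in the examples section. Punctuation directly after `</a>` is deliberately left alone, since only word characters trigger the inserted space.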

## Validation

### Manual Validation

Use the validation script to check all blog posts:

```bash
python3 v2/scripts/validate-blog-anchor-text.py
```

This script will:

- Scan all blog posts
- Report any anchor text with leading or trailing spaces
- Exit with an error code if any issues are found
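The core check can be sketched as follows. This is an assumption about how such a validator might work, not the contents of `validate-blog-anchor-text.py`; the class and function names are hypothetical. It uses the standard-library `html.parser` so that text inside nested tags (for example `<strong>`) is included.

```python
from html.parser import HTMLParser


class AnchorTextChecker(HTMLParser):
    """Collects the text of each <a> element and flags edge whitespace (sketch)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._inside = False   # True while parsing between <a ...> and </a>
        self._parts = []       # text fragments of the current anchor
        self.issues = []       # anchor texts with leading/trailing whitespace

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside = True
            self._parts = []

    def handle_data(self, data):
        if self._inside:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            self._inside = False
            text = "".join(self._parts)
            # Assumption: empty and whitespace-only anchors are not reported.
            if text.strip() and text != text.strip():
                self.issues.append(text)


def find_anchor_whitespace_issues(html: str) -> list:
    """Return the raw text of every anchor that starts or ends with whitespace."""
    checker = AnchorTextChecker()
    checker.feed(html)
    checker.close()
    return checker.issues
```

A wrapper script could call this for each post and finish with `sys.exit(1 if issues else 0)` to signal failures to the caller.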

### Automated Validation

The normalization logic logs warnings when normalization occurs:

```text
PostContent: Normalized trailing whitespace in anchor text: 'Schichtplanung '
```

Check PHP error logs for normalization events.
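To get an overview of which anchors are normalized most often, the log lines can be tallied. The sketch below assumes every warning follows the sample format shown above; only the trailing-whitespace message is documented here, so other variants (such as a "leading" message) are an assumption, as is the function name.

```python
import re
from collections import Counter

# Pattern derived from the single documented sample line; other message
# variants are assumed to differ only in the word before "whitespace".
_EVENT = re.compile(
    r"PostContent: Normalized (\w+) whitespace in anchor text: '(.*)'"
)


def summarize_normalization_events(lines) -> Counter:
    """Count normalization warnings per (trimmed) anchor text."""
    counts = Counter()
    for line in lines:
        match = _EVENT.search(line)
        if match:
            counts[match.group(2).strip()] += 1
    return counts
```

Pass it any iterable of log lines, for example an open file handle on the PHP error log.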

## Batch Fixing

If you need to fix existing posts, use the batch fix script:

```bash
python3 v2/scripts/fix-blog-anchor-spaces.py
```

**Important**: This script:

- Creates automatic backups before modifying files
- Reports summary of changes
- Validates JSON after modifications
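The backup-then-validate flow can be sketched like this. It is not the actual `fix-blog-anchor-spaces.py`: the function name, the `.bak` suffix, and the idea that each post is a JSON file with its HTML in a `content` field are all assumptions made for illustration.

```python
import json
import shutil
from pathlib import Path


def fix_post_file(path, fix_html, field="content") -> bool:
    """Rewrite one HTML field of a JSON post file; return True if it changed.

    `fix_html` is any callable that takes and returns an HTML string.
    The field name and backup suffix are hypothetical.
    """
    path = Path(path)
    post = json.loads(path.read_text(encoding="utf-8"))
    original = post.get(field, "")
    fixed = fix_html(original)
    if fixed == original:
        return False  # nothing to do, so no backup is written

    # Back up the untouched file before modifying it.
    shutil.copy2(path, path.with_name(path.name + ".bak"))

    post[field] = fixed
    path.write_text(
        json.dumps(post, ensure_ascii=False, indent=2), encoding="utf-8"
    )

    # Re-read the file to confirm it still parses; raises ValueError if not.
    json.loads(path.read_text(encoding="utf-8"))
    return True
```

Writing the backup only when a change is pending keeps unmodified posts free of stray backup files while still ensuring every edit can be reverted.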

## Best Practices

### When Creating New Content

1. **Avoid trailing spaces**: Place the closing `</a>` directly after the last character of the anchor text
2. **Avoid leading spaces**: Start the anchor text directly after the opening `<a>` tag
3. **Use proper formatting**: Keep anchor text concise, with any separating space outside the tag

### When Migrating WordPress Content

1. **Run validation**: Check for anchor text issues before migration
2. **Use batch fix**: Run fix script if issues are found
3. **Verify output**: Check rendered HTML after migration

### When Editing Blog Posts

1. **Check anchor text**: Ensure there are no leading or trailing spaces inside the tag
2. **Test rendering**: Verify anchor text displays correctly
3. **Use validation**: Run validation script before committing

## Technical Details

### Normalization Process

1. **Detection**: The system detects anchor tags whose text starts or ends with whitespace
2. **Trimming**: Whitespace is removed from the anchor's first and last text nodes and, where needed for word separation, re-inserted outside the tag
3. **Preservation**: Internal spacing and nested HTML are preserved
4. **Logging**: Normalization events are logged for debugging

### Edge Cases Handled

- **Nested HTML**: Anchors with nested tags (e.g., `<a><strong>text</strong></a>`) are handled correctly
- **Multiple text nodes**: Anchors with multiple text nodes are normalized properly
- **Empty anchors**: Empty anchor tags are preserved as-is
- **Whitespace-only**: Anchors with only whitespace are handled gracefully
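A DOM-style pass that covers these edge cases might look like the sketch below. It uses Python's `xml.etree.ElementTree` rather than the PHP DOM/XPath code in `PostContent.php`, so it only accepts well-formed fragments (it raises `ParseError` otherwise). All names are hypothetical, and leaving whitespace-only anchors unchanged is an assumption.

```python
import xml.etree.ElementTree as ET


def _text_slots(el):
    """Yield (owner, attribute) for each text position inside el, in document order."""
    yield el, "text"
    for child in el:
        yield from _text_slots(child)
        yield child, "tail"


def _trim(slots, strip) -> bool:
    """Strip slots until one with visible text is reached; True if anything was removed."""
    removed = False
    for owner, attr in slots:
        value = getattr(owner, attr) or ""
        trimmed = strip(value)
        if trimmed != value:
            setattr(owner, attr, trimmed)
            removed = True
        if trimmed:
            break
    return removed


def normalize_anchors(fragment: str) -> str:
    """Move edge whitespace out of every <a> in a well-formed fragment (sketch)."""
    root = ET.fromstring(f"<div>{fragment}</div>")
    parents = {child: parent for parent in root.iter() for child in parent}
    for anchor in root.iter("a"):
        if not "".join(anchor.itertext()).strip():
            continue  # assumption: empty / whitespace-only anchors stay as-is
        slots = list(_text_slots(anchor))
        if _trim(slots, str.lstrip):
            parent = parents[anchor]
            index = list(parent).index(anchor)
            # Text just before the anchor lives in the previous sibling's tail,
            # or in the parent's text when the anchor is the first child.
            owner, attr = (parent[index - 1], "tail") if index else (parent, "text")
            before = getattr(owner, attr) or ""
            if not before[-1:].isspace():
                setattr(owner, attr, before + " ")
        trailing_removed = _trim(reversed(slots), str.rstrip)
        after = anchor.tail or ""
        # Add a space if one was moved out, or if a word is stuck to </a>.
        if (trailing_removed and not after[:1].isspace()) or after[:1].isalnum():
            anchor.tail = " " + after
    out = ET.tostring(root, encoding="unicode", short_empty_elements=False)
    return out[len("<div>"):-len("</div>")]
```

Walking every text position in document order is what lets the sketch reach whitespace inside nested tags such as `<a><strong> text</strong></a>` and across multiple text nodes.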

## Related Documentation

- [Blog Content Processing Guide](guides/blog-content-processing.md)
- [HTML Sanitization Standards](../../development/HTML_SANITIZATION.md)
- [Blog Migration Guide](guides/blog-migration.md)

## Support

For questions or issues:

- Check `v2/components/blog/PostContent.php` for DOM normalization logic
- Review `v2/config/blog-template-helpers.php` for regex normalization
- Run validation script to identify issues
- Check PHP error logs for normalization warnings
