# Blog System Simplification Guide

**Last Updated:** 2026-01-14

## Overview

The blog system has been simplified to reduce complexity and make manual editing easier. Content is now cleaned once at extraction time and stored clean in JSON files, eliminating the need for extensive processing at render time.

## Key Changes

### 1. Content Structure

**Before:**

- Content HTML contained WordPress artifacts (wrappers, CTAs, authors, featured images)
- FAQs were embedded in content HTML and extracted at render time
- PostContent.php performed extensive cleanup (1200+ lines)

**After:**

- Content HTML is clean (no WordPress artifacts)
- FAQs are stored separately in the `faqs` array of each post's JSON file
- `PostContent.php` is reduced to 34 lines: it performs no processing and only outputs the pre-processed HTML
- All processing (image wrapping, table wrapping, sanitization) happens at extraction time

### 2. PostContent.php Simplification

**Removed:**

- WordPress wrapper removal
- CTA removal
- Author removal
- Featured image duplicate removal
- FAQ extraction/removal
- WordPress URL conversion
- Extensive anchor text normalization

**Moved to Extraction Time:**

- Security sanitization (XSS prevention) - now in Python scripts
- Image lightbox wrapping (presentation) - now in Python scripts
- Table breakout wrapping (presentation) - now in Python scripts
- Embed preservation (iframes, scripts, videos) - now handled during extraction
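
To illustrate the kind of presentation step that now runs at extraction time, here is a minimal Python sketch that wraps each `<table>` in a container div. The function name `wrap_tables` and the class name `table-breakout` are hypothetical; the actual extraction scripts may use different markup and a proper HTML parser rather than a regular expression.

```python
import re

# Hypothetical class name; the real extraction scripts may emit different markup.
TABLE_WRAPPER_CLASS = "table-breakout"

# Matches one <table>...</table> element (non-greedy, case-insensitive).
# Nested tables are not handled by this simple pattern.
_TABLE_RE = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)


def wrap_tables(html: str) -> str:
    """Wrap every matched <table> element in a container div."""
    return _TABLE_RE.sub(
        lambda m: f'<div class="{TABLE_WRAPPER_CLASS}">{m.group(0)}</div>',
        html,
    )
```

This sketch is not idempotent: running it twice on the same HTML would wrap the tables twice, which is one reason to do this kind of work once, at extraction time.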

**PostContent.php Now:**

- Simply outputs pre-processed HTML
- Zero processing overhead
- Render time of 0.01-0.03 ms, down from roughly 50-85 ms (a reduction of more than 99%)

### 3. Script Consolidation

**Consolidated Scripts:**

- **Content Management** → `manage-blog-content.php`

  - Content extraction
  - Content cleaning
  - Content validation

- **Link Management** → `manage-blog-links.php`

  - Link recommendations
  - Link addition
  - Link validation
  - Link fixing

- **FAQ Management** → `manage-blog-faqs.php`

  - FAQ generation
  - FAQ enhancement
  - FAQ validation
  - FAQ deduplication

- **Validation** → `validate-blog-content.php`

  - Content validation
  - Link validation
  - FAQ validation
  - Schema validation
  - SEO validation

## JSON Structure

### Standard Post JSON

```json
{
  "slug": "post-slug",
  "title": "Post Title",
  "category": "lexikon",
  "url": "/insights/lexikon/post-slug/",
  "publication_date": "2023-09-01T10:40:44+00:00",
  "modified_date": "2026-01-14T10:49:33+00:00",
  "author": {"name": "Emma"},
  "featured_image": {...},
  "excerpt": "...",
  "content": {
    "html": "<p>Clean HTML content...</p>",
    "text": "Plain text...",
    "word_count": 1214
  },
  "faqs": [
    {"question": "...", "answer": "..."}
  ],
  "images": [...],
  "internal_links": [...],
  "meta": {...},
  "topics": [...],
  "clusters": {...},
  "related_posts": [...]
}
```
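
A quick way to confirm that a post file has the top-level shape shown above is a small check like the following sketch. The set of required keys is an assumption drawn from the example; `validate-blog-content.php` may enforce different or stricter rules, and the function names here are hypothetical.

```python
import json

# Keys taken from the example above; treating them as required is an assumption.
REQUIRED_KEYS = {"slug", "title", "category", "url", "content", "faqs"}
REQUIRED_CONTENT_KEYS = {"html", "text", "word_count"}


def find_structure_problems(post: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks fine."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(post))]

    content = post.get("content")
    if isinstance(content, dict):
        for key in sorted(REQUIRED_CONTENT_KEYS - set(content)):
            problems.append(f"missing key: content.{key}")
    elif "content" in post:
        problems.append("content must be an object")

    faqs = post.get("faqs", [])
    if not isinstance(faqs, list):
        problems.append("faqs must be an array")
    else:
        for i, faq in enumerate(faqs):
            if not isinstance(faq, dict) or not {"question", "answer"} <= set(faq):
                problems.append(f"faqs[{i}] needs question and answer")
    return problems


def check_post_file(path: str) -> list[str]:
    """Load one post JSON file and check its structure."""
    with open(path, encoding="utf-8") as fh:
        return find_structure_problems(json.load(fh))
```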

### Content HTML Standards

- **Clean HTML**: No WordPress wrapper divs, no CTAs, no author blocks
- **Standard tags**: `p`, `h2`-`h6`, `ul`, `ol`, `li`, `a`, `img`, `blockquote`, `pre`, `code`, `table`, `iframe`, `script`, `video`
- **Images**: Already converted to local paths (`/insights/bilder/`)
- **Links**: Already processed (internal links in `internal_links` array)
- **Embeds**: Preserved as-is (iframes, scripts, videos)
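
The tag list above can be turned into a rough check with Python's standard `html.parser`. This is only a sketch: the function name `unexpected_tags` is hypothetical, and the result should be read as a hint, since table sub-elements (`tr`, `td`), inline tags such as `strong`, and any wrapper markup added at extraction time are not in the list and would be reported.

```python
from html.parser import HTMLParser

# Tags listed under "Standard tags", with h2-h6 expanded.
ALLOWED_TAGS = {
    "p", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "li", "a", "img",
    "blockquote", "pre", "code", "table", "iframe", "script", "video",
}


class _TagCollector(HTMLParser):
    """Collects the name of every opening tag seen in the input."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


def unexpected_tags(html: str) -> set[str]:
    """Return the tag names in `html` that are not in ALLOWED_TAGS."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags - ALLOWED_TAGS
```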

## Manual Editing Workflow

### Editing Post Content

1. **Open JSON file**: `v2/data/blog/posts/{category}/{slug}.json`
2. **Edit content HTML**: Modify `content.html` directly
3. **Edit FAQs**: Modify `faqs` array directly
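4. **Save**: Changes are reflected immediately because the renderer outputs `content.html` as stored. Manual edits therefore bypass the extraction-time sanitization and wrapping steps, so keep hand-written HTML clean.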
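
The post JSON also carries `content.text` and `content.word_count`. If those fields are expected to mirror `content.html` (an assumption; this guide does not say how they are derived), a manual edit can leave them stale. The following sketch recomputes both from the HTML; `refresh_derived_fields` and the tag lists are hypothetical and may not match how the extraction scripts count words.

```python
from html.parser import HTMLParser

# Tags treated as word boundaries when flattening HTML to text (illustrative list).
BLOCK_TAGS = {"p", "h2", "h3", "h4", "h5", "h6", "li", "blockquote",
              "pre", "tr", "td", "th", "div", "br"}
SKIPPED_TAGS = {"script", "style"}  # their contents are not visible text


class _TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def refresh_derived_fields(post: dict) -> dict:
    """Recompute content.text and content.word_count from content.html, in place."""
    extractor = _TextExtractor()
    extractor.feed(post["content"]["html"])
    extractor.close()
    text = " ".join("".join(extractor.parts).split())  # collapse whitespace
    post["content"]["text"] = text
    post["content"]["word_count"] = len(text.split())
    return post
```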

### Adding FAQs

```json
{
  "faqs": [
    {
      "question": "Your question?",
      "answer": "<p>Your answer with <strong>HTML</strong> support.</p>"
    }
  ]
}
```
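
When adding many FAQs, a small helper can append entries in the shape shown above and skip questions that already exist. This is a sketch under assumptions: `add_faq` and `save_post` are hypothetical names, the case-insensitive duplicate check is only one possible rule, and the two-space indentation with `ensure_ascii=False` is a guess at how the existing files are formatted.

```python
import json


def add_faq(post: dict, question: str, answer_html: str) -> bool:
    """Append a FAQ unless the same question is already present.

    Returns True if the FAQ was added, False if it was a duplicate.
    """
    faqs = post.setdefault("faqs", [])
    normalized = question.strip().casefold()
    for faq in faqs:
        if faq.get("question", "").strip().casefold() == normalized:
            return False
    faqs.append({"question": question.strip(), "answer": answer_html})
    return True


def save_post(post: dict, path: str) -> None:
    """Write the post back as UTF-8 JSON, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(post, fh, ensure_ascii=False, indent=2)
        fh.write("\n")
```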

### Adding Images

1. Place image in `/v2/img/` or `/insights/bilder/`
2. Reference in content HTML: `<img src="/insights/bilder/image.webp" alt="Description">`
3. Add to `images` array if needed for metadata

### Adding Embeds

Embed directly in content HTML:

```html
<iframe src="https://..." width="560" height="315"></iframe>
```

Or use script tags:

```html
<script src="https://..."></script>
```

## Script Usage

### Content Management

```bash
# Clean all posts (extract FAQs, remove WordPress artifacts)
php v2/scripts/blog/manage-blog-content.php --action=clean --all

# Clean specific post
php v2/scripts/blog/manage-blog-content.php --action=clean --category=lexikon --post=slug

# Validate content structure
php v2/scripts/blog/manage-blog-content.php --action=validate

# Extract from WordPress (runs Python script)
php v2/scripts/blog/manage-blog-content.php --action=extract
```

### Link Management

```bash
# Generate link recommendations
php v2/scripts/blog/manage-blog-links.php --action=recommendations

# Add links to posts
php v2/scripts/blog/manage-blog-links.php --action=add --category=lexikon

# Validate links
php v2/scripts/blog/manage-blog-links.php --action=validate

# Fix problematic links
php v2/scripts/blog/manage-blog-links.php --action=fix
```

### FAQ Management

```bash
# Generate FAQ questions
php v2/scripts/blog/manage-blog-faqs.php --action=generate-questions

# Generate FAQ answers
php v2/scripts/blog/manage-blog-faqs.php --action=generate-answers

# Enhance FAQs
php v2/scripts/blog/manage-blog-faqs.php --action=enhance

# Validate FAQs
php v2/scripts/blog/manage-blog-faqs.php --action=validate

# Remove duplicates
php v2/scripts/blog/manage-blog-faqs.php --action=deduplicate
```

### Validation

```bash
# Validate all content
php v2/scripts/blog/validate-blog-content.php --type=all

# Validate specific type
php v2/scripts/blog/validate-blog-content.php --type=content
php v2/scripts/blog/validate-blog-content.php --type=links
php v2/scripts/blog/validate-blog-content.php --type=faqs
php v2/scripts/blog/validate-blog-content.php --type=schema
php v2/scripts/blog/validate-blog-content.php --type=seo
```

## Python Scripts

### Clean Existing Posts

```bash
# Clean all posts
python3 scripts/blog/clean-existing-posts.py --all

# Clean specific category
python3 scripts/blog/clean-existing-posts.py --category=lexikon

# Clean specific post
python3 scripts/blog/clean-existing-posts.py --category=lexikon --post=slug

# Dry run (test without changes)
python3 scripts/blog/clean-existing-posts.py --all --dry-run
```

## Benefits

1. **Easier Manual Editing**: Edit JSON directly, see changes immediately
2. **Faster Rendering**: No processing at render time (about 0.01-0.03 ms, down from roughly 50-85 ms)
3. **Clearer Structure**: FAQs separate from content
4. **Better Maintainability**: Fewer scripts, clearer workflow
5. **Consistent Content**: Clean content stored in JSON

## Migration Notes

- All 99 posts have been cleaned and updated
- FAQs extracted into the separate `faqs` array
- WordPress artifacts removed from content HTML
- WordPress URLs converted to local paths
- Backup created: `docs/backups/blog-snapshots/2026-01-14-143644`

## Troubleshooting

### Content Still Has WordPress Artifacts

Run the cleanup script for the affected post:

```bash
python3 scripts/blog/clean-existing-posts.py --category={category} --post={slug}
```

### FAQs Not Displaying

Check that the FAQs are stored in the post's `faqs` array rather than embedded in `content.html`.

### Images Not Loading

Verify that image paths use the local `/insights/bilder/` format rather than WordPress URLs.
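
To find offending images quickly, a sketch like the following lists every `<img>` source in a post's HTML that does not start with one of the local prefixes mentioned in this guide. The function name `non_local_image_sources` is hypothetical, and treating `/v2/img/` as a valid prefix is an assumption based on the "Adding Images" steps.

```python
from html.parser import HTMLParser

# Local image locations named in this guide.
LOCAL_PREFIXES = ("/insights/bilder/", "/v2/img/")


class _ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


def non_local_image_sources(html: str) -> list[str]:
    """Return image sources that do not point at a local image directory."""
    collector = _ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return [src for src in collector.sources if not src.startswith(LOCAL_PREFIXES)]
```

Passing a post's `content.html` to this function returns the sources worth checking by hand.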

## Related Documentation

- `.cursor/rules/blog-templates.mdc` - Template patterns and best practices
- `docs/content/blog/guides/TEMPLATE_DEVELOPMENT_GUIDE.md` - Development guide
- `v2/scripts/blog/archive/consolidated-2026-01-14/README.md` - Script consolidation details
