# Blog FAQ Section Implementation

**Last Updated:** 2026-03-23

## Current model (2026)

**Primary source:** FAQs live in the blog post JSON file as the top-level **`faqs`** array (`question` + `answer` HTML). The template renders them via `BlogFAQ.php`, and the FAQPage schema is generated from this same array.

**Legacy / fallback:** `extract_faqs_from_html()` and HTML-embedded FAQ blocks remain for older posts and for tooling that still cleans `content.html`. **Do not introduce new FAQ content as embedded HTML.** Use the pipeline and `add-faqs-to-post.php` instead, per [FAQ_SOURCE_OF_TRUTH.md](FAQ_SOURCE_OF_TRUTH.md).
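
The precedence between the two sources could be sketched as follows. This is a minimal illustration, not the project's actual code: `resolve_post_faqs()` is a hypothetical helper, and the legacy extractor is passed in as a callable (a stand-in for `extract_faqs_from_html()`) so the snippet runs on its own.

```php
<?php
/**
 * Hypothetical helper: prefer the top-level `faqs` array and fall back
 * to legacy HTML extraction only when that array is missing or empty.
 */
function resolve_post_faqs(array $post, callable $legacy_extractor): array
{
    $faqs = $post['faqs'] ?? [];
    if (is_array($faqs) && $faqs !== []) {
        return $faqs; // primary source: JSON `faqs` array
    }

    $html = $post['content']['html'] ?? '';
    return is_string($html) && $html !== '' ? $legacy_extractor($html) : [];
}

// Stand-in for extract_faqs_from_html() so the example is self-contained.
$legacy = fn (string $html): array => [['question' => 'Legacy?', 'answer' => $html]];

$modern_post = ['faqs' => [['question' => 'Q1', 'answer' => '<p>A1</p>']]];
$legacy_post = ['content' => ['html' => '<p>old</p>']];

$modern_faqs = resolve_post_faqs($modern_post, $legacy); // uses the `faqs` array
$legacy_faqs = resolve_post_faqs($legacy_post, $legacy); // falls back to HTML
$no_faqs     = resolve_post_faqs([], $legacy);           // empty array
```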

**Cross-site FAQ standards (all page types):** [FAQ_WEBSITE_STANDARD.md](../FAQ_WEBSITE_STANDARD.md).

---

## Overview

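A dedicated FAQ section for blog posts, styled to match the other FAQ sections on the site (`static_customers_new.php`, `tools_arbeitslosengeld_rechner.php`). The original implementation, documented in the sections that follow, extracts FAQs from the post content HTML; for new content, the top-level `faqs` array described under "Current model" takes precedence.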

## Features

- **Automatic FAQ Extraction**: Extracts FAQs from blog post HTML content (embedded as `schema-faq` divs)
- **Content Cleanup**: Removes FAQs from main content HTML to avoid duplication
- **Dynamic Display**: FAQ section appears after the "Das könnte dich auch interessieren" carousel and before the footer
- **Conditional Rendering**: FAQ section is hidden for posts without FAQs
- **SEO Optimization**: Generates FAQPage schema markup automatically for posts with FAQs
- **Topic Extraction**: Derives the topic shown in the FAQ heading from the post title using pattern matching
- **Long Compound Words**: Single German compound words (like "Arbeitsunfähigkeitsbescheinigung") are displayed fully without truncation, allowing CSS to handle natural word-breaking

## Implementation Details

### FAQ Extraction

**Function:** `extract_faqs_from_html($html_content)` in `v2/config/blog-template-helpers.php`

- Parses HTML using DOMDocument
- Finds all `<div class="schema-faq-section">` elements
- Extracts question from `<strong class="schema-faq-question">`
- Extracts answer from `<p class="schema-faq-answer">` (preserves HTML formatting)
- Returns array of FAQ items: `[['question' => string, 'answer' => string], ...]`
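
The extraction steps listed above might look roughly like the sketch below. It is not the code in `blog-template-helpers.php`: the function name `sketch_extract_faqs()` is hypothetical, and the XPath expressions assume exactly the class names given in this section.

```php
<?php
/**
 * Illustrative extractor (hypothetical name). Assumes the markup described
 * above: div.schema-faq-section containing strong.schema-faq-question
 * and p.schema-faq-answer.
 */
function sketch_extract_faqs(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog is a common trick to make libxml treat the input as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $sections = $xpath->query(
        '//div[contains(concat(" ", normalize-space(@class), " "), " schema-faq-section ")]'
    );

    $faqs = [];
    foreach ($sections as $section) {
        $q = $xpath->query('.//strong[contains(@class, "schema-faq-question")]', $section)->item(0);
        $a = $xpath->query('.//p[contains(@class, "schema-faq-answer")]', $section)->item(0);
        if ($q === null || $a === null) {
            continue; // skip incomplete sections
        }

        // Keep inline HTML in the answer by serializing the child nodes.
        $answer = '';
        foreach ($a->childNodes as $child) {
            $answer .= $dom->saveHTML($child);
        }

        $faqs[] = ['question' => trim($q->textContent), 'answer' => trim($answer)];
    }

    return $faqs;
}
```

Sections that lack either a question or an answer element are skipped in this sketch; whether the real function does the same is not stated here.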

### FAQ Removal

**Function:** `remove_faqs_from_html($html_content)` in `v2/config/blog-template-helpers.php`

- Removes `<div class="schema-faq">` wrapper and all FAQ sections
- Removes FAQ headings (`<h2>FAQ</h2>` or similar)
- Preserves all other content structure
- Returns cleaned HTML without FAQ sections
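
A rough sketch of the removal logic, under stated assumptions: `sketch_remove_faqs()` is a hypothetical name, the wrapper element `faq-sketch-root` exists only inside this example, and "FAQ heading" is narrowed to an `<h2>` whose text is exactly "FAQ" in any letter case. The real function may recognize more heading variants.

```php
<?php
/**
 * Illustrative cleaner (hypothetical name). Assumes FAQs sit in a
 * div.schema-faq wrapper, optionally accompanied by an <h2> reading "FAQ".
 */
function sketch_remove_faqs(string $html): string
{
    if (trim($html) === '') {
        return $html;
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a known container so it can be read back out.
    $dom->loadHTML('<?xml encoding="UTF-8"><div id="faq-sketch-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $targets = $xpath->query(
        '//div[contains(concat(" ", normalize-space(@class), " "), " schema-faq ")]'
        . ' | //h2[translate(normalize-space(.), "faq", "FAQ") = "FAQ"]'
    );

    // Copy the matches to an array before mutating the tree.
    foreach (iterator_to_array($targets) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    $root = $xpath->query('//div[@id="faq-sketch-root"]')->item(0);
    if ($root === null) {
        return $html; // parsing did not produce the expected container
    }

    // Serialize only the container's children, i.e. the cleaned fragment.
    $clean = '';
    foreach ($root->childNodes as $child) {
        $clean .= $dom->saveHTML($child);
    }

    return $clean;
}
```

Because the HTML goes through a parse and re-serialize round trip, whitespace and attribute quoting in the output can differ slightly from the input even where nothing was removed.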

### Component

**File:** `v2/components/blog/BlogFAQ.php`

- Accepts `$faqs` array and `$post_title` string
- Renders FAQ section matching reference styling exactly
- Uses `<details>` elements with proper IDs (`faq-1`, `faq-2`, etc.)
- Includes proper accessibility attributes
- Handles empty FAQs array gracefully (returns early)
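
To make the rendering contract concrete, here is a minimal sketch of a renderer with the same inputs. The actual markup and CSS classes of `BlogFAQ.php` are not shown in this document, so the element structure, the `faq-section` class, and the function name `sketch_render_faq_section()` are all assumptions; only the `<details>` elements, the `faq-N` IDs, and the early return come from the list above.

```php
<?php
/**
 * Illustrative renderer (hypothetical name and markup).
 */
function sketch_render_faq_section(array $faqs, string $heading): string
{
    if ($faqs === []) {
        return ''; // nothing to render for posts without FAQs
    }

    $html = '<section class="faq-section">' . "\n";
    $html .= '  <h2>' . htmlspecialchars($heading, ENT_QUOTES, 'UTF-8') . '</h2>' . "\n";

    $index = 0;
    foreach ($faqs as $faq) {
        $index++;
        // Questions are plain text, so they are escaped here.
        $question = htmlspecialchars((string) ($faq['question'] ?? ''), ENT_QUOTES, 'UTF-8');
        // Answers are HTML; this sketch assumes they were sanitized upstream.
        $answer = (string) ($faq['answer'] ?? '');
        $html .= sprintf(
            '  <details id="faq-%d"><summary>%s</summary><div>%s</div></details>' . "\n",
            $index,
            $question,
            $answer
        );
    }

    return $html . '</section>' . "\n";
}
```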

### Schema Generation

**File:** `v2/config/blog-schema-generator.php`

- Extends the `generate_blog_schema()` function for the `'post'` case
- Accepts `$faqs` array parameter in `$overrides`
- Generates FAQPage schema with all FAQs
- Strips HTML tags from answers for schema (plain text only)
- Matches schema structure from reference pages
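
For orientation, a FAQPage structure built from a `$faqs` array could be assembled as in the sketch below. The function name `sketch_build_faq_schema()` is hypothetical, and the property names follow the public schema.org FAQPage, Question, and Answer vocabulary; how the real generator merges this into the rest of the post schema is not covered here.

```php
<?php
/**
 * Illustrative FAQPage builder (hypothetical name).
 * Returns null when no usable FAQ items remain.
 */
function sketch_build_faq_schema(array $faqs): ?array
{
    $entities = [];
    foreach ($faqs as $faq) {
        $question = trim((string) ($faq['question'] ?? ''));
        // Strip tags so the schema carries plain text only.
        $answer = trim(strip_tags((string) ($faq['answer'] ?? '')));
        if ($question === '' || $answer === '') {
            continue; // skip empty items instead of emitting invalid entries
        }

        $entities[] = [
            '@type' => 'Question',
            'name' => $question,
            'acceptedAnswer' => ['@type' => 'Answer', 'text' => $answer],
        ];
    }

    if ($entities === []) {
        return null;
    }

    return [
        '@context' => 'https://schema.org',
        '@type' => 'FAQPage',
        'mainEntity' => $entities,
    ];
}

$schema = sketch_build_faq_schema([
    ['question' => 'What is X?', 'answer' => '<p>X is <strong>great</strong>.</p>'],
]);
$json_ld = json_encode($schema, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
```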

## Current Status

**Statistics:**

- **Total Posts:** 99
- **Posts with FAQs:** 68 (67 original + 1 new example)
- **Posts without FAQs:** 31
- **Average FAQs per post:** 8.7
- **Average Quality Score:** 78.2/100

**Tracking Documents:**

- `docs/content/blog/BLOG_FAQ_STATUS.md` - FAQ status tracking
- `docs/content/blog/FAQ_INVENTORY.md` - Comprehensive FAQ inventory
- `docs/content/blog/FAQ_QUALITY_AUDIT.md` - Quality audit report
- `docs/content/blog/FAQ_GAP_PRIORITY.md` - Priority list for posts without FAQs

## Files Modified/Created

### New Files

- `v2/components/blog/BlogFAQ.php` - FAQ component
- `scripts/blog/audit-blog-faqs.php` - FAQ audit script
- `scripts/blog/remove-faqs-from-content.php` - Batch FAQ removal script
- `docs/content/blog/BLOG_FAQ_STATUS.md` - FAQ tracking document
- `docs/content/blog/FAQ_IMPLEMENTATION.md` - This documentation

### Modified Files

- `v2/config/blog-template-helpers.php` - Added FAQ extraction/removal functions
- `v2/components/blog/PostContent.php` - Removes FAQs from content HTML
- `v2/pages/blog/post.php` - Extracts FAQs and includes FAQ section
- `v2/config/blog-schema-generator.php` - Added FAQPage schema generation

## Usage

### In Blog Post Template

```php
<?php
// Extract FAQs from post content HTML
$post_faqs = [];
$html_content = $post['content']['html'] ?? '';
if (!empty($html_content)) {
    $post_faqs = extract_faqs_from_html($html_content);
}
?>

<!-- ... render post content ... -->

<?php // Include FAQ section after related carousel ?>
<?php if (!empty($post_faqs)): ?>
    <?php
    // BlogFAQ.php reads $faqs and $post_title from the including scope
    $faqs = $post_faqs;
    $post_title = $post['title'] ?? '';
    include __DIR__ . '/../../components/blog/BlogFAQ.php';
    ?>
<?php endif; ?>
```

### Schema Generation

```php
// Generate schema (include FAQs if available)
$schema_overrides = [];
if (!empty($post_faqs)) {
    $schema_overrides['faqs'] = $post_faqs;
}
$schema = render_blog_schema('post', $post, $schema_overrides);
```

## Testing

### Test FAQ Extraction

```bash
php -r "require_once 'v2/config/blog-template-helpers.php'; \$data = json_decode(file_get_contents('v2/data/blog/posts/ratgeber/urlaubsanspruch-von-minijobbern.json'), true); \$faqs = extract_faqs_from_html(\$data['content']['html']); echo json_encode(\$faqs, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);"
```

### Audit All Blog FAQs

```bash
php scripts/blog/audit-blog-faqs.php
```

### Test FAQ Removal from Content

```bash
php -r "require_once 'v2/config/blog-template-helpers.php'; \$data = json_decode(file_get_contents('v2/data/blog/posts/ratgeber/urlaubsanspruch-von-minijobbern.json'), true); \$cleaned = remove_faqs_from_html(\$data['content']['html']); echo 'Original length: ' . strlen(\$data['content']['html']) . \"\n\"; echo 'Cleaned length: ' . strlen(\$cleaned) . \"\n\";"
```

## Validation Checklist

- [x] FAQ section styling matches reference pages exactly
- [x] FAQs extracted correctly from all 67 posts with FAQs
- [x] FAQs removed from content HTML (no duplicates)
- [x] FAQ section hidden for posts without FAQs
- [x] FAQPage schema generated correctly
- [x] Schema validates with Google Rich Results Test (structure verified)
- [x] All FAQ answers are plain text in schema (HTML stripped)
- [x] FAQ tracking document created and accurate
- [x] Responsive design works on all screen sizes
- [x] Accessibility requirements met (keyboard nav, screen readers)
- [x] No console errors or broken functionality

## FAQ Optimization Workflow

### For New Posts Without FAQs

1. **Collect Research Data:**

   ```bash
   php v2/scripts/blog/collect-faq-research-data.php --post=slug --category=category
   ```

2. **Generate FAQ Questions:**

   ```bash
   php v2/scripts/blog/generate-faq-questions.php --post=slug --category=category
   ```

3. **Write FAQ Answers** (40-80 words each, following best practices)

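4. **Implement FAQs** in the post JSON file's top-level `faqs` array using `add-faqs-to-post.php` (see [FAQ_SOURCE_OF_TRUTH.md](FAQ_SOURCE_OF_TRUTH.md)), not as embedded `schema-faq` HTML divs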

5. **Validate:**

   ```bash
   php v2/scripts/blog/validate-faq-schema.php --post=slug --category=category
   ```
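
The 40-80 word guideline from step 3 can be checked mechanically before validation. The sketch below is only one possible approach: both function names are hypothetical, and counting words by splitting the tag-stripped answer on whitespace is an assumption, not necessarily what `audit-faq-quality.php` does.

```php
<?php
/**
 * Hypothetical word counter: strips tags, then splits on whitespace.
 */
function sketch_answer_word_count(string $answer_html): int
{
    $text = trim(strip_tags($answer_html));
    if ($text === '') {
        return 0;
    }

    // The /u flag makes \s match Unicode whitespace; preg_split returns
    // false on invalid UTF-8, which is treated as zero words here.
    $parts = preg_split('/\s+/u', $text);
    return $parts === false ? 0 : count($parts);
}

/**
 * Hypothetical range check against the 40-80 word guideline.
 */
function sketch_answer_length_ok(string $answer_html, int $min = 40, int $max = 80): bool
{
    $words = sketch_answer_word_count($answer_html);
    return $words >= $min && $words <= $max;
}
```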

### For Existing Posts with FAQs

1. **Audit Quality:**

   ```bash
   php v2/scripts/blog/audit-faq-quality.php
   ```

2. **Review Quality Audit Report:** `docs/content/blog/FAQ_QUALITY_AUDIT.md`

3. **Improve FAQs** based on audit findings

4. **Validate Schema:**

   ```bash
   php v2/scripts/blog/validate-faq-schema.php --all
   ```

## Related Scripts

- `v2/scripts/blog/audit-faq-inventory.php` - Create comprehensive FAQ inventory
- `v2/scripts/blog/prioritize-faq-gaps.php` - Prioritize posts without FAQs
- `v2/scripts/blog/collect-faq-research-data.php` - Collect PAA questions, GSC queries, keywords
- `v2/scripts/blog/generate-faq-questions.php` - Generate FAQ questions from research data
- `v2/scripts/blog/audit-faq-quality.php` - Audit FAQ quality (count, length, keywords, redundancy)
- `v2/scripts/blog/validate-faq-schema.php` - Validate FAQPage schema generation

## Related Documentation

- `docs/content/blog/FAQ_BEST_PRACTICES.md` - Best practices guide
- `docs/content/blog/FAQ_OPTIMIZATION_GUIDE.md` - Optimization guide
- `docs/content/blog/FAQ_WORKFLOW.md` - Complete workflow documentation
- `.cursor/rules/blog-faq-optimization.mdc` - Cursor rules for FAQ optimization

## Future Enhancements

### Content Creation

- Add FAQs to the 31 posts currently without FAQs (priority list available)
- Review and improve existing FAQs based on quality audit findings
- Add FAQs to new blog posts as they're created

### Technical Improvements

- Consider caching FAQ extraction results
- Add FAQ analytics tracking
- Implement FAQ search functionality
- Add FAQ upvote/downvote for relevance

## Troubleshooting

### HTTP 500 Errors with FAQ Posts

**Symptoms:**
- Blog posts with FAQs return HTTP 500 errors in production
- Blog posts without FAQs load correctly
- Posts load successfully in local environment

**Root Causes:**
1. Function redefinition errors (e.g., `extract_faq_topic` defined multiple times)
2. Missing error handling in FAQ processing
3. Encoding issues with FAQ answer HTML
4. Malformed FAQ data structure

**Solutions Implemented:**

1. **Error Handling in Schema Generation** (`v2/config/blog-schema-generator.php`):
   - Added try-catch blocks around FAQ schema generation
   - Individual FAQ processing wrapped in error handling
   - Continues processing even if individual FAQs fail
   - Logs errors without breaking page rendering

2. **Error Handling in FAQ Component** (`v2/components/blog/BlogFAQ.php`):
   - Added function_exists check for `extract_faq_topic`
   - Added try-catch around FAQ answer sanitization
   - Fallback rendering if sanitization fails
   - Validates `sanitizeHtmlOutput` function availability

3. **Validation in Post Template** (`v2/pages/blog/post.php`):
   - Validates FAQ structure before processing
   - Filters out invalid FAQ items
   - Validates FAQs before schema generation
   - Comprehensive error handling around FAQ section rendering
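
The structural validation mentioned for the post template could be approximated as follows. This is a sketch with a hypothetical function name, `sketch_filter_valid_faqs()`, and an assumed definition of "valid": an array item with a non-empty string `question` and a string `answer` that still contains text once tags are stripped.

```php
<?php
/**
 * Illustrative filter (hypothetical name): keeps only well-formed FAQ items.
 * Accepts any input so that malformed JSON data cannot cause a fatal error.
 */
function sketch_filter_valid_faqs($faqs): array
{
    if (!is_array($faqs)) {
        return [];
    }

    $valid = array_filter($faqs, static function ($faq): bool {
        return is_array($faq)
            && isset($faq['question'], $faq['answer'])
            && is_string($faq['question'])
            && is_string($faq['answer'])
            && trim($faq['question']) !== ''
            && trim(strip_tags($faq['answer'])) !== '';
    });

    // Reindex so positional IDs such as faq-1, faq-2 stay sequential.
    return array_values($valid);
}
```

Filtering before both rendering and schema generation keeps the two outputs consistent, since they then see the same list of items.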

**Testing Scripts:**

```bash
# Test FAQ rendering for specific posts
php v2/scripts/blog/test-faq-rendering.php

# Diagnose all posts with FAQs
php v2/scripts/blog/diagnose-all-faqs.php

# Validate all FAQ posts (pre-deployment)
php v2/scripts/blog/validate-faq-posts.php

# Test edge cases
php v2/scripts/blog/test-faq-edge-cases.php
```

**Pre-Deployment Checklist:**
1. Run `php v2/scripts/blog/diagnose-all-faqs.php` - should show 0 errors
2. Run `php v2/scripts/blog/validate-faq-posts.php` - should exit with code 0
3. Test failing posts locally with production-like error reporting
4. Verify error logs don't show FAQ-related errors

**Common Issues:**

1. **Function Already Defined Error:**
   - Ensure `extract_faq_topic` has `function_exists` check
   - Check for multiple includes of `BlogFAQ.php`

2. **Sanitization Errors:**
   - Verify `sanitizeHtmlOutput` function is available
   - Check FAQ answer HTML for malformed content
   - Ensure UTF-8 encoding is correct

3. **Schema Generation Errors:**
   - Validate FAQ structure before schema generation
   - Check for empty questions/answers
   - Verify JSON encoding works correctly
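
The guard recommended for the "Function Already Defined" case follows a standard PHP pattern, sketched below. `example_faq_topic()` is a placeholder name with a placeholder body, not the project's `extract_faq_topic()`; only the `function_exists` check is the point.

```php
<?php
// First definition: runs because the function does not exist yet.
if (!function_exists('example_faq_topic')) {
    function example_faq_topic(string $title): string
    {
        return trim($title);
    }
}

// A second guarded definition, as would happen if a component file were
// included twice, is skipped instead of raising a fatal redeclaration error.
if (!function_exists('example_faq_topic')) {
    function example_faq_topic(string $title): string
    {
        return 'never used';
    }
}

$topic = example_faq_topic('  Urlaubsanspruch  ');
```

Moving such helpers into a file loaded with `require_once` is another common way to avoid the problem; whether that fits this codebase is a separate decision.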

## Additional Documentation

- `docs/content/blog/COMPONENT_API.md` - Component API documentation
- `docs/content/blog/BLOG_FAQ_STATUS.md` - FAQ status tracking
- `docs/content/blog/RELATED_POSTS_LOGIC.md` - Related posts algorithm
- `docs/content/blog/RESOURCE_MATCHING_GUIDE.md` - Resource matching logic
- `docs/content/blog/TROUBLESHOOTING_GUIDE.md` - General troubleshooting guide
