# FAQ Creation Workflow 2026

**Last Updated:** 2026-02-23

Complete workflow for creating FAQs from scratch for blog posts, following SEO/AEO/GEO best practices and the fresh start approach.

## Overview

This workflow guides the systematic creation of blog post FAQs from scratch, so that every FAQ set follows best practices for SEO, Answer Engine Optimization (AEO), and Generative Engine Optimization (GEO).

**Key Principles:**

- FAQs stored in `faqs` array (NOT in HTML content)
- Human-first content, optimized for search engines
- 10-15 FAQs per post
- 40-80 words per answer
- Natural keyword integration
- Du tone (informal German)
- Manual review required before publication

## Workflow Steps

### Step 1: Research Phase

**CRITICAL: Data Freshness Check**

Before collecting data, check whether the existing data is fresh (collected within the last 7 days):

```bash
# Check data freshness for specific post
php v2/scripts/blog/check-data-freshness.php --post=slug --category=category --max-age=7

# Check all Tier 1 posts
php v2/scripts/blog/check-data-freshness.php --tier=1 --max-age=7

# Auto-refresh stale data
php v2/scripts/blog/check-data-freshness.php --tier=1 --max-age=7 --auto-refresh
```
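The freshness rule amounts to a simple age comparison. The sketch below is only an illustration: it assumes freshness is judged by a data file's modification time, which may not be how `check-data-freshness.php` decides, and `is_fresh` is a hypothetical helper name.

```python
import os
import time

SECONDS_PER_DAY = 86_400


def is_fresh(path, max_age_days=7, now=None):
    """Return True if the file at `path` was modified within `max_age_days`.

    Assumption: the file's modification time stands in for the collection date.
    """
    now = time.time() if now is None else now
    age_seconds = now - os.path.getmtime(path)
    return age_seconds <= max_age_days * SECONDS_PER_DAY
```

A file that fails this check would be a candidate for the `--auto-refresh` run shown above.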

**Collect SISTRIX Data:**

```bash
# Collect keywords, search intent, and SERP features
php v2/scripts/blog/collect-sistrix-data-batch.php --skip-existing

# Collect SERP features (includes PAA questions)
php v2/scripts/blog/collect-post-serp-features.php --limit=50
```

**Collect GSC Data:**

```bash
# Collect top queries and performance metrics
php v2/scripts/blog/collect-post-performance-gsc.php --all
```

**CRITICAL: PAA before FAQ Research**

`collect-faq-research-data.php` falls back to the PAA questions in `paa-questions.json` when the SERP-features data contains no question text. **Always run `collect-post-paa-questions.php` before FAQ research** to get the best FAQ quality:

```bash
# Single post
php v2/scripts/blog/collect-post-paa-questions.php --post=slug --category=category

# All posts (or use run-sistrix-collection-batch.php for batch)
php v2/scripts/blog/collect-post-paa-questions.php --all
```

**Generate FAQ Research Data:**

```bash
# Combine PAA questions, GSC queries, keywords, LSI keywords
# Requires paa-questions.json (from collect-post-paa-questions.php) for full PAA text
php v2/scripts/blog/collect-faq-research-data.php --all --skip-sistrix
```

**Output:** `docs/content/blog/posts/{category}/{slug}/data/faq-research.json`

**What it includes:**

- **PAA questions** from `paa-questions.json` (`keyword.questions`) – run `collect-post-paa-questions.php` before FAQ research. The SERP-features data returns only a count, not the question text.
- **Top GSC queries** with clicks/impressions data (from `performance-gsc.json`)
- **Related keywords** from SISTRIX with volume/competition data
- **LSI keywords** extracted from content
- **SERP features** data

**Shared Keyword Database:**

The system now uses a shared keyword database at `docs/content/blog/seo-reports/domain-keywords.json`:

```bash
# Update shared keyword database
php v2/scripts/blog/aggregate-domain-keywords.php --update-groups
```

This enables keyword sharing across posts and reduces redundant API calls.

### Step 2: Question Generation

**Generate FAQ Questions:**

```bash
php v2/scripts/blog/generate-faq-questions.php --post=slug --category=category
```

**What it does:**

- Combines PAA questions, GSC queries, keywords, and LSI keywords
- Prioritizes questions by search volume and user intent
- Considers search intent from SISTRIX data
- Generates 10-15 questions

**Output:** `docs/content/blog/posts/{category}/{slug}/data/faq-questions.json`

**Supplemental FAQ Sources (When FAQ Count < 10):**

When all PAA questions map to H2s, or `generate-faq-questions.php` produces fewer than 10 questions, use supplemental sources. See [FAQ_EXPANSION_GUIDE.md](FAQ_EXPANSION_GUIDE.md) for step-by-step instructions.

```bash
# Collect supplemental questions (competitor FAQs, LSI-based)
php v2/scripts/blog/collect-supplemental-faq-questions.php --post=slug --category=lexikon
```

`generate-faq-questions.php` automatically loads `faq-questions-supplemental.json` when it would otherwise output fewer than 10 questions.

**Question Sources (prioritized with data-driven scoring):**

1. **PAA questions** (highest priority, priority score: 1)

   - Loaded from `paa-questions.json`; the SERP-features data supplies only the count, not the question text
   - Direct questions from search results

2. **GSC top queries** (high priority, priority score: 2)

   - Prioritized by clicks and impressions
   - Ranking within this source: score = (clicks × 10) + (impressions × 0.1)
   - Converted to question format if needed
   - Top 15 queries selected

3. **Keywords** (medium priority, priority score: 3)

   - Filtered by volume (≥ 50 searches/month) or competition (> 0)
   - Prioritized by volume, then by low competition
   - Ranking within this source: score = (volume × 10) - (competition × 5)
   - Top 10 keywords selected

4. **Standard questions** (lower priority, priority score: 4)
   - Generated based on topic and search intent
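The two scoring formulas in the list above can be written out directly. This is a minimal sketch, not the code of `generate-faq-questions.php`: the function names are hypothetical, and the sample numbers are made up for illustration.

```python
def gsc_priority(clicks, impressions):
    """Score a GSC query: (clicks x 10) + (impressions x 0.1)."""
    return clicks * 10 + impressions * 0.1


def keyword_is_eligible(volume, competition):
    """Keep keywords with volume >= 50 searches/month or competition > 0."""
    return volume >= 50 or competition > 0


def keyword_priority(volume, competition):
    """Score a keyword: (volume x 10) - (competition x 5)."""
    return volume * 10 - competition * 5


def top_n(items, score, n):
    """Return the n highest-scoring items, best first."""
    return sorted(items, key=score, reverse=True)[:n]


# Illustrative numbers only, not real GSC data.
queries = [
    {"query": "a", "clicks": 2, "impressions": 900},
    {"query": "b", "clicks": 12, "impressions": 340},
]
best = top_n(queries, lambda q: gsc_priority(q["clicks"], q["impressions"]), 15)
print([q["query"] for q in best])  # ['b', 'a']
```

Because clicks are weighted 100 times more heavily than impressions, query `b` (12 clicks) outranks query `a` (2 clicks) even though `a` has far more impressions.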

**Selection Algorithm:**

- Ensures at least 5 PAA questions (if available)
- Ensures at least 3 GSC queries (if available)
- Fills remaining slots with keywords/standard
- Total: up to 15 questions selected
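One way to read the selection rules above is "reserve slots, then fill". The sketch below assumes each input list is already sorted best-first, and `select_questions` is a hypothetical name. What happens to surplus PAA or GSC questions beyond the reserved slots is not specified here, so the sketch leaves them out.

```python
def select_questions(paa, gsc, fill, total=15, min_paa=5, min_gsc=3):
    """Pick up to `total` questions from pre-sorted candidate lists.

    paa  -- PAA questions, best first
    gsc  -- GSC queries (already phrased as questions), best first
    fill -- keyword-based and standard questions, best first
    """
    # Reserve the guaranteed slots; slicing copes with short lists.
    selected = paa[:min_paa] + gsc[:min_gsc]
    # Fill whatever room is left with keyword/standard questions.
    remaining = max(total - len(selected), 0)
    selected += fill[:remaining]
    return selected
```

With 7 PAA questions, 4 GSC queries, and 10 fill candidates, this returns 5 + 3 + 7 = 15 questions. With fewer candidates it simply returns fewer.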

### Step 3: Answer Generation

**Generate FAQ Answers:**

**Option A: AI-Powered (Gemini primary, OpenAI fallback)**

```bash
php v2/scripts/blog/generate-faq-answers-optimized.php --post=slug --category=category --use-ai
```

**AI Configuration:**

- **Primary:** Gemini 2.5 Flash (via `GEMINI_API_KEY`; model defaults in [GEMINI_OPTIMIZATION_GUIDE.md](../../development/GEMINI_OPTIMIZATION_GUIDE.md) and `v2/config/gemini-models.php`)
- **Fallback:** OpenAI GPT-4 (via `OPENAI_API_KEY`) when Gemini fails or is unavailable
- **Max Tokens:** 300
- **Temperature:** 0.7

**Why Gemini primary:**

- More reliable when OpenAI has quota/rate-limit issues
- Good quality for FAQ answers
- OpenAI fallback available when needed

**Option B: Template-Based (Placeholders Only)**

```bash
php v2/scripts/blog/generate-faq-answers-optimized.php --post=slug --category=category [--allow-short]
```

**Template limitation:** Template mode produces placeholders (~20–40 words). The script exits with code 1 if any answer < 40 words (unless `--allow-short`). For production FAQs, use Option A (`--use-ai`) or create `faq-answers-optimized.json` manually with 40–80 word answers.

**Option C: Regenerate Short Answers Only**

For existing posts with some answers below 40 words, regenerate only those short answers:

```bash
php v2/scripts/blog/generate-faq-answers-optimized.php --post=slug --category=category --use-ai --regenerate-short
```

- Keeps answers that already meet the 40–80 word target
- Regenerates only answers with < 40 words, using AI
- Requires `--use-ai` (cannot be used with template mode)
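The keep-or-regenerate decision can be sketched as a partition by word count. This is an illustration under assumptions: the real script's word-counting rules are not documented here, so `word_count` (tags removed, then word tokens counted) is a hypothetical stand-in. The `question`/`answer` keys follow the `faq-answers-optimized.json` format.

```python
import re

MIN_WORDS = 40


def word_count(answer_html):
    """Count word tokens after replacing HTML tags with spaces."""
    text = re.sub(r"<[^>]+>", " ", answer_html)
    # Hyphenated or apostrophized words such as "E-Mail" count once.
    return len(re.findall(r"\w+(?:[-']\w+)*", text))


def split_short(answers, min_words=MIN_WORDS):
    """Partition answers into (keep, regenerate) by the minimum length."""
    keep, regenerate = [], []
    for item in answers:
        if word_count(item["answer"]) < min_words:
            regenerate.append(item)
        else:
            keep.append(item)
    return keep, regenerate
```

Only the `regenerate` list would be sent back to the AI. Everything in `keep` stays untouched.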

**Requirements:**

- **Length:** 40-80 words per answer (target: 55-65 words)
- **Structure:** Direct answer (12-18 words) → Context (25-45 words) → Actionable info (8-15 words)
- **Tone:** Du tone (informal German)
- **Keywords:** Natural integration using actual keyword volumes and competition data
- **GSC Data:** References GSC query performance when available
- **LSI Keywords:** Integrated naturally from shared keyword database
- **HTML Formatting:** Clean answers without question tags or labels
- **SEO/AEO/GEO:** Optimized for search engines, answer engines, and generative AI
- **Ordio Mentions:** Natural, when relevant (max 1 per FAQ set)
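The three-part structure and the overall length limit agree with each other, which a quick sum confirms. The snippet below only restates the numbers from the requirements above. The variable and function names are illustrative.

```python
# Word budgets per answer segment, taken from the Structure requirement.
STRUCTURE = {
    "direct_answer": (12, 18),
    "context": (25, 45),
    "actionable_info": (8, 15),
}
LIMITS = (40, 80)  # allowed answer length in words
TARGET = (55, 65)  # preferred answer length in words


def total_range(structure):
    """Sum the minimum and maximum word budgets across all segments."""
    low = sum(lo for lo, _ in structure.values())
    high = sum(hi for _, hi in structure.values())
    return low, high


def within(inner, outer):
    """Return True if the range `inner` lies inside the range `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


budget = total_range(STRUCTURE)
print(budget)                  # (45, 78)
print(within(budget, LIMITS))  # True
print(within(TARGET, budget))  # True
```

An answer that respects all three segment budgets therefore lands between 45 and 78 words, inside the 40-80 limit, and the 55-65 target is reachable within those budgets.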

**Enhanced AI Prompt:**

The AI prompt (Gemini/OpenAI) includes:

- **Post Context:** Title, excerpt, meta description, key sections (h2 headings)
- **Keyword Data:** Actual volumes, competition levels, clicks from GSC
- **GSC Performance:** Clicks, impressions, position for each question
- **LSI Keywords:** Up to 8 semantically related keywords
- **Strict Requirements:**
  - Primary keyword MUST appear naturally (mandatory)
  - Length enforcement (40-80 words, strictly enforced)
  - Template language avoidance (explicit list)
  - Natural integration requirements

**Output:** `docs/content/blog/posts/{category}/{slug}/data/faq-answers-optimized.json`

### Step 4: Quality Enhancement & Validation

**Enhance FAQ Quality:**

```bash
# Review and enhance FAQs (fixes HTML, removes duplicates, scores quality)
php v2/scripts/blog/enhance-faq-quality.php --post=slug --category=category --fix-html --remove-duplicates
```

**What it does:**

- Checks keyword integration (primary + LSI keywords)
- Validates answer length (40-80 words)
- Fixes HTML formatting issues (removes question tags, labels)
- Removes duplicate questions
- Scores FAQs based on data integration quality
- Generates quality report

**Validate FAQ Quality:**

```bash
php v2/scripts/blog/validate-faq-quality.php --post=slug --category=category
```

**Checks:**

- FAQ count (10-15)
- Answer length (40-80 words)
- Keyword integration (uses actual SISTRIX data)
- Natural language (AI content detection)
- Du tone consistency
- Internal link placement (2-3 total)
- HTML formatting (no question tags in answers)

**Output:**

- `docs/content/blog/FAQ_QUALITY_REPORT.md` (from enhance script)
- `docs/content/blog/FAQ_QUALITY_VALIDATION.md` (from validate script)

**Validate FAQ Schema:**

```bash
# Validate schema generation and structure
php v2/scripts/blog/validate-faq-schema.php --post=slug --category=category

# Validate all posts
php v2/scripts/blog/validate-faq-schema.php --all
```

**What schema validation checks:**

- FAQPage schema is generated correctly
- Required properties present (`@context`, `@type`, `mainEntity`)
- Question structure valid (`name`, `acceptedAnswer.text`)
- Text properly cleaned (no HTML, normalized whitespace, smart quotes replaced)
- Schema matches visible content
- No duplicate FAQPage schemas
- JSON syntax valid
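For orientation, the structure being validated looks roughly like the output of the sketch below. It follows the schema.org `FAQPage` / `Question` / `Answer` types. The helper names are hypothetical, the answers are assumed to be plain text already, and the property check covers only the required-properties items from the list above.

```python
import json


def build_faq_schema(faqs):
    """Build a FAQPage JSON-LD dict from question/answer pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": faq["question"],
                "acceptedAnswer": {"@type": "Answer", "text": faq["answer"]},
            }
            for faq in faqs
        ],
    }


def missing_properties(schema):
    """List required properties that are absent or empty."""
    problems = [key for key in ("@context", "@type", "mainEntity") if key not in schema]
    for index, entity in enumerate(schema.get("mainEntity", [])):
        if not entity.get("name"):
            problems.append(f"mainEntity[{index}].name")
        if not entity.get("acceptedAnswer", {}).get("text"):
            problems.append(f"mainEntity[{index}].acceptedAnswer.text")
    return problems


schema = build_faq_schema([{"question": "Was ist Krankengeld?", "answer": "Krankengeld ist ..."}])
print(json.dumps(schema, ensure_ascii=False, indent=2))
print(missing_properties(schema))  # []
```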

**Schema Best Practices:**

- Answers are automatically stripped of HTML tags
- Text is normalized (whitespace, quotes, encoding)
- Smart quotes are replaced with regular quotes
- Control characters are removed
- UTF-8 encoding is ensured
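The cleaning steps listed above could look like the following sketch. It is not the project's actual cleaner: `clean_schema_text` is a hypothetical name, the set of block-level tags treated as word breaks is an assumption, and the regex-based tag removal is only adequate for simple FAQ markup.

```python
import html
import re
import unicodedata

# Typographic quotes (including German low quotes) mapped to ASCII quotes.
SMART_QUOTES = str.maketrans({
    "\u201c": '"', "\u201d": '"', "\u201e": '"',
    "\u2018": "'", "\u2019": "'", "\u201a": "'",
})


def clean_schema_text(answer_html):
    """Turn an HTML answer into plain text suitable for schema output."""
    # Block-level breaks become spaces so adjacent paragraphs do not fuse.
    text = re.sub(r"</(p|li|div)>|<br\s*/?>", " ", answer_html, flags=re.I)
    # All remaining tags are dropped without adding whitespace.
    text = re.sub(r"<[^>]+>", "", text)
    text = html.unescape(text)
    text = text.translate(SMART_QUOTES)
    # Remove control characters; whitespace is kept and normalized next.
    text = "".join(
        ch for ch in text if ch.isspace() or unicodedata.category(ch) != "Cc"
    )
    return " ".join(text.split())
```

Python strings are Unicode, so UTF-8 only comes into play when the result is written out, for example with `encoding="utf-8"`.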

**See:** `docs/content/blog/FAQ_SCHEMA_BEST_PRACTICES.md` for complete schema validation guide.

### Step 5: Manual Review (One-by-One)

**CRITICAL:** All FAQs must be manually reviewed one-by-one before publication. Automated generation is a starting point, not the final product.

**Interactive Review Tool:**

Use the interactive CLI tool for systematic one-by-one review:

```bash
php v2/scripts/blog/review-faq-manually.php --post=slug --category=category
```

**Review Process:**

1. **Start Review Session**

   - Tool displays post info (title, primary keyword, related keywords)
   - Shows each FAQ question and answer
   - Displays quality indicators (keyword present, length OK)

2. **For Each FAQ:**

   - Read question and answer carefully
   - Check against quality checklist (see below)
   - Choose action:
     - **[a] Approve** - FAQ meets all quality standards
     - **[e] Edit** - Minor edits needed (edit JSON manually)
     - **[r] Regenerate** - Answer needs complete regeneration
     - **[s] Skip** - Review later (not urgent)
     - **[d] Delete** - FAQ is not useful or duplicate

3. **Review Progress Saved Automatically**
   - Status saved after each FAQ
   - Can resume from where you left off
   - Review data saved to `faq-review-status.json`

**Manual Review Checklist:**

Follow `docs/content/blog/FAQ_MANUAL_REVIEW_CHECKLIST.md` for the detailed checklist. Key items:

- [ ] Primary keyword appears naturally (at least 1x)
- [ ] Answer is 40-80 words
- [ ] Answers question directly in first sentence
- [ ] No template language
- [ ] Natural German (du tone)
- [ ] Clean HTML formatting
- [ ] Ordio mention only if relevant

**Common Issues to Fix:**

1. **Missing Primary Keyword** → Regenerate answer
2. **Malformed Questions** → Regenerate questions first
3. **Wrong Primary Keyword** → Fix keyword, then regenerate
4. **Template Language** → Edit or regenerate
5. **Too Short/Long** → Regenerate with length requirements

**Update Review Progress:**

```bash
php v2/scripts/blog/track-faq-review-progress.php --update
```

**Statuses:**

- **Generated:** FAQs generated but not reviewed
- **In Review:** Currently being reviewed
- **Reviewed:** FAQs reviewed using checklist
- **Approved:** FAQs approved for publication
- **Published:** FAQs added to post JSON

### Step 6: Implementation

**CRITICAL: FAQ Deduplication (before adding to post)**

Avoid duplicate or repetitive FAQs:

- **Definition-type questions** (Was ist X?, Was bedeutet X?, Was ist das X-Konzept?, Wie wird X definiert?) often produce semantic duplicates. Keep only one.
- **Comparison-type questions** (Unterschied vs. Sind X und Y das gleiche?) behave the same way. Keep only one.

Run `check-faq-uniqueness.php` to detect duplicates. The `generate-faq-questions.php` script uses a stricter similarity threshold (0.55) for definition and comparison types to reduce duplicates at the source.

**CRITICAL: H2-FAQ Overlap Check (before adding to post)**

FAQs must not duplicate H2 questions. Run the overlap check after manual review and before running `add-faqs-to-post.php`:

```bash
php v2/scripts/blog/check-h2-faq-overlap.php --post=slug --category=category
```

Overlap similarity should be < 0.65. Remove or rephrase overlapping FAQs. See [FAQ_H2_SEPARATION_GUIDELINES.md](FAQ_H2_SEPARATION_GUIDELINES.md).
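To make the 0.65 threshold concrete, here is a sketch that flags FAQ questions too close to an H2 heading. The similarity metric used by `check-h2-faq-overlap.php` is not documented here, so Jaccard similarity over lowercase word sets is an assumed stand-in, and all function names are hypothetical.

```python
import re


def tokens(text):
    """Lowercase word set of a question or heading."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Shared words divided by all distinct words (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def overlapping_faqs(faq_questions, h2_headings, threshold=0.65):
    """Return (question, heading, score) for every pair at or above the threshold."""
    hits = []
    for question in faq_questions:
        for heading in h2_headings:
            score = jaccard(question, heading)
            if score >= threshold:
                hits.append((question, heading, score))
    return hits
```

Any question returned by `overlapping_faqs` would need to be removed or rephrased before the FAQs are added to the post.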

**Add FAQs to Post:**

```bash
php v2/scripts/blog/add-faqs-to-post.php --post=slug --category=category --faqs=faq-answers-optimized.json
```

**What it does:**

- Adds FAQs to `faqs` array in post JSON
- **Sorts FAQs by logical flow** (definition first, then how-to, requirements, when/why, which, yes/no, duration, other)
- Removes any FAQ sections from HTML content
- Updates `modified_date`
- Preserves existing FAQs (merges, doesn't replace)

**Logical flow order:** Definition → How-To → Requirements → When/Why → Which → Yes/No → Duration → Other. Use `--no-sort` to preserve input order.
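The ordering can be pictured as "classify each question, then sort stably by category rank". The sketch below is an illustration only: the German patterns used to classify questions are hypothetical and far simpler than whatever `add-faqs-to-post.php` uses.

```python
import re

FLOW_ORDER = [
    "definition", "how_to", "requirements", "when_why",
    "which", "yes_no", "duration", "other",
]

# Hypothetical patterns, checked top to bottom; the first match wins.
# "wie lange" must be tested before the generic "wie" how-to pattern.
PATTERNS = [
    ("definition", r"^was (ist|sind|bedeutet)\b"),
    ("duration", r"^wie lange\b"),
    ("how_to", r"^wie\b"),
    ("requirements", r"voraussetzung"),
    ("when_why", r"^(wann|warum|wieso)\b"),
    ("which", r"^welche"),
    ("yes_no", r"^(ist|sind|kann|kannst|muss|musst|darf|darfst|hat|hast|gibt)\b"),
]


def classify(question):
    """Return the flow category of a question, or 'other'."""
    normalized = question.strip().lower()
    for category, pattern in PATTERNS:
        if re.search(pattern, normalized):
            return category
    return "other"


def sort_by_flow(faqs):
    """Stable sort: FAQs in the same category keep their input order."""
    return sorted(faqs, key=lambda faq: FLOW_ORDER.index(classify(faq["question"])))
```

Because Python's `sorted` is stable, questions that fall into the same category stay in the order they arrived in.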

**For existing posts with unordered FAQs:** Run `reorder-faqs-by-logical-flow.php` to fix ordering without re-adding:

```bash
php v2/scripts/blog/reorder-faqs-by-logical-flow.php --post=slug --category=category --write
```

**Important:** FAQs are stored in the `faqs` array, NOT in HTML content. The template renders FAQs separately via the `BlogFAQ.php` component.

### Step 6.5: Manual Review (MANDATORY)

**CRITICAL:** Manual review is mandatory and must be done post by post. Batch processing of this review step is NOT allowed.

**Run Analysis Tools:**

```bash
# SEO Analysis
php v2/scripts/blog/analyze-faqs-seo.php --post=slug --category=category

# Uniqueness Check
php v2/scripts/blog/check-faq-uniqueness.php --post=slug --category=category

# Improvement Suggestions
php v2/scripts/blog/suggest-faq-improvements.php --post=slug --category=category
```

**Review Analysis Output:**

- Identify duplicate questions (remove or merge)
- Identify missing high-value queries (add FAQs)
- Identify repetitive answers (rewrite with unique angles)
- Note SEO opportunities

**Manual Edit JSON File:**

- Remove duplicate FAQs
- Rewrite repetitive FAQs with unique angles
- Add FAQs for missing high-value queries
- Optimize answers for SEO (keyword integration, length, du tone)
- Ensure each FAQ provides unique value

**Validate Changes:**

```bash
php v2/scripts/blog/check-faq-uniqueness.php --post=slug --category=category
```

**Document Review:**

- Update progress tracker (`docs/content/blog/FAQ_MANUAL_REVIEW_SEO_PROGRESS.md`)
- Note issues found and fixes applied
- Document SEO improvements

**See:** `docs/content/blog/FAQ_MANUAL_REVIEW_SEO_CHECKLIST.md` for complete checklist.

### Step 7: Schema Validation

**Validate FAQPage Schema:**

```bash
php v2/scripts/blog/validate-faq-schema.php --post=slug --category=category
```

**Checks:**

- All FAQs included in schema
- Schema answers match the visible answer text exactly (compared as plain text)
- No HTML links in schema answers (plain text only)
- Valid JSON syntax

**Test with the Google Rich Results Test:** https://search.google.com/test/rich-results

### Step 8: Final Validation

**Check Display:**

- FAQs appear correctly on blog post page
- Questions/answers display properly
- Schema markup renders correctly
- Mobile responsiveness
- Accessibility (keyboard navigation)

**Update Progress:**

```bash
php v2/scripts/blog/track-faq-review-progress.php --update
```

## Batch Processing Workflow

### For Multiple Posts

**Process Posts in Batches:**

```bash
# Process Tier 1 posts (top 20)
php v2/scripts/blog/rebuild-faqs-batch.php --tier=1 --batch-size=10

# Process Tier 2 posts (next 30)
php v2/scripts/blog/rebuild-faqs-batch.php --tier=2 --batch-size=10

# Process Tier 3 posts (remaining)
php v2/scripts/blog/rebuild-faqs-batch.php --tier=3 --batch-size=10
```
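The tier boundaries in the comments above (top 20, next 30, remaining) and the batch size can be expressed as a small sketch. It assumes posts are already ranked by some priority, which this document does not define, and that `--batch-size` means "posts per batch". Both function names are hypothetical.

```python
def tier_for_rank(rank):
    """Map a 1-based priority rank to a tier: 1-20, 21-50, 51 and up."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    if rank <= 20:
        return 1
    if rank <= 50:
        return 2
    return 3


def batches(items, batch_size=10):
    """Split a list into consecutive chunks of at most `batch_size` items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

Under these assumptions, Tier 1 would run as two batches of 10 posts and Tier 2 as three.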

**Batch Workflow:**

1. Generate FAQ research data (if not already done)
2. Generate FAQ questions
3. Generate FAQ answers
4. Quality validation
5. Manual review (use checklist)
6. Add to post JSON
7. Schema validation
8. Mark as complete

## Quality Standards

### FAQ Count

- **Target:** 10-15 FAQs per post
- **Minimum:** 10 FAQs
- **Maximum:** 20 FAQs (if high-quality)

### Answer Length

- **Target:** 40-80 words per answer
- **Optimal:** 50-70 words
- **Minimum:** 40 words
- **Maximum:** 80 words

### Keyword Integration

- **Primary keyword:** Appears naturally in 3-5 FAQs
- **LSI keywords:** Used contextually where relevant
- **Natural integration:** Not stuffed or forced
- **No keyword stuffing:** Avoid excessive repetition

### Du Tone

- **Consistent:** All answers use "du" pronouns
- **Informal:** Conversational, friendly tone
- **Active voice:** Not passive
- **No mixing:** Don't mix "Sie" and "du"

### Internal Links

- **Frequency:** 2-3 links total across all FAQs
- **Relevance:** Contextually relevant
- **Anchor text:** Natural, descriptive phrases
- **No over-optimization:** Not in every FAQ

### Ordio Mentions

- **Natural:** Only when relevant
- **Frequency:** Maximum 1 per FAQ set
- **Value:** Adds value to the answer
- **No promotion:** Not promotional language

## SEO/AEO/GEO Optimization

### SEO (Search Engine Optimization)

- **Keyword integration:** Primary keyword in 3-5 FAQs
- **LSI keywords:** Semantically related terms
- **Answer length:** 40-80 words (optimal for featured snippets)
- **Question format:** Natural question format with question mark
- **Schema markup:** FAQPage schema for rich results

### AEO (Answer Engine Optimization)

- **Direct answer structure:** Answer question immediately
- **Comprehensive coverage:** Cover all question variations
- **Structured content:** Clear structure for AI understanding
- **Context:** Sufficient context for AI systems

### GEO (Generative Engine Optimization)

- **Structured content:** Clear headings and structure
- **Entity recognition:** Proper entity markup
- **Contextual information:** Sufficient context for generative AI
- **Comprehensive coverage:** Cover related topics

## Troubleshooting

### FAQs Not Generating

**Check:**

- Research data exists (`faq-research.json`)
- SISTRIX data available (keywords, search intent, SERP features)
- GSC data available (`performance-gsc.json`)

**Solution:**

```bash
# Collect missing data
php v2/scripts/blog/collect-faq-research-data.php --post=slug --category=category
```

### generate-faq-questions Filters All PAA (0 Questions in faq-questions.json)

**Symptom:** `generate-faq-questions.php` rejects all PAA as invalid/off-topic; `faq-questions.json` is empty or has 0 valid questions.

**Solution – Manual faq-answers-optimized.json:**

1. Use PAA from `data/paa-questions.json` and FAQ matrix from `CONTENT_OUTLINE.md`.
2. Create `docs/content/blog/posts/{category}/{slug}/data/faq-answers-optimized.json`:

```json
{
  "post_slug": "slug",
  "category": "lexikon",
  "answers": [
    {"question": "Question text?", "answer": "<p>40–80 word answer in du tone.</p>"}
  ]
}
```

3. Choose 10–12 questions not covered by H2s (per PAA coverage matrix).
4. Run `add-faqs-to-post.php` and `check-h2-faq-overlap.php`.

**Example:** Krankengeld (2026-02-12) – the generator filtered out all PAA questions; 10 FAQs were added manually from PAA.
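If you prefer to generate the manual file from a script rather than typing JSON by hand, a sketch like the following writes the same structure to the expected path. `write_manual_answers` is a hypothetical helper, not part of the existing tooling. `ensure_ascii=False` keeps German umlauts readable in the file.

```python
import json
from pathlib import Path


def write_manual_answers(base_dir, category, slug, answers):
    """Write faq-answers-optimized.json under {base_dir}/{category}/{slug}/data/."""
    data_dir = Path(base_dir) / category / slug / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    payload = {"post_slug": slug, "category": category, "answers": answers}
    target = data_dir / "faq-answers-optimized.json"
    target.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2) + "\n",
        encoding="utf-8",
    )
    return target
```

Called with `base_dir="docs/content/blog/posts"`, the file lands where `add-faqs-to-post.php` is pointed at via `--faqs=faq-answers-optimized.json`.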

### Answer Quality Issues

**Check:**

- Answer length (should be 40-80 words)
- Natural language (not AI-generated patterns)
- Du tone consistency
- Keyword integration

**Solution:**

- Review answers manually
- Edit answers to meet quality standards
- Re-run validation script

### Answers Below 40 Words (validate-faq-schema / validate-faq-quality Fails)

**Symptom:** Some FAQ answers are below the 40-word minimum; validation fails.

**Solution – Regenerate short answers only:**

```bash
php v2/scripts/blog/generate-faq-answers-optimized.php --post=slug --category=category --use-ai --regenerate-short
```

This keeps answers that already meet the 40–80 word target and regenerates only those below 40 words. The script uses retry logic (Gemini/OpenAI) with explicit word-count feedback when answers are too short.

### Schema Validation Fails

**Check:**

- All FAQs included in schema
- Schema answers match the visible answer text exactly (compared as plain text)
- No HTML links in schema answers
- Valid JSON syntax

**Solution:**

- Fix schema generation
- Ensure answers match exactly
- Remove HTML from schema answers

## Related Documentation

- `FAQ_BEST_PRACTICES.md` - Complete best practices guide
- `FAQ_MANUAL_REVIEW_CHECKLIST.md` - Manual review checklist
- `FAQ_REBUILD_PRIORITY_LIST.md` - Priority list for rebuild
- `FAQ_REVIEW_PROGRESS.md` - Review progress tracking
- `.cursor/rules/blog-faq-optimization.mdc` - Cursor rules

## Quick Reference

**Complete FAQ Generation (Single Post):**

```bash
# 1. Collect research data (PAA before FAQ research)
php v2/scripts/blog/collect-post-paa-questions.php --post=slug --category=category
php v2/scripts/blog/collect-faq-research-data.php --post=slug --category=category

# 2. Generate questions
php v2/scripts/blog/generate-faq-questions.php --post=slug --category=category

# 3. Generate answers
php v2/scripts/blog/generate-faq-answers-optimized.php --post=slug --category=category --use-ai

# 4. Validate quality
php v2/scripts/blog/validate-faq-quality.php --post=slug --category=category

# 5. Manual review (use checklist)

# 6. H2-FAQ overlap check (before adding)
php v2/scripts/blog/check-h2-faq-overlap.php --post=slug --category=category

# 7. Add to post
php v2/scripts/blog/add-faqs-to-post.php --post=slug --category=category --faqs=faq-answers-optimized.json

# 8. Validate schema
php v2/scripts/blog/validate-faq-schema.php --post=slug --category=category
```

**Batch Processing:**

```bash
# Process Tier 1 (top 20 posts)
php v2/scripts/blog/rebuild-faqs-batch.php --tier=1 --batch-size=10
```