# Cluster Relevance Validation Guide

**Last Updated:** 2026-03-10  
**Purpose:** Guide for validating lexikon post recommendations for cluster relevance

---

## Overview

This guide describes a step-by-step process for validating lexikon post recommendations, so that every recommended term is relevant to the Zeiterfassung (time tracking) and Dienstplan (shift planning) content clusters. Use it when adding new terms to the recommendations or when reviewing existing ones.

---

## Validation Process

### Step 1: Research Cluster Topics

**Objective:** Understand which topics and keywords are actually relevant to the clusters.

**Actions:**
1. **Web Search:**
   - Search: "zeiterfassung verwandte themen", "arbeitszeiterfassung begriffe"
   - Search: "dienstplan verwandte themen", "schichtplanung begriffe"
   - Analyze SERP results for related topics
   - Document semantic relationships

2. **Pillar Page Analysis:**
   - Read `/insights/zeiterfassung` pillar page
   - Read `/insights/dienstplan` pillar page
   - Extract topics covered
   - Identify themes and sub-themes

3. **Existing Content Analysis:**
   - Review existing cluster posts
   - Extract topics and patterns
   - Identify content coverage gaps

**Deliverables:**
- Research report documenting cluster topics
- Expanded keyword lists
- Topic taxonomy

---

### Step 2: Expand Cluster Keywords

**Objective:** Expand keyword definitions based on research findings.

**Actions:**
1. Add related topics from research to keyword lists
2. Include semantic variations
3. Add industry-specific terms
4. Add technology-related terms
5. Document expansion rationale

**Files to Update:**
- `v2/scripts/blog/analyze-lexikon-cluster-relevance.py`
- `v2/scripts/blog/validate-lexikon-cluster-relevance.py`

---

### Step 3: Run Validation Script

**Objective:** Validate terms using multi-factor scoring.

**Command:**
```bash
python3 v2/scripts/blog/validate-lexikon-cluster-relevance.py
```

**Scoring Factors:**
- **Cluster Match (50 points):** Direct match to the Zeiterfassung/Dienstplan clusters
- **Keyword Density (30 points):** Number of cluster keywords the term matches
- **Source Count (20 points):** Number of competitor sources that cover the term

**Relevance Threshold:** 60/100
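
The three factors can be combined roughly as in the sketch below. The point values (50/30/20) and the 60-point threshold come from this guide. How the real script converts keyword matches and source counts into points is not documented here, so the linear scaling, the saturation values `FULL_KEYWORD_MATCHES` and `FULL_SOURCE_COUNT`, and all function and field names are assumptions made for illustration.

```python
from dataclasses import dataclass

# Point values and threshold as stated in this guide.
CLUSTER_MATCH_POINTS = 50
KEYWORD_DENSITY_POINTS = 30
SOURCE_COUNT_POINTS = 20
RELEVANCE_THRESHOLD = 60

# ASSUMPTION: counts at which a factor earns its full points.
FULL_KEYWORD_MATCHES = 3
FULL_SOURCE_COUNT = 4


@dataclass
class TermEvidence:
    """Hypothetical container for the evidence collected for one term."""
    term: str
    direct_cluster_match: bool  # term maps directly to Zeiterfassung or Dienstplan
    keyword_matches: int        # number of cluster keywords matched
    source_count: int           # number of competitor sources covering the term


def scaled_points(count: int, full_count: int, max_points: int) -> int:
    """Scale a count linearly to points, clamped to the range 0..max_points."""
    clamped = max(0, min(count, full_count))
    return max_points * clamped // full_count


def relevance_score(evidence: TermEvidence) -> int:
    """Sum the three factor scores into a 0-100 relevance score."""
    cluster = CLUSTER_MATCH_POINTS if evidence.direct_cluster_match else 0
    density = scaled_points(
        evidence.keyword_matches, FULL_KEYWORD_MATCHES, KEYWORD_DENSITY_POINTS
    )
    sources = scaled_points(
        evidence.source_count, FULL_SOURCE_COUNT, SOURCE_COUNT_POINTS
    )
    return cluster + density + sources


def is_relevant(evidence: TermEvidence) -> bool:
    """A term passes when it reaches the relevance threshold."""
    return relevance_score(evidence) >= RELEVANCE_THRESHOLD
```

If the cluster-match factor is all-or-nothing, as assumed here, a term without a direct cluster match can earn at most 30 + 20 = 50 points and therefore cannot reach the threshold on keyword density and source count alone.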

**Output:**
- `docs/seo-strategy-2026/research/term-validation-results.md`

---

### Step 4: Review Validation Results

**Objective:** Review scores and classifications, check for false positives/negatives.

**Actions:**
1. Review validated terms (score ≥60)
2. Review filtered terms (score <60)
3. Check for false positives (terms that passed but should have been filtered)
4. Check for false negatives (terms that were filtered but should have passed)
5. Manual review of edge cases

**Edge Cases to Review:**
- Terms near threshold (55-65)
- Ambiguous classifications
- Terms with high priority but low relevance score

---

### Step 5: Generate Recommendations

**Objective:** Create validated recommendations table.

**Actions:**
1. Extract validated terms from validation results
2. Prioritize by relevance score + priority score
3. Group by cluster and relevance level
4. Generate markdown table

**Command:**
```bash
python3 v2/scripts/blog/generate-validated-recommendations.py
```

**Output:**
- `tmp/validated-recommendations-table.md`
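
The actions in this step can be sketched as follows. This illustrates the ordering and grouping logic only and is not the code of `generate-validated-recommendations.py`. The input field names, the column layout, and the `HIGH_RELEVANCE_CUTOFF` that separates "high" from "medium" relevance are assumptions.

```python
from typing import Dict, List

RELEVANCE_THRESHOLD = 60
HIGH_RELEVANCE_CUTOFF = 80  # ASSUMPTION: boundary between "medium" and "high"


def relevance_level(score: int) -> str:
    """Bucket a passing score into a relevance level."""
    return "high" if score >= HIGH_RELEVANCE_CUTOFF else "medium"


def build_recommendations_table(results: List[Dict]) -> str:
    """Render validated terms as a markdown table, grouped and prioritized."""
    # 1. Keep only terms that reached the threshold.
    validated = [r for r in results if r["relevance_score"] >= RELEVANCE_THRESHOLD]

    # 2-3. Group by cluster, then relevance level (high first), then rank by
    #      the combined relevance + priority score, highest first.
    validated.sort(
        key=lambda r: (
            r["cluster"],
            relevance_level(r["relevance_score"]) != "high",
            -(r["relevance_score"] + r["priority_score"]),
            r["term"],
        )
    )

    # 4. Emit the markdown table.
    lines = [
        "| Cluster | Relevance level | Term | Relevance score | Priority score |",
        "| --- | --- | --- | --- | --- |",
    ]
    for r in validated:
        lines.append(
            f"| {r['cluster']} | {relevance_level(r['relevance_score'])} "
            f"| {r['term']} | {r['relevance_score']} | {r['priority_score']} |"
        )
    return "\n".join(lines)
```

Sorting on a single composite key keeps the grouping and the prioritization in one pass, so the table order is deterministic even when two terms have the same combined score.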

---

### Step 6: Update Documentation

**Objective:** Update plan files with validated recommendations.

**Files to Update:**
- `docs/seo-strategy-2026/plans/new-content-creation-plan.md`
- `docs/seo-strategy-2026/ZEITERFASSUNG_DIENSTPLAN_SEO_STRATEGY_NOTION.md`

**Actions:**
1. Replace old recommendations table with validated table
2. Update cluster breakdown sections
3. Update summary statistics
4. Document filtering criteria

---

## Quality Standards

### Relevance Threshold: 60/100

**Minimum Requirements:**
- Cluster match: Direct match to Zeiterfassung/Dienstplan clusters
- Keyword density: At least 1 cluster keyword match
- Source count: At least 2 competitor sources

**Filtering Rules:**
- Terms in filter list: Score = 0 (immediate disqualification)
- Supporting cluster terms: Must score ≥60 to be included
- Terms below threshold: Filtered out, documented
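
A minimal sketch of how these filtering rules could be applied to an already computed score. The filter entries shown are a subset of the filter list in this guide. The `normalize` helper and the status strings are hypothetical and assume that filter-list keys are lowercase with underscores.

```python
from typing import Tuple

RELEVANCE_THRESHOLD = 60

# Subset of the filter list documented in this guide.
FILTER_LIST = frozenset(
    {"generation_z", "bewerbermanagement", "recruiting", "onboarding", "hr_analytics"}
)


def normalize(term: str) -> str:
    """ASSUMPTION: filter-list keys are lowercase with underscores for spaces."""
    return term.strip().lower().replace(" ", "_")


def apply_filter_rules(term: str, raw_score: int) -> Tuple[int, str]:
    """Return the final score and a status that can be written to the results file."""
    if normalize(term) in FILTER_LIST:
        # Filter-list terms are disqualified regardless of their raw score.
        return 0, "filtered: in filter list"
    if raw_score < RELEVANCE_THRESHOLD:
        # Below-threshold terms are kept in the output so the decision is documented.
        return raw_score, "filtered: below threshold"
    return raw_score, "validated"
```

Returning a status alongside the score keeps filtered terms visible in the results, which supports the requirement that terms below the threshold are documented rather than silently dropped.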

---

## Validation Checklist

Before adding a term to recommendations, verify:

- [ ] Term relates directly to time tracking/shift planning
- [ ] Term appears in SERP results with cluster keywords
- [ ] Term is used in the context of cluster topics in competitor content
- [ ] Term supports cluster authority rather than diluting it
- [ ] Relevance score ≥60/100
- [ ] Not in filter list (generation_z, bewerbermanagement, etc.)

---

## Common Issues & Solutions

### Issue: Term has high priority but low relevance score

**Solution:** Check if term is actually relevant to clusters:
- Search: "{term} zeiterfassung" and "{term} dienstplan"
- Check SERP results for co-occurrence
- Review competitor content for context
- If not relevant, filter out despite high priority
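
The co-occurrence check can be made repeatable with a small helper like the one below. It only builds the two search queries named above and counts co-occurrence in result snippets that were collected by hand. It does not call any search API, and the function names and the snippet input are hypothetical.

```python
from typing import Dict, List

CLUSTER_KEYWORDS = ("zeiterfassung", "dienstplan")


def cooccurrence_queries(term: str) -> List[str]:
    """Build one search query per cluster keyword for a candidate term."""
    return [f"{term} {keyword}" for keyword in CLUSTER_KEYWORDS]


def cooccurrence_counts(term: str, snippets: List[str]) -> Dict[str, int]:
    """Count snippets that mention the term together with each cluster keyword."""
    needle = term.lower()
    counts = {keyword: 0 for keyword in CLUSTER_KEYWORDS}
    for snippet in snippets:
        text = snippet.lower()
        if needle not in text:
            continue  # snippet does not mention the term at all
        for keyword in CLUSTER_KEYWORDS:
            if keyword in text:
                counts[keyword] += 1
    return counts
```

Counts of zero for both cluster keywords are a reasonable signal to filter the term out despite a high priority score, though the final call remains a manual one.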

### Issue: Term is relevant but scored below threshold

**Solution:** Review scoring factors:
- Check keyword density (may need to expand keywords)
- Verify cluster classification (may be misclassified)
- Review source count (may need more competitor coverage)
- Consider manual override if clearly relevant

### Issue: Term is in filter list but seems relevant

**Solution:** Review filter list rationale:
- Check web search results for actual relationship
- Verify semantic similarity to cluster topics
- If actually relevant, remove from filter list and re-validate

---

## Filter List

**Terms Automatically Filtered (Score = 0):**

**Generation Terms:**
- `generation_z`, `generation_y`, `generation_x`, `generation_alpha`

**Recruiting Terms:**
- `bewerbermanagement`, `stellenanzeige`, `recruiting`, `onboarding`
- `candidate_relationship_management`, `data_driven_recruiting`
- `google_for_jobs`, `headhunting`, `recruiter`

**HR Analytics:**
- `hr_analytics`

**Note:** These terms may appear in the "supporting" cluster, but they should NOT be included in Zeiterfassung/Dienstplan cluster recommendations unless there is a direct connection (e.g., "zeiterfassung onboarding" or "dienstplan recruiting").

---

## Best Practices

1. **Always Research First:** Don't rely solely on keyword matching
2. **Check SERP Results:** Verify terms appear with cluster keywords
3. **Review Competitor Content:** Check how competitors use terms
4. **Validate Before Adding:** Run validation script before adding to recommendations
5. **Document Decisions:** Record why terms were included/excluded
6. **Review Regularly:** Re-validate when expanding clusters or adding new terms

---

## Related Documentation

- `cluster-relevance-validation-report.md` - Comprehensive validation report
- `zeiterfassung-cluster-topics-research.md` - Zeiterfassung research
- `dienstplan-cluster-topics-research.md` - Dienstplan research
- `term-validation-results.md` - Validation results
- `validate-lexikon-cluster-relevance.py` - Validation script
