# Blog Related Posts Logic Documentation

**Last Updated:** 2026-02-12

Complete guide to the intelligent related posts algorithm used in the blog related posts carousel.

## Overview

The blog related posts carousel uses `load_related_posts_enhanced()`, which matches related posts by combining several data sources (curated `related_posts` entries, clusters, topics, category, and publication date) with a tiered fallback. The fallback ensures that every blog post gets related content, including new posts without manually configured relationships.

## Algorithm Architecture

### Multi-Tier Fallback System

The algorithm uses a 5-tier fallback system to find related posts:

```mermaid
graph TD
    A[Load Current Post] --> B{Tier 1: Has related_posts?}
    B -->|Yes| C[Use Existing Related Posts]
    B -->|No| D[Tier 2: Cluster Matching]
    D -->|Insufficient| E[Tier 3: Topic Matching]
    E -->|Insufficient| F[Tier 4: Category Matching]
    F -->|Insufficient| G[Tier 5: Recency Boost]
    C --> H[Score & Rank Posts]
    D --> H
    E --> H
    F --> H
    G --> H
    H --> I[Return Top N Posts]
```

### Tier 1: Existing Related Posts

**Priority**: Highest

If the post JSON file contains a non-empty `related_posts` array, its entries are used first.

**Scoring**: Uses the `similarity_score` of each `related_posts` entry, converted from a 0-1 fraction to a 0-100 scale (e.g. 0.574 → 57.4)

**Advantages**:

- Manual curation possible
- Can include specific relationships
- Fast (no calculation needed)

**Example**:

```json
{
  "related_posts": [
    {
      "slug": "kunden-werben-kunden",
      "category": "ratgeber",
      "similarity_score": 0.574,
      "relationship_type": "definition_to_guide"
    }
  ]
}
```
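
The conversion from `similarity_score` to the 0-100 scale can be sketched as follows. This is a minimal illustration, assuming scores are stored as fractions between 0 and 1 (as in the example above); the function name `tier1_scores_sketch()` is hypothetical, and the real loader may round or validate entries differently.

```php
<?php
// Hypothetical sketch: map Tier 1 related_posts entries to slug => score (0-100).
// Assumes similarity_score is stored as a fraction between 0 and 1.
function tier1_scores_sketch(array $post): array
{
    $scores = [];
    foreach ($post['related_posts'] ?? [] as $item) {
        if (!isset($item['slug'], $item['similarity_score'])) {
            continue; // skip incomplete entries instead of failing
        }
        $scores[$item['slug']] = (float) $item['similarity_score'] * 100; // 0.574 -> ~57.4
    }
    return $scores;
}

// Decode the example JSON above and convert its scores
$post = json_decode(
    '{"related_posts":[{"slug":"kunden-werben-kunden","category":"ratgeber",'
    . '"similarity_score":0.574,"relationship_type":"definition_to_guide"}]}',
    true
);
$scores = tier1_scores_sketch($post);
```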

### Tier 2: Cluster-Based Matching

**Priority**: High

Matches posts based on content clusters (primary and secondary).

**Scoring**:

- Primary cluster match: 10 points
- Secondary cluster match: 5 points per match
- Cross-cluster match (one post's primary cluster appears among the other post's secondary clusters): 5 points

**Data Source**: The post JSON `clusters` field. If it is missing or empty, the algorithm **falls back to `docs/data/blog-cluster-mapping.json`** (looked up by post URL). The mapping covers 360+ posts and supplies cluster data for pillar links, tool links, and related-posts scoring whenever an individual post JSON lacks it.

**Example**:

- Current post: `primary: "personalverwaltung"`, `secondary: ["compliance"]`
- Candidate post: `primary: "personalverwaltung"` → +10 points
- Candidate post: `primary: "compliance"`, `secondary: ["personalverwaltung"]` → +5 points (cross-match)
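
A minimal sketch of the Tier 2 rules, assuming cluster data has been normalized to `['primary' => string, 'secondary' => string[]]`. The name `cluster_score_sketch()` is hypothetical, and awarding the cross-cluster bonus at most once is an assumption chosen so the result matches the second example above (+5, even though the match works in both directions there).

```php
<?php
// Hypothetical sketch of Tier 2 cluster scoring.
// Assumes clusters look like ['primary' => string, 'secondary' => string[]].
function cluster_score_sketch(array $current, array $candidate): float
{
    $score = 0.0;
    $curPrimary = $current['primary'] ?? null;
    $candPrimary = $candidate['primary'] ?? null;
    $curSecondary = $current['secondary'] ?? [];
    $candSecondary = $candidate['secondary'] ?? [];

    // Primary cluster match: 10 points
    if ($curPrimary !== null && $curPrimary === $candPrimary) {
        $score += 10;
    }

    // Secondary cluster match: 5 points per shared secondary cluster
    $score += 5 * count(array_intersect(array_unique($curSecondary), $candSecondary));

    // Cross-cluster match (assumed to count once): a primary appears in the other's secondaries
    $cross = ($curPrimary !== null && in_array($curPrimary, $candSecondary, true))
        || ($candPrimary !== null && in_array($candPrimary, $curSecondary, true));
    if ($cross) {
        $score += 5;
    }

    return $score;
}
```

Applied to the examples above, the first candidate scores 10 points and the second scores 5.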

### Tier 3: Topic-Based Matching

**Priority**: Medium-High

Matches posts based on shared topics extracted from content.

**Scoring**: 3 points per shared topic

**Data Source**: `docs/data/blog-topics-extracted.json` or post JSON `topics` field

**Example**:

- Current post topics: `["personalverwaltung", "zeiterfassung"]`
- Candidate post topics: `["personalverwaltung", "compliance"]`
- Shared topics: `["personalverwaltung"]` → +3 points

### Tier 4: Category-Based Matching

**Priority**: Medium

Matches posts from the same category (lexikon, ratgeber, inside-ordio).

**Scoring**: 2 points for same category

**Example**:

- Current post: `category: "lexikon"`
- Candidate post: `category: "lexikon"` → +2 points
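
Tiers 3 and 4 reduce to simple set arithmetic. The sketch below (hypothetical `topic_category_score_sketch()`) assumes topics are plain lowercase strings and counts each shared topic once, even if a post lists it twice:

```php
<?php
// Hypothetical sketch of Tier 3 (3 points per shared topic)
// and Tier 4 (2 points for the same category).
function topic_category_score_sketch(
    array $current_topics,
    array $candidate_topics,
    string $current_category,
    string $candidate_category
): float {
    // array_unique avoids double-counting a topic listed twice
    $shared = array_intersect(array_unique($current_topics), array_unique($candidate_topics));
    $score = 3.0 * count($shared);

    if ($current_category === $candidate_category) {
        $score += 2.0;
    }
    return $score;
}
```

With the examples above, two `lexikon` posts sharing `personalverwaltung` would collect 3 + 2 = 5 points from these two tiers.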

### Tier 5: Recency Boost

**Priority**: Low (multiplier)

Applies a multiplier to the combined score to boost recently published posts.

**Scoring**:

- Posts within 6 months: 1.2x multiplier
- Posts within 1 year: 1.1x multiplier
- Older posts: No multiplier

**Example**:

- Post published 3 months ago with score 10 → 10 × 1.2 = 12 points
- Post published 8 months ago with score 10 → 10 × 1.1 = 11 points
- Post published 2 years ago with score 10 → 10 points (no multiplier)
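
A sketch of the recency multiplier, assuming `publication_date` is a string that `DateTimeImmutable` can parse and that the 6-month and 1-year windows are measured back from a reference date passed in as `$now` (passing it in keeps the example deterministic). Both function names are hypothetical.

```php
<?php
// Hypothetical sketch of the Tier 5 recency multiplier.
function recency_multiplier_sketch(string $publication_date, DateTimeImmutable $now): float
{
    $published = new DateTimeImmutable($publication_date);

    if ($published >= $now->modify('-6 months')) {
        return 1.2; // published within the last 6 months
    }
    if ($published >= $now->modify('-1 year')) {
        return 1.1; // published within the last year
    }
    return 1.0; // older posts: no boost
}

// Apply the multiplier to an already combined base score
function apply_recency_sketch(float $base_score, string $publication_date, DateTimeImmutable $now): float
{
    return $base_score * recency_multiplier_sketch($publication_date, $now);
}
```

For a reference date of 2026-02-12, a post from 2025-11-12 gets 1.2x, one from 2025-06-12 gets 1.1x, and one from 2024-02-12 keeps its base score, matching the three examples above.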

## Scoring Function

The `calculate_related_score()` function combines all scoring factors:

```php
function calculate_related_score($current_post, $candidate_post, $current_cluster, $candidate_cluster, $current_topics, $candidate_topics) {
    $score = 0.0;

    // Cluster matching (10 points primary, 5 points per secondary, 5 points cross-cluster)
    // Topic matching (3 points per shared topic)
    // Category matching (2 points)
    // Recency multiplier (1.2x within 6 months, 1.1x within 1 year)

    return $score;
}
```

## Final Ranking

After scoring all candidate posts:

1. **Sort by Score**: Descending order (highest score first)
2. **Tie-Breaker**: If scores are equal, sort by `publication_date` (newer first)
3. **Limit**: Return top N posts (default: 12)
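
The three ranking steps map onto a single `usort()` call plus `array_slice()`. This sketch assumes each candidate carries a numeric `score` and an ISO 8601 `publication_date` (`YYYY-MM-DD`), so that comparing the date strings also compares the dates; `rank_related_sketch()` is a hypothetical name.

```php
<?php
// Hypothetical sketch of the final ranking step.
function rank_related_sketch(array $candidates, int $limit = 12): array
{
    usort($candidates, function (array $a, array $b): int {
        // 1. Higher score first
        $byScore = $b['score'] <=> $a['score'];
        if ($byScore !== 0) {
            return $byScore;
        }
        // 2. Tie-breaker: newer publication_date first (ISO dates sort as strings)
        return strcmp($b['publication_date'], $a['publication_date']);
    });

    // 3. Keep only the top $limit entries
    return array_slice($candidates, 0, $limit);
}
```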

## Relationship Types Priority

When using existing `related_posts` data, relationship types are prioritized:

1. **`definition_to_guide`**: Lexikon → Ratgeber (highest priority)
2. **`related`**: Same cluster/topics (high priority)
3. **`complementary`**: Different aspects of same topic (medium priority)
4. **`problem_to_solution`**: Problem → Solution pattern (medium priority)
5. **Category match**: Same category (low priority)
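
One way to apply this priority to Tier 1 entries is a rank lookup inside `usort()`. In this hypothetical sketch, `complementary` is placed ahead of `problem_to_solution` only because it is listed first (both are medium priority), unknown types sort last, and ties fall back to `similarity_score`; all three choices are assumptions.

```php
<?php
// Hypothetical sketch: rank relationship types (lower number = higher priority).
function relationship_rank_sketch(?string $type): int
{
    $priority = [
        'definition_to_guide' => 1,
        'related' => 2,
        'complementary' => 3,
        'problem_to_solution' => 4,
    ];
    return $priority[$type ?? ''] ?? 5; // unknown types sort last
}

// Order Tier 1 related_posts entries by relationship type, then similarity
function sort_by_relationship_sketch(array $related_posts): array
{
    usort($related_posts, function (array $a, array $b): int {
        $byType = relationship_rank_sketch($a['relationship_type'] ?? null)
            <=> relationship_rank_sketch($b['relationship_type'] ?? null);
        if ($byType !== 0) {
            return $byType;
        }
        // Same priority: higher similarity_score first
        return ($b['similarity_score'] ?? 0) <=> ($a['similarity_score'] ?? 0);
    });
    return $related_posts;
}
```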

## Data Sources

### Primary Data Sources

1. **Post JSON Files** (`v2/data/blog/posts/{category}/{slug}.json`)

   - `related_posts` array
   - `clusters` object (primary, secondary)
   - `topics` array
   - `category` field
   - `publication_date` field

2. **Cluster Mapping** (`docs/data/blog-cluster-mapping.json`)

   - Fallback for cluster data if not in post JSON
   - Contains `primary_cluster` and `secondary_clusters` per post

3. **Topic Extraction** (`docs/data/blog-topics-extracted.json`)
   - Fallback for topic data if not in post JSON
   - Contains extracted topics per post
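
The cluster fallback from post JSON to `blog-cluster-mapping.json` can be sketched as below. It assumes the mapping file has been decoded with `json_decode(..., true)` into an array keyed by post URL; the exact URL format and the name `resolve_clusters_sketch()` are assumptions. Mapping entries are normalized to the same `primary`/`secondary` shape as the post JSON, so later scoring code only has to handle one format.

```php
<?php
// Hypothetical sketch: resolve cluster data, post JSON first, mapping file second.
function resolve_clusters_sketch(array $post, string $post_url, array $cluster_mapping): ?array
{
    // 1. Post JSON `clusters` object, if present and non-empty
    $clusters = $post['clusters'] ?? null;
    if (is_array($clusters) && !empty($clusters['primary'])) {
        return [
            'primary' => $clusters['primary'],
            'secondary' => $clusters['secondary'] ?? [],
        ];
    }

    // 2. Fallback: mapping entry looked up by post URL (assumed key format)
    $entry = $cluster_mapping[$post_url] ?? null;
    if (is_array($entry) && !empty($entry['primary_cluster'])) {
        // Normalize the mapping's field names to the post JSON shape
        return [
            'primary' => $entry['primary_cluster'],
            'secondary' => $entry['secondary_clusters'] ?? [],
        ];
    }

    return null; // no cluster data: topic and category tiers still apply
}
```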

### Caching Strategy

- **Static Caching**: Cluster and topic data files are cached per request (loaded once)
- **Result Caching**: Related posts calculation is cached per post (avoid recalculation)
- **Cache Key**: `"{category}/{slug}/{limit}"`
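
Per-request result caching of this kind is commonly done with a function-level `static` array, which lives for the duration of a PHP request. The sketch below is hypothetical: `cached_related_sketch()` and the `$compute` callback stand in for the real calculation, and only the cache key format comes from the list above.

```php
<?php
// Hypothetical sketch of per-request result caching keyed by "{category}/{slug}/{limit}".
function cached_related_sketch(string $category, string $slug, int $limit, callable $compute): array
{
    static $cache = []; // survives across calls within the same request

    $key = "{$category}/{$slug}/{$limit}";
    if (!array_key_exists($key, $cache)) {
        // First request for this key: run the expensive calculation once
        $cache[$key] = $compute($slug, $category, $limit);
    }
    return $cache[$key];
}
```

Because `$limit` is part of the key, asking for 6 and 12 results for the same post triggers two separate calculations.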

## Multi-Resource Recommendations

The blog carousel now includes templates, downloads, and tools alongside blog posts, so readers can discover related resources beyond articles.

### Resource Types

The system supports four resource types:

1. **Blog Posts** (`RESOURCE_TYPE_BLOG`) - Existing blog posts
2. **Templates** (`RESOURCE_TYPE_TEMPLATE`) - Excel templates for HR/payroll
3. **Downloads** (`RESOURCE_TYPE_DOWNLOAD`) - PDF guides, checklists, webinars
4. **Tools** (`RESOURCE_TYPE_TOOL`) - Calculators and interactive tools

### Unified Resource Loading

The `load_related_resources_unified()` function combines all resource types:

```php
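// Limit of 12; options use their documented defaults
$resources = load_related_resources_unified($post_slug, $category, 12, [
    'blog_ratio' => 0.5,   // half blog posts, half other resources
    'min_relevance' => 5,  // skip resources scoring below 5
]);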
```

**Parameters:**

- `$post_slug` - Current blog post slug
- `$category` - Current blog post category
- `$limit` - Maximum number of resources to return (default: 12)
- `$options` - Optional array:
  - `blog_ratio` (float, 0-1): Ratio of blog posts in result (default: 0.5)
  - `min_relevance` (int): Minimum relevance score to include (default: 5)

**Returns:** Array of unified resources sorted by relevance score

### Relevance Scoring for Other Resources

Resources are scored using `calculate_resource_relevance()` with weighted factors:

- **Cluster match** (10 points): Primary cluster matches resource category/tags
- **Topic/keyword match** (5 points per match): Topics match resource keywords
- **Tag match** (3 points per match): Resource tags match cluster keywords
- **Description keyword match** (1 point per match): Keywords found in descriptions
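
A sketch of how the four weights could combine. The resource shape (`category`, `tags`, `keywords`, `description`), the `$cluster_categories` list (resource categories the post's cluster maps to) and `$cluster_keywords` are assumptions for illustration, as is the name `resource_relevance_sketch()`; `calculate_resource_relevance()` itself may define a match differently.

```php
<?php
// Hypothetical sketch of weighted resource relevance scoring.
// All parameter names and the resource array shape are assumptions.
function resource_relevance_sketch(
    string $primary_cluster,
    array $cluster_categories,
    array $cluster_keywords,
    array $post_topics,
    array $resource
): int {
    $lower = fn(array $words): array => array_map('strtolower', $words);
    $tags = $lower($resource['tags'] ?? []);
    $keywords = $lower($resource['keywords'] ?? []);
    $clusterKeywords = $lower($cluster_keywords);
    $topics = $lower($post_topics);
    $description = strtolower($resource['description'] ?? '');
    $score = 0;

    // Cluster match (10): category mapped from the cluster, or the cluster name is a tag
    if (in_array($resource['category'] ?? '', $cluster_categories, true)
        || in_array(strtolower($primary_cluster), $tags, true)) {
        $score += 10;
    }
    // Topic/keyword match (5 per post topic found among the resource keywords)
    $score += 5 * count(array_intersect(array_unique($topics), $keywords));
    // Tag match (3 per resource tag that is also a cluster keyword)
    $score += 3 * count(array_intersect(array_unique($tags), $clusterKeywords));
    // Description keyword match (1 per topic or cluster keyword found in the description)
    foreach (array_unique(array_merge($topics, $clusterKeywords)) as $word) {
        if ($word !== '' && str_contains($description, $word)) {
            $score += 1;
        }
    }
    return $score;
}
```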

### Cluster-to-Category Mapping

Blog clusters map to resource categories:

- `dienstplan` → Templates: `shift_planning`, Tools: `Schichtplanung`
- `zeiterfassung` → Templates: `time_tracking`, Tools: `Zeiterfassung`
- `personalverwaltung` → Templates: `employee_management`, Tools: `Personalverwaltung`
- `compliance` → Templates: `compliance`, Tools: `Compliance & Recht`
- `lohnabrechnung` → Templates: `payroll`, Tools: `Lohnabrechnung`

See `get_cluster_to_category_mapping()` for complete mapping.

### Resource Loading Functions

- `load_templates_for_blog_post($post_cluster, $post_topics, $limit)` - Load relevant templates
- `load_downloads_for_blog_post($post_cluster, $post_topics, $limit)` - Load relevant downloads
- `load_tools_for_blog_post($post_cluster, $post_topics, $limit)` - Load relevant tools

Each function:

- Takes blog post cluster and topics
- Matches against resource metadata
- Returns scored resources with relevance scores
- Caches results for performance

### Sorting and Prioritization

Resources are sorted by:

1. Relevance score (descending)
2. Resource type (blog posts prioritized when scores are equal)

The unified loader ensures a balanced mix of resource types while prioritizing the most relevant content.
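
The balancing and sorting described above could look like the following. This is a hypothetical sketch: it assumes `blog_ratio` and `min_relevance` behave as documented for `load_related_resources_unified()`, that blog slots are `round($limit * $blog_ratio)`, that leftover slots are backfilled from whichever pool still has qualifying items, and that each resource is an array with `type` and `score` keys (`'blog'` stands in for `RESOURCE_TYPE_BLOG`).

```php
<?php
// Hypothetical sketch: balance blog posts and other resources, then sort.
function mix_resources_sketch(
    array $blog,
    array $other,
    int $limit = 12,
    float $blog_ratio = 0.5,
    int $min_relevance = 5
): array {
    $byScore = fn(array $a, array $b): int => $b['score'] <=> $a['score'];
    $relevant = fn(array $r): bool => $r['score'] >= $min_relevance;

    // Drop low-relevance items, best first within each pool
    $blog = array_values(array_filter($blog, $relevant));
    $other = array_values(array_filter($other, $relevant));
    usort($blog, $byScore);
    usort($other, $byScore);

    // Reserve slots according to blog_ratio
    $blogSlots = (int) round($limit * $blog_ratio);
    $otherSlots = $limit - $blogSlots;
    $picked = array_merge(array_slice($blog, 0, $blogSlots), array_slice($other, 0, $otherSlots));

    // Backfill from the best leftovers if one pool was too small
    $leftovers = array_merge(array_slice($blog, $blogSlots), array_slice($other, $otherSlots));
    usort($leftovers, $byScore);
    $picked = array_merge($picked, array_slice($leftovers, 0, $limit - count($picked)));

    // Final order: relevance descending, blog posts first on equal scores
    usort($picked, function (array $a, array $b): int {
        $byRelevance = $b['score'] <=> $a['score'];
        if ($byRelevance !== 0) {
            return $byRelevance;
        }
        return ($a['type'] === 'blog' ? 0 : 1) <=> ($b['type'] === 'blog' ? 0 : 1);
    });
    return $picked;
}
```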

For complete documentation on resource matching, see [Resource Matching Guide](RESOURCE_MATCHING_GUIDE.md).

## Future-Proofing

### Auto-Calculation for New Posts

The algorithm handles new posts without extra setup:

1. **No Manual Configuration Needed**: If `related_posts` is empty or missing, the algorithm calculates matches
2. **Graceful Fallback**: Falls back through tiers to ensure posts always have related content
3. **Cluster/Topic Data**: Uses extracted cluster and topic data from analysis files

### Maintenance

**For New Posts**:

- No action needed - algorithm auto-calculates relationships
- Optionally add `related_posts` array for manual curation

**For Existing Posts**:

- Existing `related_posts` data is prioritized (Tier 1)
- Algorithm falls back to calculation if insufficient posts found

**Periodic Updates** (optional):

- Run `scripts/blog/update-related-posts.php` (future script) to refresh relationships
- Updates JSON files with new `related_posts` data based on latest cluster/topic analysis

## Performance Considerations

### Optimization

1. **Caching**: Cluster and topic data loaded once per request
2. **Early Exit**: If Tier 1 provides enough posts, no calculation needed
3. **Efficient Lookups**: Uses lookup maps for cluster/topic data

### Performance Impact

- **With Existing Related Posts**: ~1ms (just loading from JSON)
- **With Calculation**: ~50-100ms (depending on number of posts)
- **Cached Results**: ~0.1ms (from static cache)

## Unified Related Resources (Carousel)

The blog post carousel uses `load_related_resources_unified()`, which **mixes blog posts with tools, templates, downloads, products, and industry pages**, not just blog posts.

### Mix by Cluster/Topic Status

| Post State | Blog Limit | Other Resources | Result |
|------------|------------|-----------------|--------|
| Has cluster (e.g. zeiterfassung) | 6 | Templates, downloads, tools, products, industries | ~6 blog + 6 mixed |
| No cluster, slug matches tool topics | 8 | Tools (4) | ~8 blog + 4 tools |
| No cluster, no slug match | 12 | None | 12 blog |
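
The table can be read as a small decision function. The sketch below only restates the table's counts (the real split is approximate, as the `~` marks indicate); the function name and return keys are hypothetical.

```php
<?php
// Hypothetical sketch: carousel mix per post state, restating the table above.
function carousel_plan_sketch(bool $has_cluster, bool $slug_matches_tools): array
{
    if ($has_cluster) {
        // Cluster known: half blog posts, half mixed resources
        return ['blog_limit' => 6, 'other_limit' => 6,
                'other_types' => ['template', 'download', 'tool', 'product', 'industry']];
    }
    if ($slug_matches_tools) {
        // No cluster, but the slug points at tool topics
        return ['blog_limit' => 8, 'other_limit' => 4, 'other_types' => ['tool']];
    }
    // No signal at all: blog posts only
    return ['blog_limit' => 12, 'other_limit' => 0, 'other_types' => []];
}
```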

### Slug-to-Tool Matching (New Posts)

For posts without cluster data, topics are derived from the slug via `derive_topics_from_slug_for_tools()` so relevant tools (e.g. Brutto-Netto-Rechner) appear for topics like spesen, reisekosten, verpflegungsmehraufwand.
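
A rough sketch of slug-based topic derivation, assuming a simple substring match against a list of tool-related topics. The default list only contains the three examples from the sentence above; `derive_topics_from_slug_for_tools()` may use a longer list and a different matching rule, and `derive_topics_from_slug_sketch()` is a hypothetical name.

```php
<?php
// Hypothetical sketch: derive tool-related topics from a post slug.
function derive_topics_from_slug_sketch(
    string $slug,
    array $tool_topics = ['spesen', 'reisekosten', 'verpflegungsmehraufwand']
): array {
    $topics = [];
    foreach ($tool_topics as $topic) {
        // Substring match so compound slugs like "reisekosten-abrechnen" still hit
        if (str_contains($slug, $topic)) {
            $topics[] = $topic;
        }
    }
    return $topics;
}
```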

### suggest-related-posts.php

- Default limit: 14
- Adds pillar pages when the slug matches pillar keywords (dienstplan, zeiterfassung)
- The zeiterfassung pillar's keywords also include lohn, gehalt, and pausenzeiten

## Usage

### Function Signature

```php
function load_related_posts_enhanced($post_slug, $category, $limit = 12): array
```

### Parameters

- `$post_slug` (string): Slug of the current post
- `$category` (string): Category of the current post (lexikon, ratgeber, inside-ordio)
- `$limit` (int): Maximum number of related posts to return (default: 12)

### Return Value

Returns an array of post data arrays, sorted by relevance score (descending).

### Example

```php
// Load related posts for a blog post
$related_posts = load_related_posts_enhanced('employer-branding', 'lexikon', 12);

// Use in carousel component
foreach ($related_posts as $post) {
    // Display post card
}
```

## Testing

See `scripts/blog/test-related-posts.php` for comprehensive test cases covering:

- Posts with existing `related_posts`
- Posts without `related_posts` (fallback system)
- Edge cases (no cluster, no topics, etc.)
- Invalid inputs

## Related Files

- **Function**: `v2/config/blog-template-helpers.php` - `load_related_posts_enhanced()`, `load_related_resources_unified()`
- **Component**: `v2/base/blog_related_carousel.php` - Carousel component
- **Resource Card**: `v2/components/blog/ResourceCard.php` - Unified resource card component
- **Test Scripts**:
  - `scripts/blog/test-related-posts.php` - Blog posts test suite
  - `scripts/blog/test-unified-resources.php` - Unified resources test suite
- **Data Files**:
  - `docs/data/blog-cluster-mapping.json` - Cluster assignments
  - `docs/data/blog-topics-extracted.json` - Topic extraction
  - `v2/data/blog/posts/{category}/{slug}.json` - Post data
  - `v2/data/templates_index_data.php` - Template data
  - `v2/data/downloads_index_data.php` - Download data
  - `v2/data/tools_index_data.php` - Tool data

## Related Documentation

- [Component API](./COMPONENT_API.md) - Blog Related Posts Carousel and ResourceCard component documentation
- [Resource Matching Guide](./RESOURCE_MATCHING_GUIDE.md) - Complete guide to resource matching logic
- [Content Relationships](./CONTENT_RELATIONSHIPS.md) - Content relationship analysis
- [Cluster Analysis](./CLUSTER_ANALYSIS.md) - Cluster structure and distribution
- [Topic Analysis](./TOPIC_ANALYSIS.md) - Topic extraction and taxonomy
