# WordPress to PHP Template Data Structure Mapping

**Last Updated:** 2026-01-09

Complete mapping of WordPress blog data structure to PHP template data requirements, including JSON schema definitions, transformation patterns, and data loading specifications.

## Overview

This document maps the WordPress blog data structure (from extracted JSON files) to PHP template data requirements, defining the JSON schema for blog posts, categories, relationships, and supporting data structures.

## WordPress Data Sources

### Extracted Data Files

1. **`blog-posts-metadata.json`** - Post metadata (title, meta tags, dates, featured image)
2. **`blog-posts-content-full.json`** - Full HTML content and images
3. **`blog-cluster-mapping.json`** - Content cluster assignments
4. **`blog-topics-extracted.json`** - Topic assignments per post
5. **`blog-content-relationships.json`** - Post relationship data
6. **`blog-internal-links.json`** - Internal link data

### WordPress Data Fields

**From `blog-posts-metadata.json`**:

- `url` - Full WordPress URL
- `title` - Post title (with category and brand)
- `meta_title` - SEO title tag
- `meta_description` - SEO meta description
- `publication_date` - ISO 8601 date
- `featured_image` - Featured image URL
- `word_count` - Word count
- `h1` - Main heading

**From `blog-posts-content-full.json`**:

- `content.html` - Full HTML content
- `content.text` - Plain text content
- `content.word_count` - Word count
- `images[]` - Array of image objects (featured + content)

**From `blog-cluster-mapping.json`**:

- `primary_cluster` - Primary content cluster
- `secondary_clusters[]` - Secondary clusters
- `cluster_scores{}` - Cluster relevance scores

**From `blog-topics-extracted.json`**:

- `topics[]` - Array of topic identifiers

**From `blog-content-relationships.json`**:

- `similarity_score` - Semantic similarity (0-1)
- `relationship_type` - Type of relationship
- `shared_topics[]` - Shared topics
- `has_existing_link` - Boolean

## PHP Template Data Requirements

### Blog Post JSON Schema

**File Location**: `v2/data/blog/posts/{category}/{slug}.json`

**Schema Definition**:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": [
    "slug",
    "title",
    "category",
    "url",
    "publication_date",
    "content"
  ],
  "properties": {
    "slug": {
      "type": "string",
      "description": "URL slug (e.g., 'leitfaden-zur-finanzbuchhaltung')"
    },
    "title": {
      "type": "string",
      "description": "Post title (H1, without category/brand)"
    },
    "category": {
      "type": "string",
      "enum": ["lexikon", "ratgeber", "inside-ordio"],
      "description": "Category slug"
    },
    "category_label": {
      "type": "string",
      "description": "Category display name (e.g., 'Lexikon')"
    },
    "url": {
      "type": "string",
      "format": "uri",
      "description": "Canonical URL (e.g., '/insights/lexikon/leitfaden-zur-finanzbuchhaltung/')"
    },
    "publication_date": {
      "type": "string",
      "format": "date-time",
      "description": "ISO 8601 publication date"
    },
    "modified_date": {
      "type": "string",
      "format": "date-time",
      "description": "ISO 8601 last modified date. Used to display 'last updated' date when 7+ days after publication date. Should be updated whenever post content is modified."
    },
    "author": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "Author name (e.g., 'Emma')"
        },
        "url": {
          "type": "string",
          "format": "uri",
          "description": "Author page URL (optional)"
        }
      }
    },
    "featured_image": {
      "type": "object",
      "properties": {
        "src": {
          "type": "string",
          "format": "uri",
          "description": "Image URL (e.g., '/assets/blog-images/image.webp')"
        },
        "alt": {
          "type": "string",
          "description": "Alt text"
        },
        "width": {
          "type": "integer",
          "description": "Image width in pixels"
        },
        "height": {
          "type": "integer",
          "description": "Image height in pixels"
        },
        "srcset": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "src": {
                "type": "string",
                "format": "uri"
              },
              "width": {
                "type": "integer"
              }
            }
          }
        }
      }
    },
    "excerpt": {
      "type": "string",
      "maxLength": 300,
      "description": "Post excerpt/preview (plain text, 150-300 chars)"
    },
    "content": {
      "type": "object",
      "properties": {
        "html": {
          "type": "string",
          "description": "Full HTML content (sanitized)"
        },
        "text": {
          "type": "string",
          "description": "Plain text content (for search/excerpt)"
        },
        "word_count": {
          "type": "integer",
          "description": "Word count"
        }
      }
    },
    "images": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "src": {
            "type": "string",
            "format": "uri"
          },
          "alt": {
            "type": "string"
          },
          "type": {
            "type": "string",
            "enum": ["featured", "content"],
            "description": "Image type"
          },
          "width": {
            "type": "integer"
          },
          "height": {
            "type": "integer"
          }
        }
      }
    },
    "meta": {
      "type": "object",
      "properties": {
        "title": {
          "type": "string",
          "maxLength": 60,
          "description": "SEO title tag"
        },
        "description": {
          "type": "string",
          "maxLength": 160,
          "description": "SEO meta description"
        },
        "keywords": {
          "type": "array",
          "items": {
            "type": "string"
          },
          "description": "SEO keywords"
        },
        "robots": {
          "type": "string",
          "description": "Robots meta tag value"
        }
      }
    },
    "topics": {
      "type": "array",
      "items": {
        "type": "string",
        "description": "Topic identifier (e.g., 'zeiterfassung', 'dienstplan')"
      }
    },
    "clusters": {
      "type": "object",
      "properties": {
        "primary": {
          "type": "string",
          "description": "Primary cluster identifier"
        },
        "secondary": {
          "type": "array",
          "items": {
            "type": "string"
          },
          "description": "Secondary cluster identifiers"
        }
      }
    },
    "related_posts": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "slug": {
            "type": "string"
          },
          "title": {
            "type": "string"
          },
          "url": {
            "type": "string",
            "format": "uri"
          },
          "category": {
            "type": "string"
          },
          "featured_image": {
            "type": "object",
            "properties": {
              "src": {
                "type": "string",
                "format": "uri"
              },
              "alt": {
                "type": "string"
              }
            }
          },
          "excerpt": {
            "type": "string"
          },
          "publication_date": {
            "type": "string",
            "format": "date-time"
          },
          "similarity_score": {
            "type": "number",
            "minimum": 0,
            "maximum": 1,
            "description": "Semantic similarity score"
          },
          "relationship_type": {
            "type": "string",
            "enum": [
              "related",
              "definition_to_guide",
              "problem_to_solution",
              "complementary"
            ]
          }
        }
      },
      "maxItems": 14,
      "description": "Related posts (max 14)"
    },
    "reading_time": {
      "type": "integer",
      "description": "Estimated reading time in minutes"
    },
    "internal_links": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "url": {
            "type": "string",
            "format": "uri"
          },
          "anchor_text": {
            "type": "string"
          },
          "target_type": {
            "type": "string",
            "enum": [
              "blog",
              "tool",
              "product",
              "pillar",
              "template",
              "comparison",
              "external"
            ]
          }
        }
      }
    }
  }
}
```
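The `modified_date` description implies a display rule: show "last updated" only when the modification is 7 or more days after publication. A minimal sketch of that check, assuming both values are ISO 8601 strings as the schema requires; `should_show_modified_date` is a hypothetical helper, not an existing function in `v2/config/blog-template-helpers.php`:

```php
/**
 * Hypothetical helper: true when modified_date is at least $min_days
 * after publication_date.
 */
function should_show_modified_date(array $post, int $min_days = 7): bool {
    if (empty($post['modified_date']) || empty($post['publication_date'])) {
        return false;
    }
    try {
        $published = new DateTimeImmutable($post['publication_date']);
        $modified = new DateTimeImmutable($post['modified_date']);
    } catch (Exception $e) {
        // Unparseable date strings: do not show a "last updated" label
        return false;
    }
    // Compare Unix timestamps so different UTC offsets are handled correctly
    $seconds = $modified->getTimestamp() - $published->getTimestamp();
    return $seconds >= $min_days * 86400;
}
```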

### Category JSON Schema

**File Location**: `v2/data/blog/categories.json`

**Schema Definition**:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "categories": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["slug", "name", "url"],
        "properties": {
          "slug": {
            "type": "string",
            "enum": ["lexikon", "ratgeber", "inside-ordio"]
          },
          "name": {
            "type": "string",
            "description": "Display name (e.g., 'Lexikon')"
          },
          "description": {
            "type": "string",
            "description": "Category description"
          },
          "url": {
            "type": "string",
            "format": "uri",
            "description": "Category archive URL"
          },
          "post_count": {
            "type": "integer",
            "description": "Number of posts in category"
          },
          "meta": {
            "type": "object",
            "properties": {
              "title": {
                "type": "string"
              },
              "description": {
                "type": "string"
              },
              "robots": {
                "type": "string",
                "default": "noindex, follow"
              }
            }
          }
        }
      }
    }
  }
}
```

### Topic JSON Schema

**File Location**: `v2/data/blog/topics.json`

**Schema Definition**:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "topics": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["id", "name"],
        "properties": {
          "id": {
            "type": "string",
            "description": "Topic identifier (e.g., 'zeiterfassung')"
          },
          "name": {
            "type": "string",
            "description": "Display name (e.g., 'Zeiterfassung')"
          },
          "description": {
            "type": "string",
            "description": "Topic description"
          },
          "post_count": {
            "type": "integer",
            "description": "Number of posts with this topic"
          },
          "hub_url": {
            "type": "string",
            "format": "uri",
            "description": "Topic hub page URL (if exists)"
          }
        }
      }
    }
  }
}
```
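Both `categories.json` and `topics.json` carry a `post_count`, which can be derived from the post files instead of being maintained by hand. A sketch, assuming posts are already decoded into arrays shaped like the Blog Post schema; `count_posts_by_category_and_topic` is a hypothetical build-time helper:

```php
/**
 * Hypothetical helper: derive post_count values for categories.json
 * and topics.json from decoded post arrays.
 */
function count_posts_by_category_and_topic(array $posts): array {
    $category_counts = [];
    $topic_counts = [];

    foreach ($posts as $post) {
        $category = $post['category'] ?? null;
        if ($category !== null) {
            $category_counts[$category] = ($category_counts[$category] ?? 0) + 1;
        }
        // array_unique avoids counting a topic twice if a post lists it twice
        foreach (array_unique($post['topics'] ?? []) as $topic) {
            $topic_counts[$topic] = ($topic_counts[$topic] ?? 0) + 1;
        }
    }

    return ['categories' => $category_counts, 'topics' => $topic_counts];
}
```

Running it over the result of `load_all_blog_posts()` gives the values to write into each `post_count` field.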

## Data Transformation Mapping

### WordPress → PHP Template Mapping

| WordPress Field        | PHP Template Field   | Transformation                         | Notes                                 |
| ---------------------- | -------------------- | -------------------------------------- | ------------------------------------- |
| `url`                  | `url`                | Extract path, remove domain            | `/insights/{category}/{slug}/`        |
| `title`                | `title`              | Remove `\| {Category} \| Ordio` suffix | Keep H1 text only                     |
| `meta_title`           | `meta.title`         | Use as-is                              | Already formatted                     |
| `meta_description`     | `meta.description`   | Use as-is                              | Already formatted                     |
| `publication_date`     | `publication_date`   | Use as-is                              | ISO 8601 format                       |
| `featured_image`       | `featured_image.src` | Convert to static path                 | `/assets/blog-images/{filename}.webp` |
| `word_count`           | `content.word_count` | Use as-is                              | Integer                               |
| `h1`                   | `title`              | Use as-is                              | Should equal the stripped `title`     |
| `content.html`         | `content.html`       | Sanitize HTML, update image paths      | Remove unsafe tags, update src        |
| `content.text`         | `content.text`       | Use as-is                              | Plain text                            |
| `primary_cluster`      | `clusters.primary`   | Use as-is                              | Cluster identifier                    |
| `secondary_clusters[]` | `clusters.secondary` | Use as-is                              | Array of identifiers                  |
| `topics[]`             | `topics`             | Use as-is                              | Array of topic IDs                    |
| Relationship data      | `related_posts[]`    | Transform to post references           | Include similarity, type              |
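The table can be read as one assembly step per post. A minimal sketch, assuming the caller has already looked up the matching rows from the metadata, content, cluster, and topic files for one post; the function name, the input shapes, and the choice to prefer `h1` over the stripped `title` are assumptions. HTML sanitization, content image rewriting, excerpts, and related posts are left out:

```php
/**
 * Hypothetical helper: assemble one template post record from the
 * per-post WordPress rows, following the mapping table.
 */
function build_post_record(array $meta, array $content, array $clusters = [], array $topics = []): array {
    // url: keep only the path, e.g. /insights/lexikon/{slug}/
    $path = (string) parse_url($meta['url'], PHP_URL_PATH);
    $segments = explode('/', trim($path, '/'));

    // title: prefer h1 (assumption); otherwise strip " | {Category} | Ordio"
    $title = $meta['h1'] ?? preg_replace('/\s*\|\s*[^|]+\s*\|\s*Ordio\s*$/', '', $meta['title'] ?? '');

    $record = [
        'slug' => $segments[2] ?? '',
        'title' => trim($title),
        'category' => $segments[1] ?? '',
        'url' => $path,
        'publication_date' => $meta['publication_date'] ?? null,
        'content' => [
            'html' => $content['content']['html'] ?? '',
            'text' => $content['content']['text'] ?? '',
            'word_count' => $meta['word_count'] ?? ($content['content']['word_count'] ?? 0),
        ],
        'meta' => [
            'title' => $meta['meta_title'] ?? '',
            'description' => $meta['meta_description'] ?? '',
        ],
        'topics' => $topics,
        'clusters' => [
            'primary' => $clusters['primary_cluster'] ?? null,
            'secondary' => $clusters['secondary_clusters'] ?? [],
        ],
    ];

    if (!empty($meta['featured_image'])) {
        // Static path only; the actual WebP conversion happens elsewhere
        $name = pathinfo((string) parse_url($meta['featured_image'], PHP_URL_PATH), PATHINFO_FILENAME);
        $record['featured_image'] = ['src' => "/assets/blog-images/{$name}.webp"];
    }

    return $record;
}
```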

### URL Transformation

**WordPress URL Pattern**:

```
https://www.ordio.com/insights/{category}/{slug}/
```

**PHP Template URL Pattern**:

```
/insights/{category}/{slug}/
```

**Transformation**:

```php
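function transform_url($wordpress_url) {
    // Remove scheme and domain, keep only the path
    $path = parse_url($wordpress_url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null; // malformed URL or URL without a path
    }
    // Normalize to a single trailing slash: /insights/{category}/{slug}/
    return rtrim($path, '/') . '/';
}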
```

### Slug Extraction

**From WordPress URL**:

```
https://www.ordio.com/insights/lexikon/leitfaden-zur-finanzbuchhaltung/
```

**Extract**:

- Category: `lexikon` (from path segment)
- Slug: `leitfaden-zur-finanzbuchhaltung` (last path segment)

**Transformation**:

```php
function extract_slug_and_category($wordpress_url) {
    $path = parse_url($wordpress_url, PHP_URL_PATH);
    $segments = explode('/', trim($path, '/'));
    // segments: ['insights', 'lexikon', 'leitfaden-zur-finanzbuchhaltung']
    return [
        'category' => $segments[1] ?? '',
        'slug' => $segments[2] ?? ''
    ];
}
```

### Image Path Transformation

**WordPress Image URL**:

```
https://www.ordio.com/wp-content/uploads/2023/09/image-name.jpeg
```

**PHP Template Image Path**:

```
/assets/blog-images/image-name.webp
```

**Transformation**:

```php
function transform_image_path($wordpress_url, $target_format = 'webp') {
    // Extract the filename from the URL path (drops any ?query string)
    $filename = basename((string) parse_url($wordpress_url, PHP_URL_PATH));
    // Remove extension
    $name = pathinfo($filename, PATHINFO_FILENAME);
    // Return new path
    return "/assets/blog-images/{$name}.{$target_format}";
}
```
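The mapping table also calls for updating image paths inside `content.html`. A regex-based sketch that applies the same filename mapping to every upload URL in the HTML (covering both `src` and `srcset`), assuming uploads are referenced as absolute `ordio.com` URLs or root-relative `/wp-content/uploads/` paths, and that each referenced file, including WordPress size variants such as `-300x200`, gets a converted counterpart. `rewrite_content_image_paths` is hypothetical; a DOM parser would be more robust, and sanitization is a separate step:

```php
/**
 * Hypothetical helper: rewrite WordPress upload URLs inside post HTML
 * to /assets/blog-images/{name}.{format}.
 */
function rewrite_content_image_paths(string $html, string $target_format = 'webp'): string {
    // Matches e.g. https://www.ordio.com/wp-content/uploads/2023/09/image-name.jpeg
    // or /wp-content/uploads/2023/09/image-name.jpeg; a trailing ?query is not consumed
    $pattern = '#(?:https?://(?:www\.)?ordio\.com)?/wp-content/uploads/[^"\'\s,?]+\.(?:jpe?g|png|gif|webp)#i';

    $result = preg_replace_callback($pattern, function (array $m) use ($target_format) {
        $name = pathinfo($m[0], PATHINFO_FILENAME);
        return "/assets/blog-images/{$name}.{$target_format}";
    }, $html);

    // preg_replace_callback() returns null on a regex error; keep the input then
    return $result ?? $html;
}
```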

### Title Transformation

**WordPress Title**:

```
Leitfaden zur Finanzbuchhaltung | Lexikon | Ordio
```

**PHP Template Title**:

```
Leitfaden zur Finanzbuchhaltung
```

**Transformation**:

```php
function extract_title($wordpress_title) {
    // Remove " | {Category} | Ordio" suffix
    $title = preg_replace('/\s*\|\s*[^|]+\s*\|\s*Ordio\s*$/', '', $wordpress_title);
    return trim($title);
}
```

### Excerpt Generation

**From Content**:

- Use first 150-300 characters of `content.text`
- Ensure it ends at word boundary
- Add ellipsis if truncated

**Transformation**:

```php
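function generate_excerpt($text, $max_length = 200) {
    // Count and cut in characters, not bytes: German text contains multibyte
    // UTF-8 characters (ä, ö, ü, ß) that byte-based substr() could split
    if (mb_strlen($text, 'UTF-8') <= $max_length) {
        return $text;
    }
    $excerpt = mb_substr($text, 0, $max_length, 'UTF-8');
    // Cut back to the last word boundary
    $last_space = mb_strrpos($excerpt, ' ', 0, 'UTF-8');
    if ($last_space !== false) {
        $excerpt = mb_substr($excerpt, 0, $last_space, 'UTF-8');
    }
    // Result is at most $max_length + 3 characters (schema maxLength is 300)
    return $excerpt . '...';
}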
```
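The schema's `reading_time` field ("Estimated reading time in minutes") has no row in the mapping table. A sketch deriving it from `content.word_count`, assuming a reading speed of 200 words per minute; that rate and the name `estimate_reading_time` are assumptions, not project settings:

```php
/**
 * Hypothetical helper: estimated reading time in whole minutes.
 * The 200 words-per-minute default is an assumption.
 */
function estimate_reading_time(int $word_count, int $words_per_minute = 200): int {
    // Round up, and never report less than 1 minute
    return max(1, (int) ceil($word_count / $words_per_minute));
}
```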

### Related Posts Transformation

**From `blog-content-relationships.json`**:

```json
{
  "source_url": "...",
  "target_url": "...",
  "similarity_score": 0.686,
  "relationship_type": "related",
  "shared_topics": ["zeiterfassung"]
}
```

**To PHP Template Format**:

```json
{
  "slug": "target-slug",
  "title": "Target Post Title",
  "url": "/insights/category/target-slug/",
  "similarity_score": 0.686,
  "relationship_type": "related"
}
```

**Transformation**:

```php
function transform_relationship($relationship, $posts_index) {
    $target_url = $relationship['target_url'];
    $target_post = $posts_index[$target_url] ?? null;

    if (!$target_post) {
        return null;
    }

    return [
        'slug' => $target_post['slug'],
        'title' => $target_post['title'],
        'url' => $target_post['url'],
        'category' => $target_post['category'],
        'featured_image' => $target_post['featured_image'] ?? null,
        'excerpt' => $target_post['excerpt'] ?? null,
        'publication_date' => $target_post['publication_date'],
        'similarity_score' => $relationship['similarity_score'],
        'relationship_type' => $relationship['relationship_type']
    ];
}
```
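Before `transform_relationship()` runs, each post's candidates have to be picked from `blog-content-relationships.json`. A sketch, assuming that file decodes to a flat list of rows shaped like the example above; it groups rows by `source_url`, drops self-links, ranks by `similarity_score`, and caps each list at the schema's `maxItems` of 14. The function name and the `$min_score` cutoff are hypothetical:

```php
/**
 * Hypothetical helper: group relationship rows by source_url and keep
 * the highest-scoring targets per source.
 */
function select_related_by_source(array $relationships, int $max_items = 14, float $min_score = 0.0): array {
    $by_source = [];
    foreach ($relationships as $rel) {
        // Skip weak matches and links from a post to itself
        if (($rel['similarity_score'] ?? 0) < $min_score || $rel['source_url'] === $rel['target_url']) {
            continue;
        }
        $by_source[$rel['source_url']][] = $rel;
    }
    foreach ($by_source as $source => $rels) {
        // Highest similarity first
        usort($rels, function ($a, $b) {
            return $b['similarity_score'] <=> $a['similarity_score'];
        });
        $by_source[$source] = array_slice($rels, 0, $max_items);
    }
    return $by_source;
}
```

Each selected row can then be passed to `transform_relationship()` together with the posts index.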

## Data Loading Patterns

### Helper Function: Load Post Data

**Function**: `load_blog_post($category, $slug)`

**Location**: `v2/config/blog-template-helpers.php`

**Implementation Pattern**:

```php
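function load_blog_post($category, $slug) {
    // Reject unknown categories and unsafe slugs before building a file path,
    // so request input such as "../" cannot escape the data directory
    $allowed_categories = ['lexikon', 'ratgeber', 'inside-ordio'];
    if (!in_array($category, $allowed_categories, true) || !preg_match('/^[a-z0-9-]+$/', (string) $slug)) {
        return null;
    }

    $file_path = __DIR__ . "/../data/blog/posts/{$category}/{$slug}.json";

    if (!file_exists($file_path)) {
        return null;
    }

    $json_content = file_get_contents($file_path);
    if ($json_content === false) {
        error_log("Failed to read blog post JSON: {$file_path}");
        return null;
    }

    $post_data = json_decode($json_content, true);

    if (json_last_error() !== JSON_ERROR_NONE) {
        error_log("Failed to parse blog post JSON: {$file_path} (" . json_last_error_msg() . ")");
        return null;
    }

    return $post_data;
}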
```

### Helper Function: Load Posts by Category

**Function**: `load_blog_posts_by_category($category, $limit = null, $offset = 0)`

**Implementation Pattern**:

```php
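function load_blog_posts_by_category($category, $limit = null, $offset = 0) {
    // Only known categories map to directories (prevents path traversal)
    if (!in_array($category, ['lexikon', 'ratgeber', 'inside-ordio'], true)) {
        return [];
    }

    $category_dir = __DIR__ . "/../data/blog/posts/{$category}";

    if (!is_dir($category_dir)) {
        return [];
    }

    // glob() can return false on error; treat that as "no files"
    $files = glob("{$category_dir}/*.json") ?: [];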
    $posts = [];

    foreach ($files as $file) {
        $json_content = file_get_contents($file);
        $post_data = json_decode($json_content, true);

        if ($post_data) {
            $posts[] = $post_data;
        }
    }

    // Sort by publication_date (newest first)
    usort($posts, function($a, $b) {
        return strtotime($b['publication_date']) - strtotime($a['publication_date']);
    });

    // Apply pagination (the offset also applies when no limit is given)
    if ($limit !== null || $offset > 0) {
        $posts = array_slice($posts, $offset, $limit);
    }

    return $posts;
}
```

### Helper Function: Load All Posts

**Function**: `load_all_blog_posts($limit = null, $offset = 0)`

**Implementation Pattern**:

```php
function load_all_blog_posts($limit = null, $offset = 0) {
    $categories = ['lexikon', 'ratgeber', 'inside-ordio'];
    $all_posts = [];

    foreach ($categories as $category) {
        $category_posts = load_blog_posts_by_category($category);
        $all_posts = array_merge($all_posts, $category_posts);
    }

    // Sort by publication_date (newest first)
    usort($all_posts, function($a, $b) {
        return strtotime($b['publication_date']) - strtotime($a['publication_date']);
    });

    // Apply pagination (the offset also applies when no limit is given)
    if ($limit !== null || $offset > 0) {
        $all_posts = array_slice($all_posts, $offset, $limit);
    }

    return $all_posts;
}
```

### Helper Function: Load Related Posts

**Function**: `load_related_posts($post_slug, $category, $limit = 14)`

**Implementation Pattern**:

```php
function load_related_posts($post_slug, $category, $limit = 14) {
    $post = load_blog_post($category, $post_slug);

    if (!$post || !isset($post['related_posts'])) {
        return [];
    }

    $related_posts = [];

    foreach ($post['related_posts'] as $related_ref) {
        $related_post = load_blog_post(
            $related_ref['category'],
            $related_ref['slug']
        );

        if ($related_post) {
            $related_posts[] = $related_post;
        }

        if (count($related_posts) >= $limit) {
            break;
        }
    }

    return $related_posts;
}
```

### Helper Function: Load Categories

**Function**: `load_blog_categories()`

**Implementation Pattern**:

```php
function load_blog_categories() {
    $file_path = __DIR__ . "/../data/blog/categories.json";

    if (!file_exists($file_path)) {
        return [];
    }

    $json_content = file_get_contents($file_path);
    $data = json_decode($json_content, true);

    return $data['categories'] ?? [];
}
```

## Data Directory Structure

### Recommended Structure

```
v2/data/blog/
├── posts/
│   ├── lexikon/
│   │   ├── leitfaden-zur-finanzbuchhaltung.json
│   │   ├── personalfragebogen.json
│   │   └── ...
│   ├── ratgeber/
│   │   ├── fehler-digitalisierung-gastronomie.json
│   │   └── ...
│   └── inside-ordio/
│       ├── david-keuenhof-im-kuechenherde-podcast.json
│       └── ...
├── categories.json
├── topics.json
└── relationships.json (optional, for advanced features)
```

### File Naming Convention

- **Posts**: `{slug}.json` (e.g., `leitfaden-zur-finanzbuchhaltung.json`)
- **Categories**: `categories.json`
- **Topics**: `topics.json`
- **Slugs**: Lowercase, hyphens, no special characters
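Migrated posts take their slug from the WordPress URL. For posts created later, a sketch that follows the convention above; the umlaut transliteration (ü to ue) is inferred from existing slugs such as `david-keuenhof-im-kuechenherde-podcast`, `make_slug` is a hypothetical helper, and characters outside the map (e.g. `é`) are simply dropped:

```php
/**
 * Hypothetical helper: build a slug from a title
 * (lowercase ASCII letters, digits, and hyphens only).
 */
function make_slug(string $title): string {
    // German umlauts and ß are transliterated before anything is stripped
    $map = ['ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'Ä' => 'ae', 'Ö' => 'oe', 'Ü' => 'ue', 'ß' => 'ss'];
    $slug = strtolower(strtr($title, $map));
    // Replace every run of characters outside a-z/0-9 with a single hyphen
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}
```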

## Data Validation

### Required Fields Validation

**Post Data**:

- `slug` - Required, non-empty string
- `title` - Required, non-empty string
- `category` - Required, must be valid category
- `url` - Required, valid URL path
- `publication_date` - Required, valid ISO 8601 date
- `content.html` - Required, non-empty string

**Category Data**:

- `slug` - Required, must match directory name
- `name` - Required, non-empty string
- `url` - Required, valid URL path

### Data Type Validation

**Dates**: Must be ISO 8601 format (`YYYY-MM-DDTHH:MM:SS+00:00`)

**URLs**: Must be absolute paths starting with `/`

**Images**: Must have `src`, `alt` (can be empty), `width`, `height`

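**Arrays**: Must be JSON arrays, not JSON objects. After `json_decode($json, true)` both become PHP arrays, so check for sequential `0..n-1` keys.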
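A minimal validator for the required-field and data-type rules above, assuming posts are decoded with `json_decode($json, true)`. `validate_post_data` is hypothetical and checks only these rules, not the full JSON Schema; a JSON Schema validator library would cover the rest:

```php
/**
 * Hypothetical helper: returns a list of error messages (empty = valid).
 */
function validate_post_data(array $post, array $categories = ['lexikon', 'ratgeber', 'inside-ordio']): array {
    $errors = [];

    foreach (['slug', 'title', 'url', 'publication_date'] as $field) {
        if (!isset($post[$field]) || !is_string($post[$field]) || trim($post[$field]) === '') {
            $errors[] = "{$field} must be a non-empty string";
        }
    }
    if (!in_array($post['category'] ?? null, $categories, true)) {
        $errors[] = 'category must be one of: ' . implode(', ', $categories);
    }
    if (!is_string($post['content']['html'] ?? null) || trim($post['content']['html']) === '') {
        $errors[] = 'content.html must be a non-empty string';
    }
    if (isset($post['url']) && is_string($post['url']) && strpos($post['url'], '/') !== 0) {
        $errors[] = 'url must be a path starting with /';
    }
    // DATE_ATOM is "Y-m-d\TH:i:sP", i.e. YYYY-MM-DDTHH:MM:SS+00:00
    if (isset($post['publication_date']) && is_string($post['publication_date'])
        && DateTimeImmutable::createFromFormat(DATE_ATOM, $post['publication_date']) === false) {
        $errors[] = 'publication_date must be ISO 8601 (YYYY-MM-DDTHH:MM:SS+00:00)';
    }
    foreach (['topics', 'images', 'related_posts'] as $list_field) {
        if (!isset($post[$list_field])) {
            continue;
        }
        $value = $post[$list_field];
        // A decoded JSON array has sequential 0..n-1 keys; a decoded object does not
        $is_list = is_array($value) && ($value === [] || array_keys($value) === range(0, count($value) - 1));
        if (!$is_list) {
            $errors[] = "{$list_field} must be a JSON array";
        }
    }
    return $errors;
}
```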

## Performance Considerations

### Caching Strategy

**Post Data Caching**:

- Cache loaded post data in memory in a `static` PHP array; it lives only for the current request
- Invalidate a cached entry when the file's modification time changes
- Use `filemtime()` for cache invalidation

**Implementation Pattern**:

```php
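function load_blog_post_cached($category, $slug) {
    // The static must be declared inside the function: a file-level
    // `static $post_cache` is not visible here. It persists across calls
    // for the current request only.
    static $post_cache = [];

    $cache_key = "{$category}/{$slug}";
    $file_path = __DIR__ . "/../data/blog/posts/{$category}/{$slug}.json";

    if (!is_file($file_path)) {
        return null;
    }
    // Note: PHP caches filemtime() results per request (see clearstatcache())
    $mtime = filemtime($file_path);

    if (isset($post_cache[$cache_key]) && $post_cache[$cache_key]['mtime'] === $mtime) {
        return $post_cache[$cache_key]['data'];
    }

    $data = load_blog_post($category, $slug);
    $post_cache[$cache_key] = [
        'data' => $data,
        'mtime' => $mtime
    ];

    return $data;
}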
```

### Lazy Loading

**Related Posts**: Load only when needed (single post page)

**Category Lists**: Load only when needed (category/index pages)

**Image Data**: Load only featured images for listings, full image data for single posts
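For listings, a full post can be reduced to the same fields the schema's `related_posts` items use (slug, title, url, category, featured image `src`/`alt`, excerpt, publication date). A sketch, with `to_listing_item` as a hypothetical helper; it keeps templates from carrying content and image arrays around, but each post file is still read and decoded in full:

```php
/**
 * Hypothetical helper: project a full post array onto the fields
 * a listing card needs.
 */
function to_listing_item(array $post): array {
    $item = [
        'slug' => $post['slug'] ?? '',
        'title' => $post['title'] ?? '',
        'url' => $post['url'] ?? '',
        'category' => $post['category'] ?? '',
        'excerpt' => $post['excerpt'] ?? '',
        'publication_date' => $post['publication_date'] ?? '',
    ];
    if (isset($post['featured_image']['src'])) {
        // Featured image only; srcset, content, and content images are dropped
        $item['featured_image'] = [
            'src' => $post['featured_image']['src'],
            'alt' => $post['featured_image']['alt'] ?? '',
        ];
    }
    return $item;
}
```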

## Migration Data Transformation Script

### Script Requirements

- **Input**: WordPress extracted JSON files
- **Output**: PHP template JSON files

**Transformation Steps**:

1. Load WordPress data files
2. Extract and transform URLs
3. Transform image paths
4. Generate excerpts
5. Transform related posts
6. Validate data
7. Save to PHP template structure

**Script Location**: `scripts/blog/transform-to-template-format.py`

**See**: `scripts/blog/process-content-for-migration.py` for reference pattern
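The listed script is Python. For consistency with the other examples here, a PHP sketch of step 7 only: write one transformed post to `posts/{category}/{slug}.json` under a base directory. It assumes the post has already passed validation (so `category` and `slug` are safe); `save_post_json` and its `$base_dir` parameter are hypothetical:

```php
/**
 * Hypothetical helper: write one post to {base}/posts/{category}/{slug}.json
 * and return the file path.
 */
function save_post_json(array $post, string $base_dir): string {
    $dir = rtrim($base_dir, '/') . "/posts/{$post['category']}";
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: {$dir}");
    }
    // Keep umlauts and slashes readable in the output files
    $json = json_encode($post, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
    if ($json === false) {
        throw new RuntimeException('JSON encoding failed: ' . json_last_error_msg());
    }
    $file = "{$dir}/{$post['slug']}.json";
    // Write to a temp file, then rename, so readers never see a half-written file
    $tmp = $file . '.tmp';
    if (file_put_contents($tmp, $json . "\n") === false || !rename($tmp, $file)) {
        throw new RuntimeException("Cannot write: {$file}");
    }
    return $file;
}
```

The `.tmp` suffix keeps partially written files out of the `*.json` pattern that `load_blog_posts_by_category()` globs for.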

## Related Documentation

- [Migration Content Structure](MIGRATION_CONTENT_STRUCTURE.md) - Content structure requirements
- [Blog Template Best Practices](BLOG_TEMPLATE_BEST_PRACTICES.md) - Best practices including data structure
- [Template Patterns Analysis](TEMPLATE_PATTERNS_ANALYSIS.md) - Existing template patterns
- [Migration Architecture](MIGRATION_ARCHITECTURE.md) - Technical architecture
