# Job Title Mapping System

**Last Updated:** 2026-01-20

## Overview

The job title mapping system matches OCR-extracted job titles to the Position dropdown options in the event form using a hybrid approach: enhanced fuzzy matching (client-side) + OpenAI embeddings API (server-side semantic similarity).

## Position Dropdown Options

1. **Inhaber / Geschäftsführer** (Owner / Managing Director)
2. **HR / Schichtleitung** (HR / Shift Management)
3. **Mitarbeiter** (Employee)
4. **Sonstiges** (Other)

## Architecture

### Hybrid Matching Flow

```
OCR Extracted Job Title
    ↓
[Step 1] Enhanced Fuzzy Matching (Client-side, JavaScript)
    ├─ Keyword dictionary match → Select option (confidence: 0.95-1.0)
    ├─ Exact match → Select option (confidence: 1.0)
    ├─ Contains match → Select option (confidence: 0.7-0.9)
    ├─ Fuzzy similarity > 0.85 → Select option
    └─ No match OR confidence < 0.85 → Continue to Step 2
    ↓
[Step 2] OpenAI Embeddings API (Server-side, PHP)
    ├─ Generate embedding for extracted title
    ├─ Get/cache embeddings for dropdown options
    ├─ Calculate cosine similarity for each option
    ├─ Best match similarity > 0.75 → Select option
    └─ Best match similarity < 0.75 → Select "Sonstiges"
```

## Components

### 1. Enhanced Fuzzy Matching (`v2/js/event-form.js`)

**Location:** `matchJobTitleToDropdown()` method

**Features:**
- Keyword dictionary matching (highest priority)
- Exact match (case-insensitive, normalized)
- Contains match (bidirectional)
- Fuzzy similarity using Levenshtein distance
- German-specific normalization

**Keyword Dictionaries:**

```javascript
POSITION_KEYWORDS = {
    'Inhaber / Geschäftsführer': [
        'inhaber', 'geschäftsführer', 'geschäftsführerin', 'owner', 
        'ceo', 'chief executive officer', 'cto', 'cfo',
        'vorstand', 'vorstandsvorsitzender', 'geschäftsinhaber', 'prokurist',
        'managing director', 'geschäftsleitung', 'unternehmensleitung',
        'restaurantbesitzer', 'unternehmer', 'firmeninhaber'
    ],
    'HR / Schichtleitung': [
        'hr', 'human resources', 'personal', 'schichtleitung', 'schichtplanung',
        'personalleiter', 'hr manager', 'hr leiter', 'personalmanager',
        'shift management', 'schichtplaner', 'disponent', 'disposition',
        'schichtkoordinator', 'planer', 'personalwesen'
    ],
    'Mitarbeiter': [
        'mitarbeiter', 'mitarbeiterin', 'angestellter', 'angestellte', 
        'employee', 'staff', 'worker', 'kellner', 'koch', 'servicekraft',
        'verkäufer', 'kassierer', 'aushilfe', 'hilfskraft'
    ]
};
```

**Normalization:**

The `normalizeJobTitle()` helper function:
- Expands abbreviations (CEO → Chief Executive Officer, GF → Geschäftsführer)
- Normalizes German characters (ä→ae, ö→oe, ü→ue, ß→ss)
- Removes common prefixes/suffixes
- Handles compound titles

**Confidence Scoring:**

- Keyword match: 0.95-1.0
- Exact match: 1.0
- Contains match: 0.7-0.9 (proportional to overlap)
- Fuzzy similarity: 0.0-1.0 (Levenshtein distance)

### 2. OpenAI Embeddings API (`v2/api/job-title-matcher.php`)

**Endpoint:** `POST /v2/api/job-title-matcher.php`

**Input:**
```json
{
  "job_title": "CEO",
  "options": ["Inhaber / Geschäftsführer", "HR / Schichtleitung", "Mitarbeiter", "Sonstiges"]
}
```

**Output:**
```json
{
  "success": true,
  "match": "Inhaber / Geschäftsführer",
  "confidence": 0.87,
  "similarities": {
    "Inhaber / Geschäftsführer": 0.87,
    "HR / Schichtleitung": 0.23,
    "Mitarbeiter": 0.15,
    "Sonstiges": 0.12
  },
  "method": "embeddings",
  "threshold": 0.75
}
```

**Features:**
- Uses OpenAI `text-embedding-3-small` model
- Caches dropdown option embeddings (they never change)
- Calculates cosine similarity for each option
- Returns best match with confidence score

**Model Details:**
- Model: `text-embedding-3-small`
- Dimensions: 1536
- Cost: ~$0.02 per 1M tokens (very cheap)
- Quality: Good for semantic similarity tasks

### 3. JavaScript API Client (`v2/js/job-title-matcher.js`)

**Function:** `matchJobTitleSemantic(jobTitle, options, config)`

**Features:**
- Calls PHP endpoint via fetch
- Handles errors gracefully (fallback to fuzzy)
- Request timeout: 5 seconds
- Returns promise with match result

**Usage:**
```javascript
const result = await matchJobTitleSemantic('CEO', [
    'Inhaber / Geschäftsführer',
    'HR / Schichtleitung',
    'Mitarbeiter',
    'Sonstiges'
]);
```

### 4. Integration (`v2/js/event-form.js`)

**Flow:**
1. Try fuzzy matching first
2. If fuzzy confidence < 0.85 OR no match, call semantic API
3. If semantic similarity > 0.75, select best match
4. If semantic similarity < 0.75, select "Sonstiges"
5. Log matching method used for debugging

## Configuration

### OpenAI Embeddings Config (`v2/config/openai-embeddings.php`)

```php
return [
    'enabled' => true,
    'api_key' => getenv('OPENAI_API_KEY') ?: '',
    'model' => 'text-embedding-3-small',
    'dimensions' => 1536,
    'similarity_threshold' => 0.75, // Minimum cosine similarity for match
    'timeout' => 10,
    'cache_enabled' => true,
    'cache_ttl' => 86400, // 24 hours
    'fallback_to_fuzzy' => true,
    'auto_select_sonstiges' => true
];
```

**Similarity Thresholds:**
- **0.75+**: Good match, select option
- **< 0.75**: No good match, select "Sonstiges" (if enabled)

## Matching Examples

### Exact Matches
- "Inhaber / Geschäftsführer" → "Inhaber / Geschäftsführer" (confidence: 1.0)
- "HR / Schichtleitung" → "HR / Schichtleitung" (confidence: 1.0)

### Synonym Matches
- "CEO" → "Inhaber / Geschäftsführer" (semantic similarity: ~0.85)
- "Geschäftsführer" → "Inhaber / Geschäftsführer" (keyword match: 0.95)
- "Owner" → "Inhaber / Geschäftsführer" (semantic similarity: ~0.80)

### HR Matches
- "HR Manager" → "HR / Schichtleitung" (semantic similarity: ~0.82)
- "Personalleiter" → "HR / Schichtleitung" (keyword match: 0.95)
- "Schichtplaner" → "HR / Schichtleitung" (keyword match: 0.95)

### Employee Matches
- "Kellner" → "Mitarbeiter" (keyword match: 0.95)
- "Employee" → "Mitarbeiter" (semantic similarity: ~0.78)
- "Servicekraft" → "Mitarbeiter" (keyword match: 0.95)

### Unmatched Titles
- "Software Engineer" → "Sonstiges" (semantic similarity: ~0.35)
- "Marketing Manager" → "Sonstiges" (semantic similarity: ~0.42)
- "Consultant" → "Sonstiges" (semantic similarity: ~0.38)

## Performance

- **Fuzzy Matching:** < 10ms (client-side, no API call)
- **Semantic Matching:** < 1 second (with cached option embeddings)
- **Cost:** ~$0.00002 per matching request (very cheap)

## Error Handling

1. **OpenAI API Failure:**
   - Fallback to enhanced fuzzy matching
   - Log error for monitoring
   - Don't block form auto-fill

2. **Timeout (> 5 seconds):**
   - Cancel request
   - Use fuzzy matching result
   - Log timeout for monitoring

3. **Invalid Response:**
   - Validate response structure
   - Fallback to fuzzy matching
   - Log validation error

## Caching

Dropdown option embeddings are cached in `v2/cache/job-title-embeddings.json`:
- Cache TTL: 24 hours
- Pre-generated on first run
- Used for all subsequent requests
- Significantly reduces API calls and latency

## Testing

Run the test suite:
```bash
php v2/scripts/test-job-title-matching.php
```

Test cases cover:
- Exact matches
- Synonym matches
- German variations
- Abbreviations
- English titles
- Unmatched titles
- Edge cases

## Troubleshooting

### Semantic Matching Not Working

1. **Check API Key:**
   ```bash
   echo $OPENAI_API_KEY
   ```

2. **Check Configuration:**
   - Verify `OPENAI_EMBEDDINGS_ENABLED` is `true`
   - Verify API key is set in config

3. **Check Cache:**
   - Verify `v2/cache/job-title-embeddings.json` exists
   - Check cache file permissions

4. **Check Logs:**
   - Review `v2/logs/ocr-business-card-*.log`
   - Look for embedding generation errors

### Low Accuracy

1. **Adjust Similarity Threshold:**
   - Lower threshold (e.g., 0.70) for more matches
   - Higher threshold (e.g., 0.80) for stricter matching

2. **Expand Keyword Dictionaries:**
   - Add more keywords to `POSITION_KEYWORDS` in `event-form.js`

3. **Review Test Results:**
   - Run test suite and review accuracy report
   - Identify patterns in failures

## Related Documentation

- [OCR Implementation Summary](../ocr/IMPLEMENTATION_SUMMARY.md)
- [OpenAI Integration Guide](../ocr/HYBRID_ARCHITECTURE.MD)
- [Event Form Implementation](../../../systems/forms/EVENT_FORM_IMPLEMENTATION.md)
