# Salutation Mapping System

**Last Updated:** 2026-01-20

## Overview

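The salutation mapping system extracts German salutations (Herr, Frau, Divers) from OCR text on the server and maps them to the Anrede dropdown options in the event form on the client. Mapping combines a keyword dictionary, exact and partial matching, and fuzzy matching, and only auto-selects an option when the confidence is at least 0.75.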

## Anrede Dropdown Options

1. **Herr** (Mr.)
2. **Frau** (Mrs./Ms.)
3. **Divers** (Diverse/Non-binary)

## Architecture

### Salutation Extraction & Mapping Flow

```
OCR Text (Name Line)
    ↓
[Step 1] Extract Salutation (Server-side, PHP)
    ├─ Pattern matching: "Herr", "Frau", "Divers" (with variants)
    ├─ Handle variants and OCR errors: "Herrn", "Fraü", "Div."
    ├─ Extract BEFORE removing from name
    ├─ Handle combinations: "Herr Dr.", "Frau Prof."
    └─ Return salutation in OCR response
    ↓
[Step 2] Map to Dropdown (Client-side, JavaScript)
    ├─ Keyword dictionary matching (highest priority)
    ├─ Exact match (case-insensitive)
    ├─ Contains match (partial)
    ├─ Fuzzy similarity (Levenshtein distance)
    ├─ Confidence >= 0.75 → Select option
    └─ Confidence < 0.75 → Leave empty (don't auto-select)
```

## Components

### 1. Salutation Extraction (`v2/api/ocr-business-card.php`)

**Location:** `extractSalutationFromLine()` function

**Features:**
- Pattern matching for gender salutations (Herr, Frau, Divers)
- Handles variants: "Herrn", "Fraü", "Div."
- OCR error correction: "Hern" → "Herr", "Diverss" → "Divers"
- Handles combinations: "Herr Dr.", "Frau Prof." (prioritizes gender salutation)
- Edge case handling: "Herr/Frau" → empty (ambiguous), "Sehr geehrter Herr" → "Herr"

**Patterns:**
- Primary patterns: `/^(Herr|Herrn|Frau|Fraü|Divers|Div\.?)[\s\.]/i`
- OCR error variants: `/\b(Hern|Frau|Diverss|Divets)\b/i`
- Combinations: Handles "Herr Dr.", "Frau Prof." by checking gender salutation first

**Return Value:**
- `'Herr'`, `'Frau'`, `'Divers'`, or empty string `''`
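
The extraction rules above can be illustrated with a short sketch. The real `extractSalutationFromLine()` lives in PHP and its body is not reproduced here, so the following is only a JavaScript approximation. The variant table, the list of greeting words, and the ambiguity check are assumptions derived from the examples in this section.

```javascript
// Illustrative sketch only: a JavaScript approximation of the PHP extraction step.
// VARIANT_TO_SALUTATION and GREETING_WORDS are assumed tables, not production code.
const VARIANT_TO_SALUTATION = new Map([
    ['herr', 'Herr'], ['herrn', 'Herr'], ['hern', 'Herr'],
    ['frau', 'Frau'], ['fraü', 'Frau'],
    ['divers', 'Divers'], ['div', 'Divers'], ['diverss', 'Divers'], ['divets', 'Divers'],
]);
const GREETING_WORDS = new Set(['sehr', 'geehrter', 'geehrte']);

function extractSalutationFromLine(line) {
    // Ambiguous lines such as "Herr/Frau" are intentionally left empty.
    if (/\b(herrn?|frau)\s*\/\s*(herrn?|frau)\b/i.test(line)) {
        return '';
    }
    // Tokenize, lowercase, and drop trailing dots so "Div." and "Herr." hit the table.
    const tokens = line
        .trim()
        .split(/\s+/)
        .map((token) => token.replace(/\.+$/, '').toLowerCase());

    // Skip a leading greeting ("Sehr geehrter Herr ...").
    let index = 0;
    while (index < tokens.length && GREETING_WORDS.has(tokens[index])) {
        index++;
    }
    // Only the first remaining token is checked, so "Herr Dr. Müller" yields
    // "Herr" and the academic title is never returned.
    return VARIANT_TO_SALUTATION.get(tokens[index]) ?? '';
}
```

Checking only the leading token is a deliberate simplification: it avoids treating a surname such as "Herr" later in the line as a salutation.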

### 2. Name Extraction Integration (`v2/api/ocr-business-card.php`)

**Location:** `extractNameFromLine()` function

**Changes:**
- Calls `extractSalutationFromLine()` FIRST (before removing from name)
- Returns salutation in result array: `['firstname' => ..., 'lastname' => ..., 'salutation' => ...]`
- Removes salutation from name AFTER extraction
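
The ordering matters because the salutation is lost once the name line has been cleaned. A minimal sketch of that order follows, written in JavaScript as a stand-in for the PHP function. `SALUTATION_PREFIX`, `TITLE_PREFIX`, and `CANONICAL` are simplified, hypothetical helpers that ignore OCR misreads such as "Fraü".

```javascript
// Illustrative sketch only (JavaScript stand-in for the PHP function).
// The patterns below are simplified assumptions, not the configured patterns.
const SALUTATION_PREFIX = /^(Herrn|Herr|Frau|Divers|Div\.?)(?=[\s.]|$)\.?\s*/i;
const TITLE_PREFIX = /^((Prof|Dr)\.\s*)+/i;
const CANONICAL = new Map([
    ['herr', 'Herr'], ['herrn', 'Herr'], ['frau', 'Frau'],
    ['divers', 'Divers'], ['div', 'Divers'],
]);

function extractNameFromLine(line) {
    let rest = line.trim();
    let salutation = '';

    // 1. Read the salutation FIRST, while it is still part of the line.
    const found = rest.match(SALUTATION_PREFIX);
    if (found) {
        salutation = CANONICAL.get(found[1].replace('.', '').toLowerCase()) ?? '';
        // 2. Only now remove it from the name.
        rest = rest.slice(found[0].length);
    }

    // Drop leading academic titles so they do not end up in the first name.
    rest = rest.replace(TITLE_PREFIX, '');

    // Naive split: last word is the last name, everything before is the first name.
    const parts = rest.split(/\s+/).filter(Boolean);
    const lastname = parts.length > 1 ? parts.pop() : '';
    return { firstname: parts.join(' '), lastname, salutation };
}
```

The lookahead after the salutation keeps a first name like "Herrmann" from being split into "Herr" plus a remainder.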

### 3. Enhanced Fuzzy Matching (`v2/js/event-form.js`)

**Location:** `matchSalutationToDropdown()` method

**Features:**
- Keyword dictionary matching (highest priority, confidence: 0.95-1.0)
- Exact match (case-insensitive, confidence: 1.0)
- Contains match (partial, confidence: 0.85)
- Fuzzy similarity using Levenshtein distance (confidence: 0.75+)
- German-specific normalization
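
The matching cascade can be sketched as follows. This is not the shipped `matchSalutationToDropdown()`: the function bodies, the `similarity()` helper, the shortened keyword lists, and the rule that a slash means "ambiguous" are assumptions. The confidence values follow the list above, and the caller is expected to apply the 0.75 threshold.

```javascript
// Illustrative sketch only; scores follow this document, the code itself is assumed.
const OPTIONS = ['Herr', 'Frau', 'Divers'];
const KEYWORDS = new Map([
    ['Herr', ['herr', 'herrn', 'hr', 'hern']],
    ['Frau', ['frau', 'fraü', 'fr', 'fräulein']],
    ['Divers', ['divers', 'div', 'diverss', 'divets']],
]);

// Classic dynamic-programming Levenshtein distance, keeping two rows in memory.
function levenshtein(a, b) {
    let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
    for (let i = 1; i <= a.length; i++) {
        const curr = [i];
        for (let j = 1; j <= b.length; j++) {
            const cost = a[i - 1] === b[j - 1] ? 0 : 1;
            curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
        }
        prev = curr;
    }
    return prev[b.length];
}

// Distance scaled to 0..1, where 1 means identical strings.
function similarity(a, b) {
    const longest = Math.max(a.length, b.length);
    return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

function matchSalutationToDropdown(raw, options = OPTIONS) {
    const value = String(raw ?? '').trim().toLowerCase().replace(/[.,;:]/g, '');
    // Empty or ambiguous ("Herr/Frau") input never produces a candidate.
    if (!value || value.includes('/')) return { option: null, confidence: 0 };

    // 1. Keyword dictionary: 1.0 for the label itself, 0.95 for a known variant.
    for (const option of options) {
        if ((KEYWORDS.get(option) ?? []).includes(value)) {
            return { option, confidence: value === option.toLowerCase() ? 1.0 : 0.95 };
        }
    }
    // 2. Exact match for labels that have no dictionary entry.
    const exact = options.find((o) => o.toLowerCase() === value);
    if (exact) return { option: exact, confidence: 1.0 };

    // 3. Contains match: the input contains a full option label.
    const contained = options.find((o) => value.includes(o.toLowerCase()));
    if (contained) return { option: contained, confidence: 0.85 };

    // 4. Fuzzy match: return the most similar label with its raw score.
    let best = { option: null, confidence: 0 };
    for (const option of options) {
        const score = similarity(value, option.toLowerCase());
        if (score > best.confidence) best = { option, confidence: score };
    }
    return best;
}
```

Returning the raw fuzzy score instead of a yes/no answer keeps the threshold decision in one place, the auto-fill step.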

**Keyword Dictionaries:**
```javascript
// Lowercase variants (including common OCR misreads), keyed by dropdown option
const SALUTATION_KEYWORDS = {
    'Herr': ['herr', 'herrn', 'herr.', 'hr.', 'hr', 'hern'],
    'Frau': ['frau', 'fraü', 'frau.', 'fr.', 'fr', 'fräulein'],
    'Divers': ['divers', 'div.', 'divers.', 'diverss', 'div', 'divets']
};
```

**Normalization:**
- Case normalization (to lowercase)
- Punctuation removal ("Herr." → "herr")
- OCR error correction ("Fraü" → "frau", "Diverss" → "divers")
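
A minimal sketch of these three normalization steps, assuming they run in the order listed. The actual `normalizeSalutation()` is not shown in this document, and the `OCR_CORRECTIONS` table is a hypothetical helper that contains only the two corrections named above.

```javascript
// Illustrative sketch only; the correction table is an assumption based on
// the examples in this section.
const OCR_CORRECTIONS = new Map([
    ['fraü', 'frau'],
    ['diverss', 'divers'],
]);

function normalizeSalutation(raw) {
    if (typeof raw !== 'string') return '';
    const cleaned = raw
        .trim()
        .toLowerCase()              // case normalization
        .replace(/[.,;:!?]/g, '')   // punctuation removal: "Herr." -> "herr"
        .replace(/\s+/g, ' ');      // collapse whitespace left by OCR
    // OCR error correction: map known misreads to the intended word.
    return OCR_CORRECTIONS.get(cleaned) ?? cleaned;
}
```

Because punctuation is stripped first, dictionary entries with a trailing dot (such as `'herr.'`) would only be needed if matching ever runs on unnormalized input.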

### 4. Auto-Fill Integration (`v2/js/event-form.js`)

**Location:** `autoFillForm()` method

**Flow:**
1. Check for `ocrData.salutation`
2. Call `matchSalutationToDropdown()`
3. If confidence >= 0.75 → Select matched option
4. If confidence < 0.75 → Leave empty (don't auto-select)
5. Update `fieldsFilled` counter
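
The five steps above can be sketched as a small function. `autoFillSalutation`, the `matcher` and `selectOption` callbacks, and the `state` object are hypothetical stand-ins so the example runs without a DOM; the real `autoFillForm()` method may be structured differently.

```javascript
// Illustrative sketch only; all names except the 0.75 threshold and
// `fieldsFilled` are assumptions made for this example.
const CONFIDENCE_THRESHOLD = 0.75;

function autoFillSalutation(ocrData, matcher, selectOption, state) {
    // 1. Nothing to do without a salutation in the OCR payload.
    if (!ocrData || !ocrData.salutation) return false;

    // 2. Ask the matcher for the best dropdown option and its confidence.
    const { option, confidence } = matcher(ocrData.salutation);

    // 4. Below the threshold (or no candidate): leave the dropdown empty.
    if (!option || confidence < CONFIDENCE_THRESHOLD) return false;

    // 3. Confident match: select it.
    selectOption(option);

    // 5. Count the field as filled.
    state.fieldsFilled += 1;
    return true;
}

// Usage with stubbed collaborators:
const state = { fieldsFilled: 0 };
const selected = [];
autoFillSalutation(
    { salutation: 'Herr' },
    () => ({ option: 'Herr', confidence: 1.0 }),
    (option) => selected.push(option),
    state,
);
```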

## Configuration

### Salutation Patterns (`v2/config/ocr-patterns.php`)

**Salutation Variants:**
```php
'salutations' => [
    // Male
    'Herr' => 'Herr',
    'Herrn' => 'Herr', // Dative form
    'Herr.' => 'Herr',
    'Hr.' => 'Herr',
    'Hern' => 'Herr', // OCR error
    
    // Female
    'Frau' => 'Frau',
    'Frau.' => 'Frau',
    'Fr.' => 'Frau',
    'Fraü' => 'Frau', // OCR error
    'Fräulein' => 'Frau', // Outdated, map to Frau
    
    // Non-binary
    'Divers' => 'Divers',
    'Div.' => 'Divers',
    'Diverss' => 'Divers', // OCR error
    'Divets' => 'Divers' // OCR error
]
```

**Regex Patterns:**
```php
'salutation_patterns' => [
    '/\b(Herr|Herrn|Frau|Fraü|Divers|Div\.?)\b\.?/i',
    '/^(Herr|Herrn|Frau|Fraü|Divers|Div\.?)[\s\.]/i',
    '/\b(Hern|Frau|Diverss|Divets)\b/i'
]
```

## OCR Response Structure

**Before:**
```json
{
  "success": true,
  "data": {
    "firstname": "Max",
    "lastname": "Mustermann",
    "email": "max@example.com",
    "phone": "+493012345678",
    "company": "Example GmbH",
    "jobtitle": "Geschäftsführer"
  }
}
```

**After:**
```json
{
  "success": true,
  "data": {
    "firstname": "Max",
    "lastname": "Mustermann",
    "email": "max@example.com",
    "phone": "+493012345678",
    "company": "Example GmbH",
    "jobtitle": "Geschäftsführer",
    "salutation": "Herr"
  }
}
```

## Edge Cases & Special Handling

### 1. Academic Titles
- **"Herr Dr. Müller"** → Extract "Herr" (not "Dr.")
- **"Frau Prof. Schmidt"** → Extract "Frau" (not "Prof.")
- **Priority:** Gender salutation > Academic title

### 2. OCR Errors
- **"Hern"** → Map to "Herr" (keyword dictionary)
- **"Fraü"** → Map to "Frau" (keyword dictionary)
- **"Diverss"** → Map to "Divers" (keyword dictionary)

### 3. Missing Salutation
- If no salutation extracted → Leave dropdown empty
- Don't guess based on first name (unreliable)
- User can select manually

### 4. Ambiguous Cases
- **"Herr/Frau"** → Leave empty (don't auto-select)
- **"Sehr geehrter Herr"** → Extract "Herr"

### 5. Non-German Salutations
- **"Mr."** → Could map to "Herr" (if needed in future)
- **"Mrs." / "Ms."** → Could map to "Frau" (if needed in future)
- **"Mx."** → Could map to "Divers" (if needed in future)
- **Current focus:** German patterns only

## Performance Considerations

- **Extraction:** < 5ms (pattern matching, no API calls)
- **Mapping:** < 10ms (client-side fuzzy matching)
- **No API costs:** Pure pattern matching, no external services needed
- **Simple logic:** Only 3 options, simpler than job title mapping

## Testing

### Test Suite (`v2/scripts/test-salutation-mapping.php`)

**Test Coverage:**
- Exact matches: "Herr", "Frau", "Divers"
- Variants: "Herrn", "Fraü", "Div."
- OCR errors: "Hern", "Fraü", "Diverss"
- Combinations: "Herr Dr.", "Frau Prof."
- Edge cases: "Herr/Frau", "Sehr geehrter Herr", missing salutation
- Case variations: Uppercase, lowercase, mixed

**Current Accuracy:**
- Extraction Tests: 19/19 passed (100.0%)
- Name Extraction Tests: 3/4 passed (75.0%)
- Overall: 22/23 passed (95.7%)

## Success Criteria

1. ✅ Salutation extracted from OCR text accurately
2. ✅ Salutation appears in OCR response structure
3. ✅ Accurate mapping to dropdown options
4. ✅ Handles OCR errors and variants
5. ✅ No false positives (doesn't auto-select incorrectly)
6. ✅ Comprehensive test coverage
7. ✅ Documentation complete

## Files Modified

**Server-Side (PHP):**
- `v2/api/ocr-business-card.php` - Added `extractSalutationFromLine()`, updated `extractNameFromLine()`, updated parsing functions
- `v2/config/ocr-patterns.php` - Added salutation patterns and variants

**Client-Side (JavaScript):**
- `v2/js/event-form.js` - Added `normalizeSalutation()`, `matchSalutationToDropdown()`, integrated with `autoFillForm()`

**Testing:**
- `v2/scripts/test-salutation-mapping.php` - Comprehensive test suite

**Documentation:**
- `docs/systems/ocr/SALUTATION_MAPPING.md` - This file

## Error Handling

### Error Response Structure

All error responses include the `salutation` field for consistency:

```json
{
  "success": false,
  "message": "Error message",
  "data": {
    "firstname": "",
    "lastname": "",
    "email": "",
    "phone": "",
    "company": "",
    "jobtitle": "",
    "salutation": ""
  }
}
```

### Validation Errors

Invalid salutation values are removed during validation:
- Values not in ['Herr', 'Frau', 'Divers'] → Removed (set to empty string)
- Case normalization: "herr" → "Herr", "FRAU" → "Frau"
- Invalid characters → Removed
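
A minimal sketch of these validation rules, written in JavaScript for consistency with the other examples. The function name `validateSalutation` and the interpretation of "invalid characters" as "anything that is not a letter" are assumptions; the server-side validation may differ.

```javascript
// Illustrative sketch only; the real validation logic is not shown in this document.
const ALLOWED_SALUTATIONS = ['Herr', 'Frau', 'Divers'];

function validateSalutation(value) {
    if (typeof value !== 'string') return '';
    // Remove invalid characters (assumed here: everything except letters).
    const letters = value.replace(/[^\p{L}]/gu, '');
    if (!letters) return '';
    // Case normalization: "herr" -> "Herr", "FRAU" -> "Frau".
    const normalized = letters[0].toUpperCase() + letters.slice(1).toLowerCase();
    // Anything outside the allowed list becomes an empty string.
    return ALLOWED_SALUTATIONS.includes(normalized) ? normalized : '';
}
```

Normalizing the case before the allow-list check is what lets "FRAU" survive validation instead of being discarded.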

### Missing Dropdown Handling

The JavaScript handles a missing salutation dropdown gracefully:
- Logs warning if dropdown not found
- Continues with other fields
- Doesn't block auto-fill process
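
A minimal sketch of this behavior follows. The selector matches the one listed under Troubleshooting; `fillSalutationIfPresent`, the injectable `root` and `logger` parameters, and the way the value is written to the dropdown are hypothetical and depend on how the custom dropdown widget is implemented.

```javascript
// Illustrative sketch only; `root` is any object with querySelector
// (normally `document`), and `logger` defaults to the browser console.
const SALUTATION_SELECTOR = '.custom-dropdown[data-field="salutation"]';

function fillSalutationIfPresent(root, option, logger = console) {
    const dropdown = root.querySelector(SALUTATION_SELECTOR);
    if (!dropdown) {
        // Warn and return; the caller carries on with the remaining fields.
        logger.warn('Salutation dropdown not found; skipping salutation auto-fill');
        return false;
    }
    // Assumption: the widget reads its value from a data attribute.
    dropdown.dataset.value = option;
    return true;
}
```

Returning `false` instead of throwing is what keeps a missing dropdown from blocking the rest of the auto-fill.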

## OpenAI Vision OCR Integration

### Salutation Extraction

OpenAI Vision OCR extracts the salutation as part of its structured JSON response:

**Prompt Requirements:**
- Prompts request salutation extraction
- Salutation rules documented in prompt
- Returns empty string if not found

**Response Structure:**
```json
{
  "firstname": "Max",
  "lastname": "Mustermann",
  "email": "max@example.com",
  "phone": "+493012345678",
  "company": "Example GmbH",
  "jobtitle": "Geschäftsführer",
  "salutation": "Herr"
}
```

**Validation:**
- Normalizes case (lowercase, then `ucfirst`, so "FRAU" → "Frau")
- Validates against allowed values
- Removes invalid salutations

**Comparison with Google Vision:**
- Both extract the salutation consistently
- OpenAI may handle complex layouts better
- Google Vision is faster and cheaper
- The hybrid approach uses the better of the two results

## Troubleshooting

### Salutation Not Extracted

**Possible Causes:**
1. Salutation not present on business card
2. OCR misread the salutation
3. Ambiguous case ("Herr/Frau") → Intentionally left empty
4. Pattern not matching variant

**Solutions:**
- Check OCR text output for salutation presence
- Review extraction patterns in `v2/config/ocr-patterns.php`
- Verify ambiguous case handling
- Check logs for extraction attempts

### Salutation Not Auto-Filled

**Possible Causes:**
1. Confidence below threshold (< 0.75)
2. Dropdown not found in DOM
3. Matching function failed
4. Invalid salutation value

**Solutions:**
- Check browser console for debug logs
- Verify dropdown exists: `.custom-dropdown[data-field="salutation"]`
- Review matching confidence scores
- Check salutation value in OCR response

### Invalid Salutation Values

**Possible Causes:**
1. OCR misread salutation
2. Non-German salutation ("Mr.", "Ms.")
3. Validation too strict

**Solutions:**
- Add variant to patterns if common OCR error
- Expand allowed values if needed
- Review validation logic

## Related Documentation

- [Job Title Mapping](./JOB_TITLE_MAPPING.md) - Similar architecture for job title mapping
- [Field Extraction Guide](./FIELD_EXTRACTION_GUIDE.md) - General OCR field extraction patterns
- [Implementation Summary](./IMPLEMENTATION_SUMMARY.md) - Overall OCR system status

## Legal Context (Germany)

**German Salutation Best Practices (2024-2025):**
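- "Divers" is a legally recognized gender entry in Germany (available in the civil-status register since December 2018; the Self-Determination Act, in force since November 2024, simplified changing the entry)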
- Court ruling (OLG Frankfurt, June 2022, effective January 2023): Requiring a choice between only "Herr" and "Frau" is discriminatory
- Common forms: "Herr", "Frau", "Divers", "Herr/Frau" (ambiguous)
- Avoid outdated forms: "Fräulein" (use "Frau" instead)
- Non-binary preference: Many non-binary people prefer no honorific or a neutral form of address
