# Hybrid OCR Architecture Design

**Last Updated:** 2026-01-20

## Overview

This document describes a hybrid OCR architecture that combines the Google Cloud Vision API with the OpenAI GPT-4 Vision API to balance extraction accuracy against per-card cost.

## Architecture Goals

1. **Accuracy:** Achieve 99%+ accuracy for business card extraction
2. **Cost Efficiency:** Minimize API costs while maintaining high accuracy
3. **Performance:** Maintain < 4 seconds end-to-end processing time
4. **Reliability:** Graceful fallback if one API fails

## Architecture Options

### Option A: Selective Hybrid (Recommended)

**Flow:**
1. Process all cards with Google Vision API
2. Calculate confidence score
3. Route low-confidence cards (< 0.65) to OpenAI
4. Merge results field by field, preferring the higher-confidence source
5. Return best result

- **Cost:** ~$0.00235 per card (assuming 10% of cards are routed to OpenAI)
- **Accuracy:** 97-99.5%
- **Latency:** +1-2 seconds for the 10% of cards routed to OpenAI

### Option B: Full Hybrid

**Flow:**
1. Process all cards with both APIs in parallel
2. Merge results using confidence scores
3. Return best result

- **Cost:** ~$0.0115 per card (every card uses both APIs)
- **Accuracy:** 99-99.5%
- **Latency:** +2-3 seconds for all cards

### Option C: OpenAI Fallback Only

**Flow:**
1. Process all cards with Google Vision API
2. If confidence is below the fallback threshold (0.40), call OpenAI
3. Merge results if OpenAI used
4. Return best result

- **Cost:** ~$0.00193 per card (assuming 5% of cards are routed to OpenAI)
- **Accuracy:** 96-98%
- **Latency:** +1-2 seconds for the 5% of cards routed to OpenAI
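
The per-card estimates above are consistent with a simple blended-cost formula: every card pays for one Google Vision call, and a fraction of cards additionally pays for one OpenAI call. The sketch below reproduces the arithmetic. The two per-call prices are assumptions back-calculated from the Option A and Option C figures, not quoted prices, and `blendedCostPerCard` is a hypothetical helper name.

```php
// Expected cost per card: one Google call for every card, plus one OpenAI call
// for the share of cards that is routed to OpenAI.
function blendedCostPerCard(float $googleCost, float $openaiCost, float $openaiShare): float {
    return $googleCost + $openaiShare * $openaiCost;
}

$googleCost = 0.0015; // ASSUMED cost of one Google Vision call (USD)
$openaiCost = 0.0085; // ASSUMED cost of one OpenAI Vision call (USD)

printf("Option A: %.6f\n", blendedCostPerCard($googleCost, $openaiCost, 0.10)); // 0.002350
printf("Option C: %.6f\n", blendedCostPerCard($googleCost, $openaiCost, 0.05)); // 0.001925
printf("Option B: %.6f\n", blendedCostPerCard($googleCost, $openaiCost, 1.00)); // 0.010000
```

With these inputs Option B comes out near $0.0100 rather than the $0.0115 quoted above, which suggests the Option B estimate assumes a higher OpenAI per-call price. The inputs should be checked against current provider pricing before they are used for budgeting.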

## Recommended Architecture: Selective Hybrid

### Routing Logic

```php
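/**
 * Decide whether a Google Vision result should also be sent to OpenAI.
 * $confidence is the overall Google confidence score in the range 0.0-1.0.
 */
function shouldUseOpenAI(array $googleResult, float $confidence): bool {
    // Route to OpenAI if:
    // 1. Overall confidence is below the 'medium' threshold (0.65)
    // 2. A critical field (email or phone) is missing
    // 3. Complex layout detected (not implemented in this function yet)
    // 4. A critical field has low field-level confidence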
    
    $config = require __DIR__ . '/../config/confidence-thresholds.php';
    $thresholds = $config['overall'];
    
    if ($confidence < $thresholds['medium']) {
        return true; // Low confidence
    }
    
    // Check critical fields
    $criticalFields = ['email', 'phone'];
    foreach ($criticalFields as $field) {
        if (empty($googleResult[$field])) {
            return true; // Missing critical field
        }
    }
    
    // Check field-level confidence
    $fieldConfidences = $googleResult['field_confidences'] ?? [];
    foreach ($criticalFields as $field) {
        if (isset($fieldConfidences[$field]) && $fieldConfidences[$field] === 'low') {
            return true; // Low confidence for critical field
        }
    }
    
    return false; // Use Google Vision only
}
```
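
The routing function reads its threshold from `config/confidence-thresholds.php`, whose contents are not shown in this document. A minimal sketch of what that file might contain is given below. The key names under `overall` other than `medium` are assumptions; the two values are taken from the thresholds quoted for Option A (0.65) and Option C (0.40).

```php
<?php
// config/confidence-thresholds.php (hypothetical contents)
return [
    'overall' => [
        'medium' => 0.65, // below this, route the card to OpenAI (Option A)
        'low'    => 0.40, // assumed key: stricter threshold used by Option C
    ],
];
```

Keeping the thresholds in a config file means they can be adjusted during the production phase without touching the routing code.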

### Result Merging Strategy

```php
function mergeOCRResults($googleResult, $openaiResult, $googleConfidence, $openaiConfidence) {
    $merged = [];
    $fields = ['firstname', 'lastname', 'email', 'phone', 'company', 'jobtitle'];
    
    foreach ($fields as $field) {
        $googleValue = $googleResult[$field] ?? '';
        $openaiValue = $openaiResult[$field] ?? '';
        
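        // Prefer the OpenAI value when OpenAI is the more confident source;
        // otherwise prefer Google, falling back to whichever value is non-empty.
        if (!empty($openaiValue) && $openaiConfidence > $googleConfidence) {
            $merged[$field] = $openaiValue;
        } elseif (!empty($googleValue)) {
            $merged[$field] = $googleValue;
        } else {
            // Google found nothing: keep the OpenAI value (may itself be empty).
            $merged[$field] = $openaiValue;
        }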
    }
    
    return $merged;
}
```

## Implementation Components

### 1. OCR Router (`v2/helpers/ocr-router.php`)

**Responsibilities:**
- Determine which API(s) to use
- Route requests to appropriate API(s)
- Handle fallback logic

### 2. OpenAI Vision OCR (`v2/api/openai-vision-ocr.php`)

**Responsibilities:**
- Format image for OpenAI API
- Send request with optimized prompt
- Parse structured JSON response
- Handle errors and retries
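
As an illustration of the "parse structured JSON response" responsibility, the sketch below turns a model reply into the six expected fields. `parseOpenAIExtraction` is a hypothetical helper, not an existing function in the codebase, and it assumes the reply text has already been pulled out of the API response.

```php
// Hypothetical helper: convert the model's text reply into the six expected fields.
// Returns null when no JSON object can be decoded, so the caller can fall back.
function parseOpenAIExtraction(string $reply): ?array {
    $fields = ['firstname', 'lastname', 'email', 'phone', 'company', 'jobtitle'];

    // Keep only the outermost {...} in case the model adds text around the JSON.
    $start = strpos($reply, '{');
    $end = strrpos($reply, '}');
    if ($start === false || $end === false || $end < $start) {
        return null;
    }

    $data = json_decode(substr($reply, $start, $end - $start + 1), true);
    if (!is_array($data)) {
        return null;
    }

    $result = [];
    foreach ($fields as $field) {
        // Missing or non-string values become empty strings.
        $value = $data[$field] ?? null;
        $result[$field] = is_string($value) ? trim($value) : '';
    }
    return $result;
}
```

Returning `null` instead of throwing keeps the decision about retries or fallback with the caller.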

### 3. Result Merger (`v2/helpers/ocr-result-merger.php`)

**Responsibilities:**
- Merge results from multiple APIs
- Use confidence scores for field selection
- Resolve conflicts when the two APIs return different values for a field
- Return best result

### 4. Prompt Templates (`v2/config/openai-prompts.php`)

**Responsibilities:**
- Define structured extraction prompts
- Optimize for German business cards
- Handle edge cases

## Prompt Engineering

### Base Prompt Template

```
Extract structured information from this business card image. Return JSON with:
- firstname: First name
- lastname: Last name  
- email: Email address
- phone: Phone number in E.164 format (+[country][number])
- company: Company name
- jobtitle: Job title

The card is primarily in German. Handle umlauts (ä, ö, ü) and ß correctly.
Clean OCR errors (0/O, 1/l/I) in email addresses.
Normalize phone numbers to E.164 format.

Return only valid JSON, no additional text.
```

### Enhanced Prompt (with context)

```
You are extracting information from a German business card image.

Rules:
1. Names: Proper case (Max Mustermann, not MAX MUSTERMANN)
2. Email: Lowercase, fix OCR errors (@ not ©, . not ,)
3. Phone: E.164 format (+493012345678)
4. Company: Include legal form if present (GmbH, AG, etc.)
5. Job Title: Full title, not abbreviated

Return JSON:
{
  "firstname": "...",
  "lastname": "...",
  "email": "...",
  "phone": "...",
  "company": "...",
  "jobtitle": "..."
}
```
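
Because a model can still return values that break these rules, it may be worth validating the email and phone fields after extraction. The sketch below is one way to do that; `normalizeEmail` and `isE164` are hypothetical helpers introduced for illustration.

```php
// Hypothetical post-extraction checks for rules 2 and 3 above.

// Lowercase and validate an email address; returns null if it is not valid.
function normalizeEmail(string $email): ?string {
    $email = strtolower(trim($email));
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false ? $email : null;
}

// E.164: a leading "+", a non-zero first digit, and at most 15 digits in total.
function isE164(string $phone): bool {
    return preg_match('/^\+[1-9]\d{1,14}$/', $phone) === 1;
}

var_dump(normalizeEmail(' Max.Mustermann@Example.DE ')); // string "max.mustermann@example.de"
var_dump(isE164('+493012345678'));                       // bool(true)
var_dump(isE164('030 12345678'));                        // bool(false)
```

A card whose email or phone fails these checks could be treated like a card with a missing critical field.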

## Error Handling

### API Failures

1. **Google Vision API fails:**
   - Fall back to OpenAI only
   - Log error for monitoring
   - Return OpenAI result

2. **OpenAI API fails:**
   - Use Google Vision result only
   - Log error for monitoring
   - Return Google result

3. **Both APIs fail:**
   - Return error to user
   - Log critical error
   - Suggest manual entry
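
The three failure cases can be sketched as a single orchestration function. This is a minimal illustration, not the router's actual implementation: `extractWithFallback` is a hypothetical name, and the three callables stand in for the real API clients and routing check, which are assumed to return a result array (or a boolean) or to throw on failure.

```php
// Hypothetical orchestration of the failure cases above.
// $callGoogle / $callOpenAI: fn(string $imagePath): array, throwing on failure.
// $needsOpenAI: fn(array $googleResult): bool, the routing decision.
function extractWithFallback(string $imagePath, callable $callGoogle, callable $callOpenAI, callable $needsOpenAI): array {
    $google = null;
    $openai = null;

    try {
        $google = $callGoogle($imagePath);
    } catch (Throwable $e) {
        error_log('Google Vision failed: ' . $e->getMessage());
    }

    // Call OpenAI when Google failed or when routing asks for a second opinion.
    if ($google === null || $needsOpenAI($google)) {
        try {
            $openai = $callOpenAI($imagePath);
        } catch (Throwable $e) {
            error_log('OpenAI failed: ' . $e->getMessage());
        }
    }

    if ($google === null && $openai === null) {
        error_log('CRITICAL: both OCR providers failed');
        throw new RuntimeException('Card could not be read automatically; please enter the details manually.');
    }

    // At least one entry is non-null; the caller merges or picks the available one.
    return ['google' => $google, 'openai' => $openai];
}
```

Passing the clients in as callables keeps the fallback logic testable without network access.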

### Rate Limiting

1. **Google Vision:** Implement exponential backoff
2. **OpenAI:** Handle rate-limit responses (HTTP 429) by backing off and retrying
3. **Queue system:** For high-volume scenarios
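
Exponential backoff can be sketched as a small retry wrapper that works for either provider. `retryWithBackoff`, its default values, and the `$isRetryable` callback are assumptions made for illustration; the actual retry limits would depend on each provider's quotas.

```php
// Hypothetical retry helper: re-runs $request with exponentially growing delays.
// $isRetryable decides whether a failure (for example a rate-limit error) is
// worth retrying; anything else is rethrown immediately.
function retryWithBackoff(callable $request, callable $isRetryable, int $maxAttempts = 4, int $baseDelayMs = 500) {
    for ($attempt = 1; ; $attempt++) {
        try {
            return $request();
        } catch (Throwable $e) {
            if ($attempt >= $maxAttempts || !$isRetryable($e)) {
                throw $e;
            }
            // Delay doubles each attempt (base, 2x base, 4x base, ...), plus
            // random jitter of up to one base delay to avoid synchronized retries.
            $delayMs = $baseDelayMs * (2 ** ($attempt - 1)) + random_int(0, $baseDelayMs);
            usleep($delayMs * 1000);
        }
    }
}
```

With the defaults, a request is attempted at most four times, and the three waits total roughly 3.5 to 5 seconds, which should be weighed against the 4-second end-to-end target.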

## Cost Management

### Budget Controls

1. **Daily limit:** Max $X per day
2. **Monthly limit:** Max $Y per month
3. **Per-card limit:** Max cost per card
4. **Auto-disable:** Disable OpenAI if budget exceeded
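
A budget check along these lines could run before each OpenAI call. The limit values are left as parameters because this document does not fix them, and `isOpenAIAllowed` is a hypothetical helper; how the running totals are stored is also left open.

```php
// Hypothetical budget guard: allow an OpenAI call only if it keeps both the
// daily and the monthly spend within their limits (all amounts in USD).
function isOpenAIAllowed(
    float $spentToday,
    float $spentThisMonth,
    float $dailyLimit,
    float $monthlyLimit,
    float $estimatedCallCost
): bool {
    return ($spentToday + $estimatedCallCost) <= $dailyLimit
        && ($spentThisMonth + $estimatedCallCost) <= $monthlyLimit;
}
```

When the guard returns `false`, the router would skip OpenAI and return the Google Vision result, which matches the auto-disable behavior described above.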

### Cost Tracking

Track:
- API calls per provider
- Cost per card
- Cost per percentage point of accuracy gained
- Monthly totals

## Performance Optimization

### Parallel Processing

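- Run the Google and OpenAI requests in parallel when both are known to be needed up front (as in Option B); in the selective flow the OpenAI call depends on Google's confidence score and therefore runs afterwards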
- Use async requests where possible
- Cache results for duplicate images

### Caching Strategy

- Cache Google Vision results (same image = same result)
- Cache OpenAI results (expensive, cache aggressively)
- TTL: 24 hours for business cards
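
One way to realize "same image = same result" is to key the cache on a hash of the image bytes. The file-based sketch below is only an illustration: the function names, the constant, and the choice of a directory of JSON files are assumptions, and a shared store could be substituted without changing the idea.

```php
// Hypothetical file-based OCR cache keyed by provider and image content.
const OCR_CACHE_TTL_SECONDS = 24 * 60 * 60; // 24 hours, as stated above

function ocrCacheKey(string $imageBytes, string $provider): string {
    // Identical image bytes always produce the same key for a given provider.
    return $provider . '_' . hash('sha256', $imageBytes);
}

function ocrCacheGet(string $cacheDir, string $key): ?array {
    $path = $cacheDir . '/' . $key . '.json';
    if (!is_file($path) || time() - filemtime($path) > OCR_CACHE_TTL_SECONDS) {
        return null; // missing or expired
    }
    $data = json_decode(file_get_contents($path), true);
    return is_array($data) ? $data : null;
}

function ocrCachePut(string $cacheDir, string $key, array $result): void {
    file_put_contents($cacheDir . '/' . $key . '.json', json_encode($result), LOCK_EX);
}
```

Including the provider in the key keeps Google and OpenAI results for the same image separate, so a cached Google result never masks a later OpenAI call.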

## Testing Strategy

### Test Cases

1. **High-confidence Google results:** Should not use OpenAI
2. **Low-confidence Google results:** Should use OpenAI
3. **Missing critical fields:** Should use OpenAI
4. **API failures:** Should fall back gracefully
5. **Cost limits:** Should disable OpenAI when limit reached

### Accuracy Testing

- Test with 50-100 sample cards
- Compare Google-only vs Hybrid accuracy
- Measure cost per percentage point of accuracy gained over Google-only
- Validate ROI
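
The comparison above could be computed with two small helpers, sketched here under the assumption that each sample card has hand-labelled ground truth and that a field counts as correct only on an exact string match. Both function names are hypothetical.

```php
// Hypothetical evaluation helper: share of fields that exactly match the
// hand-labelled ground truth across a sample of cards (0.0-1.0).
function fieldAccuracy(array $predictions, array $groundTruth, array $fields): float {
    $correct = 0;
    $total = 0;
    foreach ($groundTruth as $i => $expected) {
        foreach ($fields as $field) {
            $total++;
            if (($predictions[$i][$field] ?? '') === ($expected[$field] ?? '')) {
                $correct++;
            }
        }
    }
    return $total > 0 ? $correct / $total : 0.0;
}

// Extra cost per card for each percentage point of accuracy the hybrid gains
// over Google-only. Returns null when the hybrid is not more accurate.
function costPerAccuracyPoint(float $hybridCost, float $googleCost, float $hybridAcc, float $googleAcc): ?float {
    $gainPoints = ($hybridAcc - $googleAcc) * 100;
    return $gainPoints > 0 ? ($hybridCost - $googleCost) / $gainPoints : null;
}
```

Exact matching is deliberately strict; a looser comparison (for example, ignoring case or whitespace) would report higher accuracy for both pipelines.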

## Monitoring & Alerts

### Metrics to Track

1. **Accuracy:** Per-field and overall
2. **Cost:** Per API, per card, monthly totals
3. **Latency:** Processing time per API
4. **Usage:** Cards routed to OpenAI vs Google-only
5. **Errors:** API failures, rate limits

### Alerts

1. **Cost alerts:** 80% of monthly budget
2. **Accuracy drops:** Below 95% threshold
3. **API failures:** > 5% failure rate
4. **Latency spikes:** > 5 seconds average
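
The four alert conditions translate directly into threshold checks. In the sketch below, `evaluateAlerts` and the metric key names are hypothetical; only the threshold values come from the list above.

```php
// Hypothetical alert check. $m holds aggregated metrics for the current period:
// monthly_cost and monthly_budget in USD, accuracy and failure_rate as fractions
// (0.0-1.0), avg_latency_seconds in seconds.
function evaluateAlerts(array $m): array {
    $alerts = [];
    if ($m['monthly_cost'] >= 0.80 * $m['monthly_budget']) {
        $alerts[] = 'cost';
    }
    if ($m['accuracy'] < 0.95) {
        $alerts[] = 'accuracy';
    }
    if ($m['failure_rate'] > 0.05) {
        $alerts[] = 'api_failures';
    }
    if ($m['avg_latency_seconds'] > 5.0) {
        $alerts[] = 'latency';
    }
    return $alerts;
}
```

Returning a list of alert names rather than sending notifications directly leaves the choice of delivery channel to the monitoring setup.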

## Deployment Plan

### Phase 1: Proof of Concept (Week 1-2)

1. Implement OpenAI Vision OCR endpoint
2. Create basic routing logic
3. Test with 10-20 sample cards
4. Measure accuracy improvement

### Phase 2: Integration (Week 3-4)

1. Integrate with existing OCR flow
2. Implement result merging
3. Add cost tracking
4. Set up monitoring

### Phase 3: Optimization (Week 5-6)

1. Fine-tune routing logic
2. Optimize prompts
3. Implement caching
4. Performance tuning

### Phase 4: Production (Week 7-8)

1. Deploy to production
2. Monitor metrics
3. Adjust thresholds
4. Document learnings

## Success Criteria

1. **Accuracy:** 99%+ overall accuracy
2. **Cost:** < $0.005 per card average
3. **Latency:** < 4 seconds end-to-end
4. **Reliability:** < 1% failure rate
5. **ROI:** Positive ROI vs manual correction

## Next Steps

1. **Review and approve architecture**
2. **Set up OpenAI API access**
3. **Implement proof of concept**
4. **Test with sample cards**
5. **Deploy selectively (feature flag)**
