# OpenAI Vision API Integration

**Last Updated:** 2026-01-20

## Overview

The business card OCR system integrates the OpenAI Vision API (model `gpt-4o` by default) to improve extraction accuracy, particularly for job titles and complex layouts. When both are available, OpenAI results take priority over the static parsing logic.

## Configuration

### Enable/Disable

The OpenAI integration can be enabled or disabled via an environment variable:

```bash
export OPENAI_OCR_ENABLED=true
```

Or modify `v2/config/openai-config.php`:

```php
define('OPENAI_OCR_ENABLED', true);
```
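
Environment variables arrive as strings, so a loose truthiness check would treat `"false"` as enabled. A minimal sketch of how the flag might be resolved follows. The helper name `isOpenAiOcrEnabled()` is hypothetical, and the precedence (environment variable over constant) is an assumption, since the source does not state which one wins.

```php
<?php
// Hypothetical helper: resolves the OPENAI_OCR_ENABLED flag.
// Assumption: the environment variable, when set, overrides the constant.
function isOpenAiOcrEnabled(): bool
{
    $env = getenv('OPENAI_OCR_ENABLED');
    if ($env !== false && $env !== '') {
        // FILTER_VALIDATE_BOOLEAN maps "true", "1", "on", "yes" to true and
        // "false", "0", "off", "no" to false; anything else becomes null.
        $parsed = filter_var($env, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
        if ($parsed !== null) {
            return $parsed;
        }
    }
    // Fall back to the constant from the config file, if it is defined.
    return defined('OPENAI_OCR_ENABLED') && OPENAI_OCR_ENABLED === true;
}
```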

### API Key Setup

Set your OpenAI API key via an environment variable:

```bash
export OPENAI_API_KEY=sk-...
```

Or configure it in `v2/config/openai-config.php`, where the environment variable takes precedence and the literal is only a fallback:

```php
'api_key' => getenv('OPENAI_API_KEY') ?: 'your-api-key-here',
```

### Configuration Options

File: `v2/config/openai-config.php`

```php
return [
    'api_key' => getenv('OPENAI_API_KEY') ?: '',
    'api_url' => 'https://api.openai.com/v1/chat/completions',
    'model' => 'gpt-4o', // Recommended for OCR
    'timeout' => 30, // seconds
    'max_retries' => 2,
    
    // Cost limits
    'daily_limit' => 100, // Max API calls per day
    'monthly_limit' => 2000, // Max API calls per month
    
    // Routing thresholds
    'confidence_threshold' => 0.65, // Use OpenAI if Google confidence < this
    'require_critical_fields' => true, // Require email/phone for routing
];
```
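
To illustrate how the limit and threshold settings could interact, here is a hedged sketch of a gate function. `shouldCallOpenAi()` is a hypothetical name, the call counters are passed in because the source does not say where they are stored, and `require_critical_fields` is left out because its exact semantics are not documented here.

```php
<?php
// Hypothetical gate combining the limit and threshold settings.
function shouldCallOpenAi(array $config, float $googleConfidence, int $callsToday, int $callsThisMonth): bool
{
    if ($config['api_key'] === '') {
        return false; // no key configured
    }
    if ($callsToday >= $config['daily_limit'] || $callsThisMonth >= $config['monthly_limit']) {
        return false; // call budget exhausted
    }
    // Route to OpenAI only when Google Vision confidence is below the threshold.
    return $googleConfidence < $config['confidence_threshold'];
}

$config = [
    'api_key' => 'sk-test',
    'daily_limit' => 100,
    'monthly_limit' => 2000,
    'confidence_threshold' => 0.65,
];
```

With this config, `shouldCallOpenAi($config, 0.5, 10, 200)` allows the call, while a Google confidence of `0.9` or a daily count of `100` blocks it.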

## Integration Flow

### Processing Order

1. **Google Vision API** - Primary OCR extraction
2. **OpenAI Vision API** (if enabled) - AI-powered field extraction
3. **Static Parsing Strategies** - Fallback parsing
4. **Result Merging** - Prioritize API results
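
The order above can be sketched as a small orchestration function. This is an illustration only: `processCard()` and the `$steps` array are hypothetical, and each step is injected as a callable so the sketch stays independent of the real Google and OpenAI clients.

```php
<?php
// Hypothetical orchestration of the four steps in the documented order.
function processCard(string $imagePath, array $steps): array
{
    $ocr = $steps['google']($imagePath);                    // 1. Google Vision: raw text
    $candidates = [];
    if (isset($steps['openai'])) {                          // 2. OpenAI Vision, only when enabled
        $candidates['openai'] = $steps['openai']($imagePath);
    }
    $candidates['static'] = $steps['static']($ocr['text']); // 3. static parsing strategies
    return $steps['merge']($candidates);                    // 4. merge, prioritizing API results
}
```

Leaving the `openai` key out of `$steps` models the disabled case: the merge step then only sees the static parsing result.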

### Error Handling

- OpenAI API failures are logged but don't break the OCR process
- Automatic fallback to static parsing if OpenAI fails
- Error details logged for debugging
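
A minimal sketch of this behavior, assuming `max_retries` counts retries after the first attempt (so `2` means up to three attempts). The wrapper name `tryOpenAi()` is hypothetical; it returns `null` on total failure so the caller can continue with static parsing.

```php
<?php
// Hypothetical wrapper: retries the OpenAI call, logs each failure, and
// returns null so OCR processing continues without OpenAI results.
function tryOpenAi(callable $call, int $maxRetries = 2): ?array
{
    for ($attempt = 0; $attempt <= $maxRetries; $attempt++) {
        try {
            return $call();
        } catch (Throwable $e) {
            error_log(sprintf('OpenAI attempt %d failed: %s', $attempt + 1, $e->getMessage()));
        }
    }
    return null; // all attempts failed; fall back to static parsing
}
```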

### Result Prioritization

OpenAI results are prioritized in merging:

- **Weight:** 1.0 (highest priority)
- **Job Title Boost:** ×1.2 for job title field
- **Selection:** OpenAI results selected even if static parsing has slightly higher confidence
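
The effect of the weight and the job title boost can be shown with a small scoring sketch. The OpenAI weight (1.0) and the boost (1.2) come from the list above; the static parsing weight of 0.9, the default of 0.5 for unknown sources, and the function name `mergeField()` are assumptions made for illustration.

```php
<?php
const SOURCE_WEIGHTS = ['openai' => 1.0, 'static' => 0.9]; // 0.9 is an assumed value
const JOB_TITLE_BOOST = 1.2;

// Picks the candidate value with the highest weighted score for one field.
function mergeField(string $field, array $candidates): ?string
{
    $best = null;
    $bestScore = -1.0;
    foreach ($candidates as $source => $candidate) {
        $score = $candidate['confidence'] * (SOURCE_WEIGHTS[$source] ?? 0.5);
        if ($source === 'openai' && $field === 'job_title') {
            $score *= JOB_TITLE_BOOST; // extra priority for OpenAI job titles
        }
        if ($score > $bestScore) {
            $bestScore = $score;
            $best = $candidate['value'];
        }
    }
    return $best;
}
```

For a job title where OpenAI reports 0.80 and static parsing reports 0.85, the scores under these assumptions are 0.80 × 1.0 × 1.2 = 0.96 versus 0.85 × 0.9 = 0.765, so the OpenAI value is selected despite its lower raw confidence.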

## Prompts

### Enhanced Prompt

The system uses an enhanced prompt that explicitly excludes company names and addresses from job titles:

```
Job Title Rules (CRITICAL):
- Extract ONLY the professional role/title (e.g., "Designer", "Manager", "CEO")
- DO NOT include company name, address, or contact information
- If job title is not clearly visible, return empty string ""
- Examples:
  ✓ Correct: "Designer", "Sales Manager", "Geschäftsführer"
  ✗ Incorrect: "Emily Bates Agency", "123 Main Street", "Designer at Company"
```

See `v2/config/openai-prompts.php` for full prompt templates.
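
For orientation, a request to the Chat Completions endpoint that carries a prompt and one image might be assembled as follows. This is a sketch, not the project's actual request code: `buildVisionPayload()` is a hypothetical name, and `$prompt` stands in for a template from `v2/config/openai-prompts.php`. The image is embedded as a base64 data URL in an `image_url` content part.

```php
<?php
// Hypothetical builder for a Chat Completions request with one image.
function buildVisionPayload(string $model, string $prompt, string $imageBytes, string $mimeType): array
{
    // Inline the image as a data URL so no public image hosting is needed.
    $dataUrl = 'data:' . $mimeType . ';base64,' . base64_encode($imageBytes);
    return [
        'model' => $model,
        'messages' => [[
            'role' => 'user',
            'content' => [
                ['type' => 'text', 'text' => $prompt],
                ['type' => 'image_url', 'image_url' => ['url' => $dataUrl]],
            ],
        ]],
    ];
}
```

The returned array can be passed through `json_encode()` and sent as the POST body to the configured `api_url`.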

## Post-Processing Validation

OpenAI results are validated before use:

- **Job Title Validation:** Checks for company name and address contamination
- **Email Validation:** Validates email format
- **Salutation Validation:** Normalizes and validates German salutations

See `validateJobTitleFromOpenAI()` in `v2/api/openai-vision-ocr.php` for validation logic.
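
The kind of contamination check described above can be sketched as follows. This is not the body of `validateJobTitleFromOpenAI()`; the function name `isPlausibleJobTitle()`, the street keyword list, and the "at"/"bei" pattern are illustrative assumptions.

```php
<?php
// Hypothetical check: returns true when a job title looks free of company
// and address contamination. An empty title has nothing to keep.
function isPlausibleJobTitle(string $title, string $companyName = ''): bool
{
    $title = trim($title);
    if ($title === '') {
        return false;
    }
    // Company contamination: the title contains the extracted company name
    // (stripos folds case for ASCII letters only).
    if ($companyName !== '' && stripos($title, $companyName) !== false) {
        return false;
    }
    // Address contamination: a multi-digit number or a street keyword.
    if (preg_match('/\d{2,}|(street|straße|strasse|road|avenue)(?![a-z])/iu', $title) === 1) {
        return false;
    }
    // "Designer at Company" style phrasing.
    if (preg_match('/\s(at|bei|@)\s/iu', $title) === 1) {
        return false;
    }
    return true;
}
```

Against the examples from the prompt, this sketch accepts `"Designer"` and `"Geschäftsführer"` and rejects `"123 Main Street"` and `"Designer at Company"`; `"Emily Bates Agency"` is rejected only when the same string is passed as the company name.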

## Performance Impact

### Latency

- **OpenAI API Call:** ~1-2 seconds (depending on image size and API response time)
- **Total Processing Time:** Adds ~1-2 seconds to OCR processing
- **Timeout:** 30 seconds (configurable)

### Cost Considerations

- **Model:** GPT-4o (recommended for OCR)
- **Cost per Request:** ~$0.01-0.02 per business card (depending on image size)
- **Daily Limit:** 100 requests (configurable)
- **Monthly Limit:** 2000 requests (configurable)
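
Taken together, these figures imply a spending ceiling. The sketch below works it out from the limits and the per-card estimate of 1 to 2 cents; `maxSpendCents()` is a hypothetical helper and uses integer cents to avoid floating-point drift.

```php
<?php
// Upper bound on spend if every allowed call is made.
function maxSpendCents(int $callLimit, int $minCents = 1, int $maxCents = 2): array
{
    return ['min' => $callLimit * $minCents, 'max' => $callLimit * $maxCents];
}

$daily = maxSpendCents(100);    // 100 to 200 cents, i.e. $1 to $2 per day
$monthly = maxSpendCents(2000); // 2000 to 4000 cents, i.e. $20 to $40 per month
printf('Monthly ceiling: $%.2f to $%.2f' . PHP_EOL, $monthly['min'] / 100, $monthly['max'] / 100);
```

Note that the monthly limit is the binding one: 100 calls per day over 30 days would be 3,000 calls, which exceeds the monthly cap of 2,000.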

### Optimization

- The OpenAI API is called only when the integration is enabled and an API key is configured
- Failures fall back gracefully to static parsing
- Results are cached in memory for the duration of processing and are not persisted
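
An in-memory, non-persisted cache of this kind could look like the sketch below. The class name `RequestScopedCache` and the choice of a SHA-256 hash of the image bytes as the cache key are assumptions; the source does not describe how entries are keyed.

```php
<?php
// Hypothetical per-request cache: entries live only as long as the object.
final class RequestScopedCache
{
    /** @var array<string, array> */
    private array $entries = [];

    // Returns the cached result for this image, computing it at most once.
    public function remember(string $imageBytes, callable $compute): array
    {
        $key = hash('sha256', $imageBytes);
        if (!array_key_exists($key, $this->entries)) {
            $this->entries[$key] = $compute();
        }
        return $this->entries[$key];
    }
}
```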

## Monitoring

### Logging

OpenAI API calls are logged with:

- Processing time
- Confidence scores
- Fields extracted
- Success/failure status
- Error details (on failure)
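
One way to capture these items is a single JSON line per call. The sketch below is illustrative: `formatOpenAiLogLine()`, the `event` name, and the key names are assumptions, not the project's actual log format.

```php
<?php
// Hypothetical formatter: one JSON line per OpenAI call.
// $fields maps field name => ['value' => ..., 'confidence' => ...].
function formatOpenAiLogLine(float $elapsedSeconds, array $fields, ?string $error = null): string
{
    $entry = [
        'event' => 'openai_vision_call',
        'success' => $error === null,
        'processing_ms' => (int) round($elapsedSeconds * 1000),
        'fields_extracted' => array_keys($fields),
        'confidence' => array_map(fn (array $f): float => $f['confidence'], $fields),
    ];
    if ($error !== null) {
        $entry['error'] = $error; // error details only on failure
    }
    return json_encode($entry, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
}
```

The resulting string can be handed to `error_log()` or whichever logger the project uses.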

### Metrics Tracked

- OpenAI API success rate
- Processing time
- Job title extraction accuracy
- Strategy comparison (OpenAI vs static parsing)

## Troubleshooting

### OpenAI API Not Called

**Symptoms:** No OpenAI results in output, only static parsing

**Possible Causes:**
1. `OPENAI_OCR_ENABLED` not set to `true`
2. API key not configured
3. API key invalid or expired

**Solutions:**
1. Check that `OPENAI_OCR_ENABLED` is `true` in `v2/config/openai-config.php` or in the environment
2. Verify the API key is set in the environment the PHP process runs in: `echo $OPENAI_API_KEY`
3. Test API key: `curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY"`
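
The first two checks can also be run from PHP. The sketch below is a hypothetical self-check (`diagnoseOpenAiSetup()` is not part of the codebase); it makes no network call, so it cannot detect an expired or revoked key, and the `sk-` prefix test assumes the key format shown in the setup section.

```php
<?php
// Hypothetical self-check mirroring the configuration causes listed above.
function diagnoseOpenAiSetup(bool $enabled, string $apiKey): array
{
    $problems = [];
    if (!$enabled) {
        $problems[] = 'OPENAI_OCR_ENABLED is not true';
    }
    if ($apiKey === '' || $apiKey === 'your-api-key-here') {
        $problems[] = 'API key is empty or still the placeholder';
    } elseif (strncmp($apiKey, 'sk-', 3) !== 0) {
        $problems[] = 'API key does not start with "sk-"';
    }
    return $problems; // empty array means no configuration problem was found
}
```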

### OpenAI API Failures

**Symptoms:** Errors in logs, fallback to static parsing

**Possible Causes:**
1. API rate limits exceeded
2. Invalid image format
3. Network timeout
4. API key permissions

**Solutions:**
1. Check rate limits in OpenAI dashboard
2. Verify image format (JPEG, PNG, WebP)
3. Increase timeout in config if needed
4. Verify API key has vision access
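
To rule out an unsupported image format before the request is sent, the file signature can be checked directly. A minimal sketch, assuming the three formats listed above; `detectImageFormat()` is a hypothetical helper that inspects the leading bytes instead of trusting the file extension.

```php
<?php
// Hypothetical helper: identifies JPEG, PNG, or WebP by magic bytes.
function detectImageFormat(string $bytes): ?string
{
    if (strncmp($bytes, "\xFF\xD8\xFF", 3) === 0) {
        return 'jpeg';
    }
    if (strncmp($bytes, "\x89PNG\r\n\x1A\n", 8) === 0) {
        return 'png';
    }
    // WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if (strlen($bytes) >= 12 && substr($bytes, 0, 4) === 'RIFF' && substr($bytes, 8, 4) === 'WEBP') {
        return 'webp';
    }
    return null; // unsupported or unrecognized format
}
```

In practice the input would come from something like `file_get_contents($imagePath)`; a `null` result means the image should be converted or rejected before calling the API.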

### Job Title Still Contaminated

**Symptoms:** Job titles contain company names or addresses

**Possible Causes:**
1. The model did not follow the job title rules in the prompt
2. Validation not catching all patterns
3. Exclusion patterns incomplete

**Solutions:**
1. Review OpenAI response in logs (debug mode)
2. Add missing patterns to `v2/config/ocr-patterns.php`
3. Enhance validation logic in `validateJobTitleFromOpenAI()`

## Best Practices

1. **Enable Gradually:** Start with low volume to test accuracy
2. **Monitor Costs:** Track API usage and costs
3. **Review Logs:** Check OpenAI results vs static parsing differences
4. **Update Prompts:** Refine prompts based on real-world results
5. **Maintain Fallback:** Always ensure static parsing works as fallback

## Related Documentation

- [Field Extraction Guide](FIELD_EXTRACTION_GUIDE.md) - Complete field extraction documentation
- [Implementation Summary](IMPLEMENTATION_SUMMARY.md) - Overall OCR system status
- [Configuration Guide](CONFIGURATION.md) - OCR configuration options
