# OCR Developer Guide

**Last Updated:** 2026-01-28

## Overview

This guide helps developers understand, maintain, and extend the OCR system.

## Architecture

### Core Components

1. **OCR API** (`v2/api/ocr-business-card.php`)
   - Main endpoint for processing business cards
   - Integrates preprocessing, Vision API, parsing strategies
   - Returns structured data

2. **Preprocessing** (`v2/helpers/image-preprocessor.php`)
   - Image enhancement before OCR
   - Supports GD and Imagick

3. **Parsing Strategies**
   - Structured parsing (uses Vision API bounding boxes)
   - Line-by-line parsing (fallback)
   - Pattern-based parsing (independent field extraction)

4. **Validation** (`validateAndCleanOCRData()`)
   - Centralized data validation and cleaning
   - Email format, phone normalization, name casing

5. **Router** (`v2/helpers/ocr-router.php`)
   - Routes to Google Vision or OpenAI based on confidence
   - Implements hybrid architecture

6. **Frontend JavaScript** (`v2/js/business-card-scanner.js`, `v2/js/event-form.js`)
   - Camera access and image capture
   - OCR button visibility management
   - Form auto-fill after OCR extraction
   - Scanner initialization and lifecycle management

## Adding New Patterns

Edit `v2/config/ocr-patterns.php`:

```php
return [
    'legal_forms' => [
        'GmbH', 'AG', 'UG', // Add new forms here
    ],
    'phone_patterns' => [
        // Add new phone patterns here
    ],
    // ...
];
```

## Adding New Parsing Strategy

1. Create a function in `v2/api/ocr-business-card.php`:

```php
function parseNewStrategyOCR($text, $fullTextAnnotation) {
    $data = [
        'firstname' => '',
        'lastname' => '',
        // ... other fields
    ];

    // Your parsing logic

    return $data;
}
```

2. Add its result to the parsing strategies array:

```php
$parsingResults[] = parseNewStrategyOCR($text, $fullTextAnnotation);
$strategiesUsed[] = 'new-strategy';
```

3. Calculate confidence:

```php
$confidenceScores[] = calculateParsingConfidence($parsingResults[count($parsingResults)-1], $text);
```

## Testing

### Unit Tests

```bash
php v2/scripts/test-ocr-parsing.php
```

### Accuracy Tests

```bash
php v2/scripts/test-ocr-accuracy-comprehensive.php
```

### Performance Tests

Check processing times in the logs or use the monitoring dashboard.

## Debugging

### Enable Debug Logging

Enable it in `v2/helpers/logger.php`, or append the `debug=1` query parameter to the API URL:

```
v2/api/ocr-business-card.php?debug=1
```

### Check Logs

```bash
tail -f v2/logs/ocr-business-card-*.log
```

### Analyze Strategy Performance

```bash
php v2/scripts/analyze-strategy-performance.php --days=7
```

### Analyze Error Patterns

```bash
php v2/scripts/analyze-error-patterns.php --days=7
```

## Performance Optimization

1. **Enable preprocessing:** Improves accuracy with minimal overhead
2. **Use caching:** Cache OCR results for duplicate images
3. **Optimize images:** Reduce size before sending to API
4. **Monitor costs:** Track API usage and costs
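
As an illustration of item 3, the sketch below computes downscaled dimensions that keep the aspect ratio while capping the longest edge. The function name `fitWithin` and the `maxEdge` value used in the example are assumptions for illustration only; the guide does not specify a target size.

```javascript
// Hypothetical helper (not part of the codebase): compute target dimensions
// so that the longest edge does not exceed maxEdge, preserving aspect ratio.
function fitWithin(width, height, maxEdge) {
  const longest = Math.max(width, height);
  if (longest <= maxEdge) {
    // Already small enough: never upscale.
    return { width, height };
  }
  const scale = maxEdge / longest;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

// Example with an assumed cap of 1600 px on the longest edge.
const target = fitWithin(4000, 3000, 1600);
console.log(target.width, target.height);
```

The resulting dimensions could then be used when drawing the captured frame onto a canvas before upload.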

## Frontend Integration

### BusinessCardScanner Class

The `BusinessCardScanner` class (`v2/js/business-card-scanner.js`) handles camera access, image capture, and OCR processing on the frontend. It supports **auto-capture**: the image is captured automatically once the card is in focus and stable, using Laplacian sharpness detection plus a stability delay (no OpenCV dependency). For tuning constants and optional debugging, see the "Auto-capture" section of EVENT_FORM_IMPLEMENTATION.md.
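
To make the auto-capture idea concrete, here is a minimal sketch of Laplacian-based sharpness scoring combined with a stability delay. It is not the scanner's actual code: the function names, the 4-neighbour kernel, and the threshold and delay values are all assumptions, and the real tuning constants are documented in EVENT_FORM_IMPLEMENTATION.md.

```javascript
// Hypothetical helper: variance of a 4-neighbour Laplacian over a grayscale
// image. `gray` is a flat, row-major array of luminance values.
// A higher variance means more edge detail, i.e. a sharper frame.
function laplacianVariance(gray, width, height) {
  const responses = [];
  for (let y = 1; y < height - 1; y++) {
    for (let x = 1; x < width - 1; x++) {
      const i = y * width + x;
      responses.push(
        gray[i - width] + gray[i + width] + gray[i - 1] + gray[i + 1] - 4 * gray[i]
      );
    }
  }
  if (responses.length === 0) return 0;
  const mean = responses.reduce((sum, v) => sum + v, 0) / responses.length;
  return responses.reduce((sum, v) => sum + (v - mean) ** 2, 0) / responses.length;
}

// Hypothetical stability gate: returns true only after the sharpness has
// stayed at or above `threshold` for at least `delayMs` milliseconds.
function createStabilityGate(threshold, delayMs) {
  let sharpSince = null;
  return function shouldCapture(sharpness, nowMs) {
    if (sharpness < threshold) {
      sharpSince = null; // frame went blurry: restart the timer
      return false;
    }
    if (sharpSince === null) sharpSince = nowMs;
    return nowMs - sharpSince >= delayMs;
  };
}
```

In a capture loop, each video frame would be converted to grayscale, scored with `laplacianVariance()`, and passed to the gate together with the current timestamp.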

**Key Methods:**

- `init()` - Initializes scanner, sets up event listeners, checks camera support
- `ensureInitialized()` - Ensures scanner is initialized (re-initializes if button wasn't found initially)
- `openCamera()` - Opens camera modal and requests camera permission
- `captureImage()` - Captures image from camera and sends to OCR API
- `handleOCRSuccess()` - Processes OCR results and auto-fills form
- `checkCameraSupport()` - Verifies camera availability

**Integration with EventForm:**

The scanner integrates with the `EventForm` class (`v2/js/event-form.js`) for visibility management:

- `ensureCameraSectionVisible()` - Ensures OCR button is visible when form is displayed
- `isCameraSectionVisible()` - Checks if camera section is currently visible
- Auto-fills form fields after successful OCR extraction

**Form Scoping Requirements:**

**CRITICAL:** When multiple forms exist on a page (e.g., event form and demo booking modal), always scope field selections to the specific form container to prevent ID collisions.

**Best Practices:**

1. **Use Scoped Selectors:**

   ```javascript
   // ✅ CORRECT: Scope to event form
   const field = this.form.querySelector(`#${fieldId}`);

   // ❌ WRONG: Searches entire document (may find wrong form)
   const field = document.getElementById(fieldId);
   ```

2. **Use Helper Method:**

   ```javascript
   // Use getEventFormField() helper method
   const field = this.getEventFormField("company");
   ```

3. **Verify Form Exists:**

   ```javascript
   if (!this.form) {
     console.warn("Event form not found");
     return;
   }
   ```

4. **Check Form Visibility:**

   ```javascript
   const computedStyle = window.getComputedStyle(this.formContainer);
   if (computedStyle.display === "none") {
     this.showForm(); // Ensure form is visible
   }
   ```

**Why Scoped Selectors Are Necessary:**

- Multiple forms may share the same field IDs (e.g., `#company`, `#email`)
- `document.getElementById()` returns the first element with that ID in DOM order (duplicate IDs are invalid HTML, but browsers tolerate them)
- The demo booking modal form may appear before the event form in the DOM
- Scoped selectors ensure OCR data fills the correct form
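
The points above can be sketched as a scoped auto-fill routine. This is an illustrative assumption, not the actual `EventForm` code: `fillFormFromOcr` is a hypothetical name, and the sketch assumes that OCR result keys match the form field IDs and that IDs contain no double quotes.

```javascript
// Hypothetical sketch: fill only the fields that live inside `form`.
// Assumes each OCR key (e.g. "company", "email") equals a field ID.
function fillFormFromOcr(form, ocrData) {
  const filled = [];
  if (!form) {
    console.warn("Event form not found");
    return filled;
  }
  for (const [fieldId, value] of Object.entries(ocrData)) {
    if (!value) continue; // skip fields the OCR could not extract
    // Scoped lookup: searches only descendants of `form`, never the whole
    // document, so a same-ID field in another form is left untouched.
    const field = form.querySelector(`[id="${fieldId}"]`);
    if (field) {
      field.value = value;
      filled.push(fieldId);
    }
  }
  return filled; // IDs of the fields that were actually filled
}
```

Returning the list of filled IDs makes it easy to log or highlight which fields came from OCR.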

**Initialization Flow:**

1. The scanner constructor runs on DOM ready
2. It checks whether the scan button exists (the button may be hidden initially)
3. If the button is not found, it sets `buttonNotFoundInitially = true`
4. The `init()` method sets up event listeners and checks camera support
5. `ensureInitialized()` can be called later to re-initialize if the button becomes available
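
The flow above can be sketched as follows. This is a simplified, hypothetical stand-in for `BusinessCardScanner`: the class name, the injected `findButton` callback, and the `initialized` flag are assumptions, and the camera support check from step 4 is omitted for brevity.

```javascript
// Hypothetical lifecycle sketch; the real class in
// v2/js/business-card-scanner.js may be structured differently.
class ScannerLifecycleSketch {
  // findButton: assumed callback returning the scan button element or null.
  constructor(findButton) {
    this.findButton = findButton;
    this.button = findButton();
    this.buttonNotFoundInitially = this.button === null;
    this.initialized = false;
  }

  // Attaches listeners once the button is available.
  init() {
    if (!this.button) return false;
    this.button.addEventListener("click", () => this.openCamera());
    this.initialized = true;
    return true;
  }

  // Safe to call repeatedly: retries the lookup, never double-binds.
  ensureInitialized() {
    if (this.initialized) return true;
    if (!this.button) this.button = this.findButton();
    return this.init();
  }

  openCamera() {
    // Placeholder: the real method opens the modal and requests permission.
  }
}
```

Guarding on `initialized` keeps `ensureInitialized()` idempotent, so calling it every time the form is shown does not attach duplicate click listeners.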

**Visibility Management:**

- The camera scan section starts with `display: none` in the HTML
- `ensureCameraSectionVisible()` is called when the form is displayed
- It checks camera support before showing the button
- It hides the button if the camera is not supported
- It uses `!important` flags to override CSS
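
One way to implement this behaviour is sketched below, using feature detection for `getUserMedia` and `style.setProperty()` with the `"important"` priority. The helper names and the `"block"` display value are assumptions; the actual `ensureCameraSectionVisible()` may work differently.

```javascript
// Hypothetical helper: feature-detect camera access on a navigator-like object.
function hasCameraSupport(nav) {
  return Boolean(
    nav && nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === "function"
  );
}

// Hypothetical helper: show or hide the camera section. The third argument
// of setProperty() sets the "important" priority, so the inline declaration
// overrides stylesheet rules. "block" is an assumed display value.
function applyCameraSectionVisibility(section, nav) {
  const supported = hasCameraSupport(nav);
  section.style.setProperty("display", supported ? "block" : "none", "important");
  return supported;
}
```

In the browser, the function would be called with the camera section element and the global `navigator` object.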

## Common Issues

### Low Accuracy

1. Check image quality (resolution, contrast)
2. Enable preprocessing
3. Review error patterns: `php v2/scripts/analyze-error-patterns.php`
4. Consider OpenAI integration for difficult cases

### Slow Processing

1. Check preprocessing settings
2. Monitor API response times
3. Consider caching
4. Optimize image sizes

### API Errors

1. Check API key configuration
2. Verify that the Vision API is enabled in GCP
3. Check billing status
4. Review error logs

## Related Documentation

- [Implementation Summary](IMPLEMENTATION_SUMMARY.md)
- [Field Extraction Guide](FIELD_EXTRACTION_GUIDE.md)
- [Preprocessing Guide](PREPROCESSING_GUIDE.md)
- [Cost Analysis](COST_ANALYSIS.md)
- [Hybrid Architecture](HYBRID_ARCHITECTURE.md)
- [Troubleshooting Guide](TROUBLESHOOTING.md) - Includes frontend JavaScript issues