# ShiftOps Team Estimation - Enhanced Model Design


**Last Updated:** 2025-11-20

## Design Principles

1. **Consistency:** Same formula across all implementations
2. **Accuracy:** Multi-factor approach with validated weights
3. **Robustness:** Handle missing data gracefully
4. **Transparency:** Clear confidence scoring and factor breakdown
5. **Validation:** Sanity checks and reasonable bounds

## Enhanced Formula

### Factor 1: Customer Volume Proxy (35% weight)

**Calculation:**

```
estimatedMonthsOperating = estimateBusinessAge(reviewCount)
reviewsPerEmployeePerMonth = getIndustryBenchmark(primaryType)
volumeFactor = reviewCount / (reviewsPerEmployeePerMonth * estimatedMonthsOperating)
// Cap volume factor: 4.0x for restaurants with more than 2000 reviews, 6.0x otherwise
maxVolumeCap = (reviewCount > 2000 && primaryType === 'restaurant') ? 4.0 : 6.0
volumeFactor = clamp(volumeFactor, 0.3, maxVolumeCap)
```

**Business Age Estimation:**

- < 25 reviews: 12 months (1 year)
- 25-100 reviews: 24 months (2 years)
- 101-500 reviews: 36 months (3 years)
- 501-1000 reviews: 48 months (4 years)
- > 1000 reviews: 60 months (5 years)

**Reviews Per Employee Per Month:**

- restaurant: 10
- cafe: 6
- bar: 8
- store: 4
- hospital: 12
- pharmacy: 7
- general: 6
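The volume calculation and its two lookup tables can be sketched together in Python. This is a minimal sketch, not the production implementation: the function names follow the pseudocode above, while the constant name `REVIEWS_PER_EMPLOYEE_PER_MONTH`, the `clamp` helper, and the fallback to the `general` benchmark for unlisted types are assumptions made for illustration.

```python
# Benchmarks copied from the "Reviews Per Employee Per Month" list.
REVIEWS_PER_EMPLOYEE_PER_MONTH = {
    "restaurant": 10, "cafe": 6, "bar": 8, "store": 4,
    "hospital": 12, "pharmacy": 7, "general": 6,
}


def estimate_business_age(review_count: int) -> int:
    """Map a review count to an estimated operating age in months."""
    if review_count < 25:
        return 12
    if review_count <= 100:
        return 24
    if review_count <= 500:
        return 36
    if review_count <= 1000:
        return 48
    return 60


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(value, high))


def volume_factor(review_count: int, primary_type: str) -> float:
    months = estimate_business_age(review_count)
    # Assumption: types missing from the table use the "general" benchmark.
    per_employee = REVIEWS_PER_EMPLOYEE_PER_MONTH.get(
        primary_type, REVIEWS_PER_EMPLOYEE_PER_MONTH["general"]
    )
    raw = review_count / (per_employee * months)
    # Tighter cap for restaurants with more than 2000 reviews.
    max_cap = 4.0 if (review_count > 2000 and primary_type == "restaurant") else 6.0
    return clamp(raw, 0.3, max_cap)
```

For example, a restaurant with 300 reviews is treated as 36 months old, giving `300 / (10 * 36) ≈ 0.83`, while a store with 3000 reviews computes `3000 / (4 * 60) = 12.5` and is clamped to 6.0.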

### Factor 2: Operating Hours (25% weight)

**Calculation:**

```
weeklyHours = analyzeBusinessHours(businessData)
hoursFactor = weeklyHours / 40  // Standard FTE
hoursFactor = clamp(hoursFactor, 0.4, 3.0)
```

**Default:** 40 hours/week when hours data is missing, which yields `hoursFactor = 1.0`
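A minimal Python sketch of the hours factor, assuming the weekly total has already been computed (the job of `analyzeBusinessHours` above) and is passed in as a number, or as `None` when hours are missing. The constant names are illustrative.

```python
from typing import Optional

DEFAULT_WEEKLY_HOURS = 40.0  # Design default when hours are missing
FTE_HOURS = 40.0             # Standard full-time week


def hours_factor(weekly_hours: Optional[float]) -> float:
    """Weekly opening hours relative to one FTE week, clamped to [0.4, 3.0]."""
    if weekly_hours is None:
        weekly_hours = DEFAULT_WEEKLY_HOURS
    return max(0.4, min(weekly_hours / FTE_HOURS, 3.0))
```

The upper clamp starts to bind at 120 hours/week, so a business open around the clock (168 hours) receives the same factor of 3.0 as one open 120 hours.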

### Factor 3: Service Complexity (20% weight)

**Calculation:**

```
serviceTypes = count(available service options)
complexityFactor = 1.0 + (serviceTypes * 0.12)  // +12% per service type
complexityFactor = clamp(complexityFactor, 1.0, 2.5)
```

**Service Types Counted:**

- dine_in, takeout, delivery, reservable
- serves_breakfast, serves_lunch, serves_dinner
- wheelchair_accessible (if relevant)
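The complexity factor can be sketched as follows. The sketch assumes each service option appears as a truthy top-level key in the business data, and it always counts `wheelchair_accessible` because the "if relevant" condition is not specified here. Both choices are illustrative assumptions.

```python
COUNTED_SERVICE_KEYS = (
    "dine_in", "takeout", "delivery", "reservable",
    "serves_breakfast", "serves_lunch", "serves_dinner",
    "wheelchair_accessible",
)


def complexity_factor(business_data: dict) -> float:
    """Add 12% per available service type, clamped to [1.0, 2.5]."""
    # Assumption: a service is "available" when its key holds a truthy value.
    service_types = sum(1 for key in COUNTED_SERVICE_KEYS if business_data.get(key))
    return max(1.0, min(1.0 + service_types * 0.12, 2.5))
```

With the eight service types listed, the largest reachable value is `1.0 + 8 * 0.12 = 1.96`, so the 2.5 upper clamp only matters if more service types are added later.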

### Factor 4: Quality/Scale Indicator (15% weight)

**Calculation:**

```
rating = businessData['rating'] ?? 3.5
priceLevel = businessData['price_level'] ?? 2

qualityFactor = 1.0  // Default when no branch below matches
if (rating >= 4.5 && priceLevel >= 3): qualityFactor = 1.35  // Premium
elseif (rating >= 4.5 && priceLevel >= 2): qualityFactor = 1.20  // High quality
elseif (rating >= 4.0 && priceLevel >= 2): qualityFactor = 1.10  // Above average
elseif (rating < 3.5): qualityFactor = 0.90  // Budget operation
```
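The branch order matters here, since the first matching tier wins. A Python sketch of the same tiers, assuming `rating` and `price_level` are plain keys that may be absent or `None`:

```python
def quality_factor(business_data: dict) -> float:
    """Return the quality/scale multiplier from rating and price level."""
    rating = business_data.get("rating")
    price_level = business_data.get("price_level")
    # Defaults from the design: rating 3.5, price level 2.
    if rating is None:
        rating = 3.5
    if price_level is None:
        price_level = 2

    if rating >= 4.5 and price_level >= 3:
        return 1.35  # Premium
    if rating >= 4.5 and price_level >= 2:
        return 1.20  # High quality
    if rating >= 4.0 and price_level >= 2:
        return 1.10  # Above average
    if rating < 3.5:
        return 0.90  # Budget operation
    return 1.0       # Everything else, including missing data
```

Note that a highly rated business with `price_level` 1 falls through every tier and stays at 1.0, as does a business with no rating at all (the default of 3.5 is not below 3.5).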

### Factor 5: Review Velocity (5% weight) - Optional

**Calculation:**

```
// If we can estimate review velocity (future enhancement)
reviewVelocity = estimateReviewVelocity(businessData)
velocityFactor = 1.0 + (reviewVelocity - averageVelocity) * 0.1
velocityFactor = clamp(velocityFactor, 0.9, 1.2)
```

**Current:** Not implemented; `velocityFactor` is fixed at 1.0, so this term always contributes 0.05 to the weighted multiplier
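A sketch of how the factor could behave once velocity data exists. How velocity is measured and where `averageVelocity` comes from are not defined in this design, so both are taken as hypothetical inputs here; when either is missing, the function returns the neutral 1.0 used today.

```python
from typing import Optional


def velocity_factor(
    review_velocity: Optional[float],
    average_velocity: Optional[float],
) -> float:
    """Adjust by 10% per unit of velocity above or below average, clamped to [0.9, 1.2]."""
    # Neutral value while velocity estimation is not implemented.
    if review_velocity is None or average_velocity is None:
        return 1.0
    raw = 1.0 + (review_velocity - average_velocity) * 0.1
    return max(0.9, min(raw, 1.2))
```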

### Base Staffing by Industry

```
baseStaffing = [
    'restaurant' => 5,   // Chef + 2 cooks + 2 servers minimum
    'cafe' => 3,         // 2 baristas + 1 cashier
    'bar' => 3,          // 2 bartenders + 1 server
    'store' => 2,        // Cashier + stock person
    'hospital' => 8,     // Medical staff requirements
    'pharmacy' => 3,     // Pharmacist + 2 techs
    'general' => 3
]
```

### Final Calculation

```
baseTeam = baseStaffing[primaryType]  // From "Base Staffing by Industry"
estimatedTeam = baseTeam * (
    (volumeFactor * 0.35) +
    (hoursFactor * 0.25) +
    (complexityFactor * 0.20) +
    (qualityFactor * 0.15) +
    (velocityFactor * 0.05)
)

// Apply validation bounds
bounds = getValidationBounds(primaryType, reviewCount)
estimatedTeam = max(bounds.min, min(estimatedTeam, bounds.max))

// Safety check: Ensure restaurants don't exceed 25
if (primaryType === 'restaurant' && estimatedTeam > 25):
    estimatedTeam = 25

estimatedTeam = round(estimatedTeam)
```
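The weighted sum, clamping, and rounding can be sketched in Python as below. The function signature is an assumption: it takes precomputed factors and bounds rather than calling `getValidationBounds` itself. It also assumes half-up rounding is intended; Python's built-in `round` rounds halves to the nearest even number, so the sketch uses `floor(x + 0.5)` instead.

```python
import math

WEIGHTS = {
    "volume": 0.35,
    "hours": 0.25,
    "complexity": 0.20,
    "quality": 0.15,
    "velocity": 0.05,
}


def estimate_team(base_team, factors, min_bound, max_bound, primary_type="general"):
    """Combine the weighted factors, apply bounds, and round to a head count."""
    multiplier = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    estimate = base_team * multiplier

    # Apply validation bounds.
    estimate = max(min_bound, min(estimate, max_bound))

    # Safety check: restaurants never exceed 25.
    if primary_type == "restaurant" and estimate > 25:
        estimate = 25

    # Assumption: round half up (valid for the non-negative values used here).
    return math.floor(estimate + 0.5)


# Worked example: a restaurant with 300 reviews, open 84 hours/week,
# three service types, rating 4.2 at price level 2.
factors = {
    "volume": 300 / 360,   # about 0.83
    "hours": 2.1,
    "complexity": 1.36,
    "quality": 1.10,
    "velocity": 1.0,
}
team = estimate_team(5, factors, min_bound=5, max_bound=20, primary_type="restaurant")
```

In this example the multiplier is about 1.30, so `5 * 1.30 ≈ 6.52`, which sits inside the bounds of 5 and 20 and rounds to a team of 7. Because the weights sum to 1.0, a business whose factors are all exactly 1.0 gets its base staffing back unchanged.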

### Validation Bounds

**Minimum Team Size:**

- Based on base staffing (industry-specific)
- Absolute minimum: 1

**Maximum Team Size:**

- **Restaurant:** `min(25, max(5, ceil(reviewCount / 15)))` - **Capped at 25**
- **Cafe:** `min(15, max(3, ceil(reviewCount / 20)))` - **Capped at 15**
- **Bar:** `min(25, max(3, ceil(reviewCount / 18)))` - **Capped at 25**
- **Store:** `min(30, max(2, ceil(reviewCount / 25)))` - **Capped at 30**
- **Hospital:** `min(60, max(8, ceil(reviewCount / 10)))` - **Capped at 60**
- **Pharmacy:** `min(20, max(3, ceil(reviewCount / 15)))` - **Capped at 20**
- **General:** `min(30, max(3, ceil(reviewCount / 20)))` - **Capped at 30**

**Rationale:** The caps are based on industry research; for example, single-location restaurants typically max out at 25 employees. They prevent unrealistic estimates while still letting the upper bound scale with review count.
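All seven maximum formulas share the shape `min(cap, max(floor, ceil(reviewCount / divisor)))`, so they can be driven from one table. The sketch below assumes the minimum bound equals the industry's base staffing (the floor values in the formulas match the base staffing table) and that unlisted types use the `general` row. Both are assumptions, as is the table name.

```python
# (hard cap, base staffing floor, reviews-per-head divisor) per industry
MAX_BOUND_PARAMS = {
    "restaurant": (25, 5, 15),
    "cafe": (15, 3, 20),
    "bar": (25, 3, 18),
    "store": (30, 2, 25),
    "hospital": (60, 8, 10),
    "pharmacy": (20, 3, 15),
    "general": (30, 3, 20),
}


def get_validation_bounds(primary_type: str, review_count: int):
    """Return (min_bound, max_bound) for a business type and review count."""
    cap, base, divisor = MAX_BOUND_PARAMS.get(primary_type, MAX_BOUND_PARAMS["general"])
    # Integer ceiling division, avoiding float rounding.
    scaled = -(-review_count // divisor)
    max_bound = min(cap, max(base, scaled))
    # Assumption: minimum is the base staffing, never below the absolute minimum of 1.
    min_bound = max(1, base)
    return min_bound, max_bound
```

A restaurant with 300 reviews gets bounds of `(5, 20)`; with 10 reviews the bounds collapse to `(5, 5)`, which pins the estimate to base staffing regardless of the other factors.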

### Safety Checks

**Additional validation after location multipliers:**

```
// After applying location multipliers and validation bounds
if (primaryType === 'restaurant' && estimatedTeam > 25):
    estimatedTeam = 25
```

**Rationale:** Location multipliers can push an estimate past realistic limits. This check guarantees that a restaurant estimate never exceeds 25 employees, whatever the multipliers suggest.

## Enhanced Confidence Calculation

### Scoring System

```
score = 0

// Review count (most important)
if (reviewCount > 200): score += 4
elseif (reviewCount > 100): score += 3
elseif (reviewCount > 50): score += 2
elseif (reviewCount > 25): score += 1

// Data completeness
if (has opening_hours): score += 2
if (has service_options): score += 1
if (has price_level): score += 1
if (has rating): score += 1

// Data quality indicators
if (reviewCount > 0 && has rating): score += 1
if (has multiple service options): score += 1
```

### Confidence Levels

- **High:** 8+ points
- **Medium:** 5-7 points
- **Low:** <5 points
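The scoring rules and level thresholds can be sketched as two small Python functions. The sketch assumes `service_options` is a list of enabled option names and reads "multiple service options" as two or more. The key names follow the scoring pseudocode, but the exact data shape is an assumption.

```python
def confidence_score(review_count: int, business_data: dict) -> int:
    score = 0

    # Review count (most important)
    if review_count > 200:
        score += 4
    elif review_count > 100:
        score += 3
    elif review_count > 50:
        score += 2
    elif review_count > 25:
        score += 1

    # Assumption: service_options is a list of enabled option names.
    services = business_data.get("service_options") or []
    has_rating = business_data.get("rating") is not None

    # Data completeness
    if business_data.get("opening_hours"):
        score += 2
    if services:
        score += 1
    if business_data.get("price_level") is not None:
        score += 1
    if has_rating:
        score += 1

    # Data quality indicators
    if review_count > 0 and has_rating:
        score += 1
    if len(services) >= 2:
        score += 1
    return score


def confidence_level(score: int) -> str:
    if score >= 8:
        return "high"
    if score >= 5:
        return "medium"
    return "low"
```

Summing the largest value of every rule gives `4 + 2 + 1 + 1 + 1 + 1 + 1 = 11`, which is the highest score these rules can produce.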

## Fallback Strategies

### Level 1: Full Model (Preferred)

- All factors available
- Use complete formula

### Level 2: Missing Hours

- Use default 40 hours/week
- Reduce confidence by 1 point

### Level 3: Missing Price Level

- Use default price level 2
- Reduce confidence by 1 point

### Level 4: Review-Only Model

- Use only the volume factor, driven by review count alone
- Lower confidence

### Level 5: Industry Average

- Use base staffing only
- Very low confidence

## Data Quality Assessment

### Quality Indicators

```
quality = []

if (reviewCount > 100):
    quality[] = 'high_review_volume'
if (reviewCount > 500):
    quality[] = 'very_high_review_volume'
if (has opening_hours):
    quality[] = 'complete_hours'
if (has service_options):
    quality[] = 'service_details'
if (has price_level):
    quality[] = 'pricing_info'
if (has rating && rating > 0):
    quality[] = 'rating_available'
if (serviceTypes >= 3):
    quality[] = 'multiple_services'
```

## Return Structure

```php
[
    'team_size' => int,                    // Primary estimate
    'confidence_level' => 'low'|'medium'|'high',
    'confidence_score' => int,              // Raw score (0-11)
    'factors_used' => [
        'volume' => float,
        'hours' => float,
        'complexity' => float,
        'quality' => float,
        'velocity' => float  // Always 1.0 for now
    ],
    'base_staffing' => int,
    'data_quality' => array,                // Quality indicators
    'validation' => [
        'min_bound' => int,
        'max_bound' => int,
        'applied_min' => bool,
        'applied_max' => bool
    ],
    'fallback_level' => int                 // 1-5, which fallback was used
]
```
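For readers working in a typed language, the same shape can be written down as Python `TypedDict` definitions. This is an illustrative mirror of the structure above, not part of the implementation. The helper `build_validation` is hypothetical and assumes `applied_min` and `applied_max` record whether the corresponding bound actually changed the raw estimate.

```python
from typing import Dict, List, Literal, TypedDict


class Validation(TypedDict):
    min_bound: int
    max_bound: int
    applied_min: bool
    applied_max: bool


class TeamEstimate(TypedDict):
    team_size: int
    confidence_level: Literal["low", "medium", "high"]
    confidence_score: int
    factors_used: Dict[str, float]  # volume, hours, complexity, quality, velocity
    base_staffing: int
    data_quality: List[str]
    validation: Validation
    fallback_level: int             # 1-5


def build_validation(raw_estimate: float, min_bound: int, max_bound: int) -> Validation:
    """Hypothetical helper: record which bound, if any, clamped the estimate."""
    return {
        "min_bound": min_bound,
        "max_bound": max_bound,
        "applied_min": raw_estimate < min_bound,
        "applied_max": raw_estimate > max_bound,
    }
```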

## Improvements Over Current Implementation

1. **Consistent Formula:** Same calculation everywhere
2. **Better Weights:** Based on research and analysis
3. **Reasonable Caps:** Not too conservative, not too aggressive
4. **Enhanced Confidence:** More comprehensive scoring
5. **Better Fallbacks:** Graceful degradation
6. **Validation:** Sanity checks and bounds
7. **Transparency:** Detailed factor breakdown

## Migration Strategy

1. **Phase 1:** Implement new model alongside old (feature flag)
2. **Phase 2:** Test and validate on real data
3. **Phase 3:** Gradually migrate implementations
4. **Phase 4:** Remove old implementations
5. **Phase 5:** Monitor and refine

## Testing Requirements

1. **Unit Tests:** Test each factor calculation
2. **Integration Tests:** Test full formula
3. **Edge Cases:** Test boundary conditions
4. **Comparison Tests:** Compare with old implementations
5. **Validation Tests:** Test bounds and sanity checks
