# ShiftOps Team Estimation - Testing Guide


**Last Updated:** 2025-11-20

## Overview

This guide explains how to run the test scripts that validate and compare ShiftOps team size estimates, how to add test cases, and how to interpret the results.

## Test Scripts

### 1. Replica Scripts

**Purpose:** Exact replicas of current implementations for testing

**Files:**

- `scripts/test-team-estimation/replica-analyzer.php`
- `scripts/test-team-estimation/replica-cost-calculator.php`
- `scripts/test-team-estimation/replica-javascript.js`

**Usage:**

```bash
php scripts/test-team-estimation/replica-analyzer.php
php scripts/test-team-estimation/replica-cost-calculator.php
node scripts/test-team-estimation/replica-javascript.js
```

**Output:** Test results showing estimates for sample cases

### 2. Test Dataset Generator

**Purpose:** Generate comprehensive test cases

**File:** `scripts/test-team-estimation/generate-test-dataset.php`

**Usage:**

```bash
php scripts/test-team-estimation/generate-test-dataset.php
```

**Output:** `scripts/test-team-estimation/test-dataset.json`

**Contents:**

- 287+ standard test cases
- 9 focused test cases
- Edge cases

**Test Case Structure:**

```json
{
  "id": 1,
  "name": "restaurant_50reviews_4.0rating_2price_2services",
  "data": {
    "user_ratings_total": 50,
    "types": ["restaurant"],
    "rating": 4.0,
    "price_level": 2,
    "service_options": {
      "dine_in": true,
      "takeout": true
    },
    "opening_hours": { ... }
  },
  "category": "standard"
}
```

### 3. Comparison Tool

**Purpose:** Compare all implementations on the same dataset

**File:** `scripts/test-team-estimation/compare-implementations.php`

**Usage:**

```bash
php scripts/test-team-estimation/compare-implementations.php
```

**Prerequisites:** Run `generate-test-dataset.php` first so that `test-dataset.json` exists

**Output:**

- Console summary with statistics
- `scripts/test-team-estimation/comparison-results.json`

**Statistics Generated:**

- Mean, median, min, max for each implementation
- Differences between implementations
- Outliers (cases with large differences)
- Analysis by business type
- Analysis by review count range
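
As a rough illustration of the per-implementation statistics listed above, the sketch below computes mean, median, min, and max for one list of estimates. The function name `summarizeEstimates` and the sample numbers are hypothetical; the comparison tool may compute these values differently.

```php
<?php
// Hypothetical helper: summary statistics for one implementation's estimates.
function summarizeEstimates(array $estimates): array
{
    if ($estimates === []) {
        throw new InvalidArgumentException('No estimates to summarize.');
    }

    // sort() works on the local copy and reindexes it from 0.
    sort($estimates);
    $count = count($estimates);
    $middle = intdiv($count, 2);

    // Even-sized lists use the average of the two middle values.
    $median = $count % 2 === 0
        ? ($estimates[$middle - 1] + $estimates[$middle]) / 2
        : $estimates[$middle];

    return [
        'mean' => array_sum($estimates) / $count,
        'median' => $median,
        'min' => $estimates[0],
        'max' => $estimates[$count - 1],
    ];
}

// Illustrative numbers only, not real output from the comparison tool.
print_r(summarizeEstimates([4, 6, 5, 12, 8]));
```

Running the same helper once per implementation gives figures that can be compared side by side.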

### 4. Enhanced Model Validator

**Purpose:** Validate the enhanced model against the current implementations

**File:** `scripts/test-team-estimation/validate-enhanced-model.php`

**Usage:**

```bash
php scripts/test-team-estimation/validate-enhanced-model.php
```

**Prerequisites:** Run `generate-test-dataset.php` first so that `test-dataset.json` exists

**Output:**

- Console summary comparing the enhanced model with the current implementations
- `scripts/test-team-estimation/enhanced-validation-results.json`

### 5. Enhanced Estimator Prototype

**Purpose:** Test enhanced estimation logic

**File:** `scripts/test-team-estimation/enhanced-estimator.php`

**Usage:**

```bash
php scripts/test-team-estimation/enhanced-estimator.php
```

**Output:** Test results for enhanced model

## Adding New Test Cases

### In the Test Dataset Generator

Edit `scripts/test-team-estimation/generate-test-dataset.php`:

1. **Add an entry to the focused cases:**

```php
[
    'name' => 'your_test_case',
    'data' => [
        'user_ratings_total' => 100,
        'types' => ['restaurant'],
        'rating' => 4.5,
        'price_level' => 3,
        'service_options' => ['dine_in' => true],
        'opening_hours' => $this->generateHoursPattern(60)
    ]
]
```

2. **Regenerate dataset:**

```bash
php scripts/test-team-estimation/generate-test-dataset.php
```

### Manual Test Case

Create a JSON file that follows the test case structure:

```json
{
  "name": "manual_test",
  "data": {
    "user_ratings_total": 200,
    "types": ["restaurant"],
    "rating": 4.5,
    "price_level": 3,
    "service_options": {
      "dine_in": true,
      "takeout": true,
      "delivery": true
    },
    "opening_hours": {
      "periods": [
        {
          "open": { "day": 0, "time": "1100" },
          "close": { "day": 0, "time": "2200" }
        }
      ]
    }
  }
}
```
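
Before running a hand-written case, it can help to check which fields it leaves out. The sketch below is one possible check, not part of the existing scripts: the helper name `findMissingFields` is hypothetical, and the field list is an assumption taken from the examples in this guide. A missing field is not necessarily a mistake, since the edge cases deliberately test missing data. It uses `JSON_THROW_ON_ERROR`, which requires PHP 7.3 or later.

```php
<?php
// Assumption: these are the fields the estimators read, taken from the examples
// in this guide. Adjust the list to match the real implementations.
const EXPECTED_FIELDS = [
    'user_ratings_total',
    'types',
    'rating',
    'price_level',
    'service_options',
    'opening_hours',
];

// Hypothetical helper: returns the expected fields missing from a test case.
function findMissingFields(string $json): array
{
    // JSON_THROW_ON_ERROR makes malformed JSON fail loudly instead of returning null.
    $case = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $data = is_array($case['data'] ?? null) ? $case['data'] : [];

    // array_values() reindexes the result so it is a plain list.
    return array_values(array_diff(EXPECTED_FIELDS, array_keys($data)));
}

// Illustrative case with three fields left out on purpose.
$json = '{"name": "manual_test", "data": {"user_ratings_total": 200, "types": ["restaurant"], "rating": 4.5}}';
print_r(findMissingFields($json));
```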

## Interpreting Results

### Accuracy Metrics

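**Mean Absolute Error (MAE):**

- Average of the absolute differences between two sets of estimates
- Lower is better

**Root Mean Squared Error (RMSE):**

- Square root of the average squared difference; penalizes larger errors more than MAE
- Lower is better

**Mean Absolute Percentage Error (MAPE):**

- Average absolute difference as a percentage of the reference estimate; undefined when the reference is zero
- Lower is better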
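
The three metrics can be sketched as follows. This is a minimal illustration using the standard definitions, not the code from the validation scripts; the helper name `errorMetrics`, the choice of which list is the reference, and the sample numbers are assumptions. Because no real team sizes are available, the reference here would typically be another implementation's estimates.

```php
<?php
// Hypothetical helper: error metrics between two equal-length lists of estimates.
// $reference is the baseline (for example a current implementation),
// $candidate is the implementation under test.
function errorMetrics(array $reference, array $candidate): array
{
    $reference = array_values($reference);
    $candidate = array_values($candidate);
    $n = count($reference);
    if ($n === 0 || $n !== count($candidate)) {
        throw new InvalidArgumentException('Lists must be non-empty and the same length.');
    }

    $absSum = 0.0;
    $squaredSum = 0.0;
    $pctSum = 0.0;
    $pctCount = 0;
    foreach ($reference as $i => $expected) {
        $error = $candidate[$i] - $expected;
        $absSum += abs($error);
        $squaredSum += $error ** 2;
        // Percentage error is undefined for a zero baseline, so skip those cases.
        if ($expected != 0) {
            $pctSum += abs($error / $expected) * 100;
            $pctCount++;
        }
    }

    return [
        'mae' => $absSum / $n,
        'rmse' => sqrt($squaredSum / $n),
        // null signals that every baseline value was zero.
        'mape' => $pctCount > 0 ? $pctSum / $pctCount : null,
    ];
}

// Illustrative numbers only.
print_r(errorMetrics([5, 10, 20], [6, 8, 20]));
```

Skipping zero baselines in MAPE is a design choice; it matters for the zero-review scenario only if an implementation can return an estimate of zero.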

### Confidence Levels

- **High (8+ points):** Reliable estimate, good data quality
- **Medium (5-7 points):** Reasonable estimate, moderate data quality
- **Low (<5 points):** Less reliable, limited data
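
The thresholds above map to levels as in the sketch below. The function name `confidenceLevel` is hypothetical, and the sketch assumes scores are whole points; how the points themselves are awarded is not covered here.

```php
<?php
// Maps a confidence score to the levels listed above.
// Assumption: scores are whole points; the function name is hypothetical.
function confidenceLevel(int $points): string
{
    if ($points >= 8) {
        return 'high';
    }
    if ($points >= 5) {
        return 'medium';
    }
    return 'low';
}

foreach ([9, 6, 3] as $points) {
    echo $points, ' => ', confidenceLevel($points), PHP_EOL;
}
```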

### Outliers

Cases with `max_difference >= 10` are flagged as outliers. Review these to understand edge cases.
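
A sketch of how such a flag could be computed is shown below. It assumes that `max_difference` is the gap between the largest and smallest estimate for a case, and that each result holds one estimate per implementation. Both the result layout and the implementation keys are hypothetical and may not match `comparison-results.json`.

```php
<?php
// Threshold from this guide: a max_difference of 10 or more marks an outlier.
const OUTLIER_THRESHOLD = 10;

// Hypothetical helper. Assumption: each result has a 'name' and an 'estimates'
// array with one value per implementation, and max_difference is the spread
// between the largest and smallest of those values.
function findOutliers(array $results): array
{
    $outliers = [];
    foreach ($results as $result) {
        $maxDifference = max($result['estimates']) - min($result['estimates']);
        if ($maxDifference >= OUTLIER_THRESHOLD) {
            $outliers[] = [
                'name' => $result['name'],
                'max_difference' => $maxDifference,
            ];
        }
    }
    return $outliers;
}

// Illustrative data only, not real comparison output.
$results = [
    ['name' => 'case_a', 'estimates' => ['analyzer' => 8, 'calculator' => 9, 'javascript' => 8]],
    ['name' => 'case_b', 'estimates' => ['analyzer' => 12, 'calculator' => 25, 'javascript' => 14]],
];
print_r(findOutliers($results));
```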

## Test Scenarios

### Standard Scenarios

1. **Small Business:** <50 reviews, minimal data
2. **Medium Business:** 50-200 reviews, complete data
3. **Large Business:** 500+ reviews, comprehensive data
4. **Zero Reviews:** New business, estimate from base staffing
5. **High Reviews:** 1000+ reviews, test maximum caps

### Edge Cases

1. **Missing Data:** No opening hours, no price level
2. **Extreme Values:** Very high/low ratings, many services
3. **Unknown Types:** Business type not in priority list
4. **Multiple Types:** Business has multiple type classifications

### Business Type Coverage

Test every supported business type:

- restaurant
- cafe
- bar
- store
- hospital
- pharmacy
- general
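
One way to confirm coverage is to scan the test cases for types that never appear. The sketch below assumes each case stores its types under `data.types`, as in the examples in this guide; the helper name `findUncoveredTypes` is hypothetical. How `general` is represented in the data is not stated here, so it may need special handling.

```php
<?php
// Business types listed in this guide.
const BUSINESS_TYPES = ['restaurant', 'cafe', 'bar', 'store', 'hospital', 'pharmacy', 'general'];

// Hypothetical helper: returns the listed types that no test case uses.
// Assumption: each case stores its types under data.types.
function findUncoveredTypes(array $cases): array
{
    $seen = [];
    foreach ($cases as $case) {
        foreach ($case['data']['types'] ?? [] as $type) {
            $seen[$type] = true;
        }
    }

    // array_values() reindexes the result so it is a plain list.
    return array_values(array_diff(BUSINESS_TYPES, array_keys($seen)));
}

// Illustrative cases only.
$cases = [
    ['name' => 'a', 'data' => ['types' => ['restaurant']]],
    ['name' => 'b', 'data' => ['types' => ['cafe', 'bar']]],
];
print_r(findUncoveredTypes($cases));
```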

## Validation Checklist

Before deploying changes:

- [ ] Run test dataset generator
- [ ] Run comparison tool
- [ ] Review statistics summary
- [ ] Check for outliers
- [ ] Verify confidence distribution
- [ ] Test edge cases manually
- [ ] Compare enhanced vs current
- [ ] Verify backward compatibility
- [ ] Check for lint errors
- [ ] Test in the browser (`localhost:8003`)

## Known Limitations

1. **No Real Data:** Cannot validate against actual team sizes
2. **Synthetic Test Cases:** Based on assumptions
3. **Limited Coverage:** May not cover all real-world scenarios
4. **Static Dataset:** Doesn't evolve with production data

## Future Enhancements

1. **Real Data Collection:** Gather actual team sizes for validation
2. **Automated Testing:** CI/CD integration
3. **Performance Monitoring:** Track accuracy over time
4. **A/B Testing:** Compare old vs new in production
5. **Machine Learning:** Train model on real data

## Troubleshooting

### Memory Issues

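If dataset generation fails because PHP runs out of memory:

- Reduce the number of test case combinations
- Generate the focused cases only
- Increase the PHP memory limit for the run, for example `php -d memory_limit=512M scripts/test-team-estimation/generate-test-dataset.php` (the `512M` value is only an example)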

### Inconsistent Results

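If the implementations produce different estimates for the same case:

- Check that the factor weights match across implementations
- Verify that the validation bounds are the same
- Compare the business age estimation
- Check the confidence calculation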

### Missing Data

If test cases fail because of missing data:

- Verify that the required fields are present
- Check the default values
- Review the fallback logic

## Support

For questions or issues:

1. Review documentation in `docs/systems/shiftops/`
2. Check test script comments
3. Review comparison results
4. Examine enhanced model design doc
