# Attribution Debugging Guide

**Last Updated:** 2026-02-07  
**Status:** Prevention fixes implemented - see `ATTRIBUTION_PREVENTION_FIXES.md` for details

## Overview

This guide helps diagnose and fix attribution mismatches between leadSource and UTM parameters in HubSpot contacts.

## Common Mismatch Patterns

### 1. Missing utm_campaign for Google Ads Contacts

**Symptoms:**

- `leadSource`: "Google" or "Paid Search"
- `utm_source__c`: "adwords"
- `utm_medium__c`: "ppc"
- `utm_campaign__c`: (empty) ✗
- `gclid__c`: present
- Campaign name may be in `hs_analytics_first_url` OR form submission URL

**Root Causes:**

1. **Filtering Logic**: Campaign names with underscores (e.g., "DE_Search_B_Brand_Kombi") were being filtered out as technical IDs
2. **URL Source Mismatch**: Campaign name may be in form submission URL but NOT in first URL (user navigated internally)
3. **Parameter Loss**: UTM parameters may be lost during internal navigation if not preserved in cookies

**Fix:**

1. Run audit: `php scripts/hubspot/hubspot-contact-audit.php <email>`
2. Check both `hs_analytics_first_url` AND form submission URL for `utm_campaign` parameter
3. Apply retro fix: `php scripts/hubspot/hubspot-retro-lead-fix.php <email>`
   - Script extracts campaign name from form submission URL (priority) > first URL > last URL
   - Filtering logic updated to allow underscores/hyphens in campaign names
4. Verify: Re-run audit to confirm utm_campaign is now populated
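
The URL priority described in step 3 (form submission URL, then first URL, then last URL) can be sketched in Python as follows. This is an illustration of the idea, not the code of the PHP retro fix script; `extract_utm_campaign` and its argument names are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def extract_utm_campaign(form_url, first_url, last_url):
    """Return the first non-empty utm_campaign found, checking the
    form submission URL, then the first URL, then the last URL."""
    for url in (form_url, first_url, last_url):
        if not url:
            continue
        # parse_qs drops parameters with blank values by default.
        values = parse_qs(urlparse(url).query).get("utm_campaign", [])
        if values and values[0].strip():
            return values[0].strip()
    return None
```

Because the form submission URL is checked first, a campaign name that only survives there (after internal navigation) still wins over an empty first URL.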

**Prevention:**

- Updated `isMeaningfulUTMValue()` to allow underscores, hyphens, and spaces in campaign names
- Retro fix script checks multiple URL sources with priority: form URL > first URL > last URL
- Filtering logic now excludes only long strings made up solely of alphanumeric characters (no separators)
- JavaScript UTM tracking preserves parameters in cookies during internal navigation
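
The filtering rule above can be illustrated with a short Python sketch. It is not the PHP `isMeaningfulUTMValue()` implementation; the function name `is_meaningful_utm_value` is hypothetical, and the length cutoff of 20 is an assumption, since this guide only says "long strings".

```python
import re

# Assumed cutoff: the guide does not state the real threshold.
MAX_OPAQUE_LENGTH = 20


def is_meaningful_utm_value(value, max_opaque_length=MAX_OPAQUE_LENGTH):
    """Reject empty values and long strings that contain only letters and digits."""
    value = (value or "").strip()
    if not value:
        return False
    # Long and free of separators (underscore, hyphen, space): treat it as a
    # technical ID rather than a human-readable campaign name.
    if len(value) > max_opaque_length and re.fullmatch(r"[A-Za-z0-9]+", value):
        return False
    return True
```

With this rule, "DE_Search_B_Brand_Kombi" passes because of its underscores, while a 24-character hex-like token without separators is rejected.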

### 2. Direct Traffic Misattribution - Internal Domain Referrer

**Symptoms:**

- `leadSource`: "Direct Traffic"
- `hs_analytics_source`: "ORGANIC_SEARCH", "REFERRALS", or other non-direct source
- `hs_analytics_first_referrer` or `hs_analytics_last_referrer`: Contains ordio.com or subdomain
- User navigated internally before form submission

**Root Cause:**

When the referrer is ordio.com or one of its subdomains, the attribution logic skips referrer-based classification and defaults to "Direct Traffic", even when HubSpot analytics shows a different original source (e.g., ORGANIC_SEARCH, REFERRALS).

**Fix:**

1. Run audit: `php scripts/hubspot/hubspot-contact-audit.php <email>`
2. Check `hs_analytics_source` - if it's not DIRECT_TRAFFIC, use that for attribution
3. Apply retro fix: `php scripts/hubspot/hubspot-retro-lead-fix.php <email>`
   - Script detects internal domain referrers
   - Uses `hs_analytics_source` to determine original attribution
   - Only overrides Direct Traffic if analytics source is not also DIRECT_TRAFFIC
4. Verify: Re-run audit to confirm leadSource matches analytics source

**Prevention:**

- Retro fix script enhanced to detect internal domain referrers
- Uses HubSpot analytics source when internal referrer detected
- JavaScript UTM tracking preserves parameters in cookies during internal navigation
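
A minimal Python sketch of the internal-referrer fallback described above. The function names and the mapping from `hs_analytics_source` values to `leadSource` labels are assumptions made for illustration; the actual retro fix script is PHP and may map values differently.

```python
from urllib.parse import urlparse

INTERNAL_DOMAIN = "ordio.com"

# Assumed mapping, using the leadSource labels that appear in this guide.
ANALYTICS_TO_LEAD_SOURCE = {
    "ORGANIC_SEARCH": "Organic Search",
    "PAID_SEARCH": "Google",
    "REFERRALS": "referral",
}


def is_internal_referrer(referrer):
    """True when the referrer host is ordio.com or one of its subdomains."""
    host = (urlparse(referrer or "").hostname or "").lower()
    return host == INTERNAL_DOMAIN or host.endswith("." + INTERNAL_DOMAIN)


def resolve_with_internal_referrer(lead_source, referrer, analytics_source):
    """Override "Direct Traffic" only for internal referrers with a non-direct analytics source."""
    if lead_source != "Direct Traffic" or not is_internal_referrer(referrer):
        return lead_source
    if analytics_source == "DIRECT_TRAFFIC":
        return lead_source
    # Unknown analytics sources leave the existing value untouched.
    return ANALYTICS_TO_LEAD_SOURCE.get(analytics_source, lead_source)
```

The suffix check uses `"." + INTERNAL_DOMAIN` so that look-alike hosts such as `notordio.com` are not treated as internal.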

### 3. LeadSource='Organic Search' but Analytics Shows DIRECT_TRAFFIC

**Symptoms:**

- `leadSource`: "Organic Search"
- `source__c`: "direct"
- `utm_medium__c`: "direct"
- `hs_analytics_source`: "DIRECT_TRAFFIC"
- `hs_analytics_last_referrer`: (empty)

**Root Cause:**
Attribution logic incorrectly classified direct traffic as organic search, likely due to:

- Tools pages being heuristically classified as organic search
- Missing referrer data causing incorrect inference
- Existing leadSource value being preserved when it shouldn't be

**Fix:**

1. Run audit: `php scripts/hubspot/hubspot-contact-audit.php <email>`
2. Analyze: `python3 scripts/hubspot/analyze-contact-attribution.py <audit-json>`
3. Apply retro fix: `php scripts/hubspot/hubspot-retro-lead-fix.php <email>`
4. Verify: Re-run audit and confirm attribution is consistent

**Prevention:**

- Ensure `ordio_resolve_attribution()` explicitly handles direct traffic indicators
- Check for `utm_source='direct'` AND `utm_medium='direct'` with no referrer
- Override existing leadSource when direct traffic indicators are present
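
The direct-traffic check listed above could look roughly like this in Python. It is a sketch under the assumption that a contact is available as a plain dictionary keyed by HubSpot property names; the helper names are hypothetical and this is not the body of `ordio_resolve_attribution()`.

```python
def has_direct_traffic_indicators(utm_source, utm_medium, referrer):
    """True when source and medium are both 'direct' and there is no referrer."""
    return (
        (utm_source or "").strip().lower() == "direct"
        and (utm_medium or "").strip().lower() == "direct"
        and not (referrer or "").strip()
    )


def apply_direct_override(contact):
    """Return a copy of the contact with leadSource corrected when direct indicators are present."""
    fixed = dict(contact)
    if has_direct_traffic_indicators(
        contact.get("source__c"),
        contact.get("utm_medium__c"),
        contact.get("hs_analytics_last_referrer"),
    ):
        # An existing value such as "Organic Search" is overridden on purpose.
        fixed["leadSource"] = "Direct Traffic"
    return fixed
```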

### 3. LeadSource='Google' but utm_source='referral'

**Symptoms:**

- `leadSource`: "Google"
- `utm_source__c`: "referral" or external domain
- `utm_medium__c`: "referral"
- `gclid__c`: (empty)

**Root Cause:**
Contact was misclassified as Google Ads when it's actually referral traffic.

**Fix:**

1. Check whether `gclid__c` exists. If it is empty and the URLs carry no `gclid` or `hsa_src='g'` either, the contact is not Google Ads
2. Verify referrer domain is not a search engine
3. Update leadSource to "referral" and set utm_source to referrer domain

### 4. LeadSource='Direct Traffic' but utm_source='google'

**Symptoms:**

- `leadSource`: "Direct Traffic"
- `utm_source__c`: "google"
- `utm_medium__c`: "organic"
- `hs_analytics_source`: "ORGANIC_SEARCH"

**Root Cause:**
Organic search traffic was incorrectly classified as direct traffic.

**Fix:**

1. Check `hs_analytics_source` - if "ORGANIC_SEARCH", update leadSource
2. Verify referrer is from search engine
3. Update leadSource to "Organic Search"

### 5. Google Ads Misattribution - Missing gclid Detection

**Symptoms:**

- `leadSource`: "Direct Traffic" or "Organic Search"
- Form submitted on Google Ads landing pages (`/gastro`, `/schichtbetriebe`)
- `gclid__c`: (empty) or present but not detected
- `hsa_src__c`: (empty) or 'g' but not detected
- `hs_analytics_first_url`: Contains `/gastro` or `/schichtbetriebe` but no UTM parameters
- User came from Google Ads but attribution lost

**Root Causes:**

1. **gclid Not Prioritized**: `determineLeadSourceFromContext()` didn't prioritize `gclid` detection - only checked if `utm_source` was also 'adwords'
2. **Detection Order Issue**: Google Ads detection happened AFTER `determineLeadSourceFromContext()`, allowing wrong leadSource values to persist
3. **Missing Override Logic**: Google Ads detection only set `leadSource = 'Google'` if empty, didn't override wrong values
4. **Parameter Loss**: UTM parameters lost during internal navigation before form submission
5. **Frontend Mismatch**: Frontend sent 'Paid Search' but backend expects 'Google'

**Fix:**

1. Run audit: `php v2/scripts/hubspot/google-ads-attribution-audit.php <contact_id>`
2. Check if `gclid` or `hsa_src='g'` present in URLs or contact properties
3. Verify form submission URL contains Google Ads parameters
4. Apply fixes (already implemented):
   - `determineLeadSourceFromContext()` now prioritizes `gclid` detection
   - Google Ads detection happens BEFORE lead source refinement
   - Wrong leadSource values are overridden when Google Ads indicators present
   - Frontend uses 'Google' instead of 'Paid Search'

**Prevention:**

- `determineLeadSourceFromContext()` checks `gclid` FIRST, returns 'Google' immediately if present
- Google Ads detection happens BEFORE `determineLeadSourceFromContext()` in form endpoints
- Override logic ensures wrong leadSource values are corrected when Google Ads indicators present
- Frontend and backend both use 'Google' for Google Ads attribution
- Enhanced logging warns when overriding incorrect leadSource values
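
The "check `gclid` first, then override" behaviour described above can be sketched in Python as follows. The function names are hypothetical and the sketch only mirrors the rules stated in this section; it is not the PHP code of `determineLeadSourceFromContext()` or the form endpoints.

```python
import logging

logger = logging.getLogger("attribution")


def has_google_ads_indicators(gclid, hsa_src):
    """True when a gclid is present or hsa_src is 'g'."""
    return bool((gclid or "").strip()) or (hsa_src or "").strip().lower() == "g"


def apply_google_ads_override(lead_source, gclid=None, hsa_src=None):
    """Return 'Google' whenever Google Ads indicators are present, even if
    a different leadSource was already set."""
    if not has_google_ads_indicators(gclid, hsa_src):
        return lead_source
    if lead_source and lead_source != "Google":
        # Log the override so wrong upstream values stay visible.
        logger.warning("Overriding leadSource %r with 'Google'", lead_source)
    return "Google"
```

Running this check before any further lead source refinement is what keeps a value such as "Direct Traffic" from persisting on a contact that arrived with a `gclid`.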

**Related Files:**

- `v2/config/utm-validation.php` - `determineLeadSourceFromContext()` function
- `v2/api/lead-capture.php` - Google Ads detection logic
- `v2/api/collect-lead.php` - Google Ads detection logic
- `html/form-hs.php` - Demo booking modal Google Ads detection logic
- `v2/js/utm-tracking.js` - Frontend UTM tracking

**See Also:**

- `docs/systems/tracking/GOOGLE_ADS_ATTRIBUTION_AUDIT_2026-01.md` - Complete audit report
- `v2/scripts/hubspot/test-google-ads-attribution.php` - Test script
- `v2/scripts/hubspot/test-form-attribution-scenarios.php` - Form-specific test scenarios
- `v2/scripts/hubspot/test-all-form-attribution.php` - Comprehensive test suite

### 6. Inconsistent `source__c` and `utm_source__c`

**Symptoms:**

- `source__c`: "google"
- `utm_source__c`: "adwords"
- Both should represent the same source

**Root Cause:**
Fields were set at different times or by different logic paths.

**Fix:**

1. Determine correct source based on utm_medium and gclid
2. Update both fields to match
3. Ensure future updates set both fields consistently
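
One way to keep the two fields aligned is sketched below in Python. The rule for paid traffic (`adwords` when a `gclid` or a `ppc`/`cpc` medium is present) follows the Attribution Logic Rules in this guide; the fallback for all other traffic is an assumption, and `reconcile_source_fields` is a hypothetical helper.

```python
PAID_MEDIUMS = {"ppc", "cpc"}


def reconcile_source_fields(contact):
    """Return a copy of the contact with source__c and utm_source__c set to the same value."""
    fixed = dict(contact)
    medium = (contact.get("utm_medium__c") or "").strip().lower()
    has_gclid = bool((contact.get("gclid__c") or "").strip())
    if has_gclid or medium in PAID_MEDIUMS:
        source = "adwords"
    else:
        # Assumption: outside paid search, prefer utm_source__c and fall back to source__c.
        source = contact.get("utm_source__c") or contact.get("source__c") or ""
    # Always write both fields together so they cannot drift apart again.
    fixed["source__c"] = source
    fixed["utm_source__c"] = source
    return fixed
```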

## Troubleshooting Steps

### Step 1: Run Contact Audit

```bash
php scripts/hubspot/hubspot-contact-audit.php <email>
```

This creates a JSON file with all contact data, form submissions, and analytics.

### Step 2: Analyze Attribution

```bash
python3 scripts/hubspot/analyze-contact-attribution.py <audit-json-file>
```

This identifies mismatches and provides recommendations.

### Step 3: Test Attribution Logic

```bash
php scripts/hubspot/test-attribution-logic.php
```

This simulates `ordio_resolve_attribution()` with the contact's data to see what it should produce.

### Step 4: Apply Retro Fix

```bash
# Dry run first
php scripts/hubspot/hubspot-retro-lead-fix.php --dry-run <email>

# Apply fix
php scripts/hubspot/hubspot-retro-lead-fix.php <email>
```

### Step 5: Verify Fix

```bash
# Re-run audit
php scripts/hubspot/hubspot-contact-audit.php <email>

# Re-analyze
python3 scripts/hubspot/analyze-contact-attribution.py <new-audit-json>
```
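
For repeated fixes, the audit, dry run, apply, and re-audit sequence can be wrapped in a small Python helper. This is a convenience sketch, not an existing script in the repository; it only assembles the commands already shown in this guide, and the helper names are hypothetical.

```python
import subprocess

AUDIT = "scripts/hubspot/hubspot-contact-audit.php"
RETRO_FIX = "scripts/hubspot/hubspot-retro-lead-fix.php"


def retro_fix_commands(email, apply=False):
    """Build the command sequence: audit and dry run always,
    the real fix and a re-audit only when apply=True."""
    commands = [
        ["php", AUDIT, email],
        ["php", RETRO_FIX, "--dry-run", email],
    ]
    if apply:
        commands += [["php", RETRO_FIX, email], ["php", AUDIT, email]]
    return commands


def run_all(commands):
    """Run the commands in order; check=True stops at the first failing script."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_all(retro_fix_commands(email))` first and reviewing the dry-run output before repeating with `apply=True` keeps the "dry run first" rule from being skipped.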

## Attribution Logic Rules

### Google Ads (Paid Search)

**Indicators:**

- `gclid__c` is present
- `utm_source='adwords'` or `utm_source='google'`
- `utm_medium='ppc'` or `utm_medium='cpc'`
- `hs_analytics_source='PAID_SEARCH'`
- Campaign name may be in `hs_analytics_first_url` as `utm_campaign` parameter

**Correct Attribution:**

- `leadSource`: "Google" (frontend and backend now both use this value; "Paid Search" is the older frontend value)
- `source__c`: "adwords"
- `utm_medium__c`: "ppc" or "cpc"
- `utm_campaign__c`: Campaign name from URL (e.g., "DE_Search_B_Brand_Kombi")
- `utm_content__c`: Ad content identifier if available
- `utm_term__c`: Keyword if available

**Note:** Google Ads campaign names are often passed in URL parameters but may not be saved to fields. The retro fix script extracts campaign names from multiple URL sources with priority: form submission URL > first URL > last URL. This handles cases where users navigate internally before submitting forms.

**utm_content Note:** `utm_content` is optional for Google Ads and is only populated when tracking multiple ad variations. If `utm_content` is empty but other UTM parameters are present, this is normal and expected.

### Direct Traffic

**Indicators:**

- `utm_source='direct'` AND `utm_medium='direct'`
- `hs_analytics_source='DIRECT_TRAFFIC'`
- No referrer (`hs_analytics_last_referrer` is empty)

**Correct Attribution:**

- `leadSource`: "Direct Traffic"
- `source__c`: "direct"
- `utm_medium__c`: "direct"

### Organic Search

**Indicators:**

- `utm_medium='organic'`
- `hs_analytics_source='ORGANIC_SEARCH'`
- Referrer from search engine (google.com, bing.com, etc.)

**Correct Attribution:**

- `leadSource`: "Organic Search"
- `source__c`: "google" or "bing" (based on referrer)
- `utm_medium__c`: "organic"

### Referral Traffic

**Indicators:**

- `utm_medium='referral'`
- `hs_analytics_source='REFERRAL'`
- Referrer from external domain (not search engine, not internal)

**Correct Attribution:**

- `leadSource`: "referral"
- `source__c`: referrer domain (sanitized)
- `utm_medium__c`: "referral"
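
The rules in this section can be condensed into a single lookup, sketched here in Python. The precedence (paid search first, then organic, referral, direct) is an assumption based on the `gclid`-first behaviour described earlier; `expected_attribution` is a hypothetical helper and not a port of `ordio_resolve_attribution()`.

```python
PAID_MEDIUMS = {"ppc", "cpc"}
SEARCH_ENGINES = ("google", "bing")


def expected_attribution(gclid="", utm_source="", utm_medium="",
                         analytics_source="", referrer_host=""):
    """Return the expected leadSource/source/medium, or None if no rule matches."""
    source = (utm_source or "").lower()
    medium = (utm_medium or "").lower()
    host = (referrer_host or "").lower()

    # Paid search is checked first (assumed precedence: a gclid always wins).
    if gclid or medium in PAID_MEDIUMS or analytics_source == "PAID_SEARCH":
        return {"leadSource": "Google", "source__c": "adwords",
                "utm_medium__c": medium if medium in PAID_MEDIUMS else "ppc"}

    if medium == "organic" or analytics_source == "ORGANIC_SEARCH":
        # Derive the engine from the referrer host, else keep utm_source.
        labels = host.split(".")
        engine = next((name for name in SEARCH_ENGINES if name in labels), source)
        return {"leadSource": "Organic Search", "source__c": engine,
                "utm_medium__c": "organic"}

    if medium == "referral" or analytics_source == "REFERRALS":
        return {"leadSource": "referral", "source__c": host,
                "utm_medium__c": "referral"}

    if not host and (analytics_source == "DIRECT_TRAFFIC"
                     or (source == "direct" and medium == "direct")):
        return {"leadSource": "Direct Traffic", "source__c": "direct",
                "utm_medium__c": "direct"}

    return None
```

Comparing this expected result with the values stored on a contact is a quick way to spot the mismatch patterns listed at the top of this guide.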

## Testing Procedures

### Test New Contact Attribution

1. Create test contact with known UTM parameters
2. Submit form with UTM data
3. Verify attribution in HubSpot matches expected values
4. Check for any mismatches using audit script

### Test Attribution Logic Changes

1. Update `ordio_resolve_attribution()` in `v2/config/utm-validation.php`
2. Run test script: `php scripts/hubspot/test-attribution-logic.php`
3. Verify output matches expected attribution
4. Test with various input combinations (direct, organic, paid, referral)

### Regression Testing

1. Run audit on contacts with known correct attribution
2. Apply retro fix script in dry-run mode
3. Verify no incorrect changes are proposed
4. Test edge cases (Android intents, package IDs, etc.)

## Best Practices

1. **Always audit before fixing** - Understand the current state before making changes
2. **Use dry-run mode** - Test retro fix script before applying changes
3. **Verify after fixing** - Re-run audit to confirm attribution is correct
4. **Document edge cases** - Add new patterns to this guide as they're discovered
5. **Test attribution logic** - Verify changes to `ordio_resolve_attribution()` don't break existing cases

## Related Files

- `v2/config/utm-validation.php` - Attribution resolution logic
- `v2/js/utm-tracking.js` - Client-side UTM tracking
- `scripts/hubspot/hubspot-contact-audit.php` - Contact audit script
- `scripts/hubspot/hubspot-retro-lead-fix.php` - Retro fix script
- `scripts/hubspot/analyze-contact-attribution.py` - Attribution analysis
- `scripts/hubspot/test-attribution-logic.php` - Attribution logic testing

## Examples

### Example 1: Direct Traffic Fix

**Before:**

- `leadSource`: "Organic Search"
- `source__c`: "direct"
- `utm_medium__c`: "direct"
- `hs_analytics_source`: "DIRECT_TRAFFIC"

**After:**

- `leadSource`: "Direct Traffic"
- `source__c`: "direct"
- `utm_medium__c`: "direct"
- `hs_analytics_source`: "DIRECT_TRAFFIC"

**Command:**

```bash
php scripts/hubspot/hubspot-retro-lead-fix.php juergen.branz@gmail.com
```

### Example 2: Organic Search Fix

**Before:**

- `leadSource`: "Direct Traffic"
- `source__c`: "direct"
- `utm_medium__c`: "direct"
- `hs_analytics_source`: "ORGANIC_SEARCH"
- `hs_analytics_last_referrer`: "https://www.google.com/search?q=..."

**After:**

- `leadSource`: "Organic Search"
- `source__c`: "google"
- `utm_medium__c`: "organic"
- `hs_analytics_source`: "ORGANIC_SEARCH"

### 7. Partner Attribution Not Working

**Symptoms:**

- `partner__c` field in HubSpot is empty or shows "--" (no value)
- Partner parameter present in URL: `?partner=gastroberatung`
- Form submission includes partner parameter
- Partner value not appearing in HubSpot contact record

**Root Causes:**

1. **Value Mismatch**: Partner slug (e.g., `gastroberatung`) doesn't match HubSpot accepted value (e.g., `Gastro Beratung`)
2. **Missing Mapping**: Partner slug not mapped to HubSpot-accepted value
3. **Missing HubSpot Value**: Partner value doesn't exist in HubSpot `partner__c` enumeration options
4. **Mapping Not Applied**: API endpoint not using mapping function

**Fix:**

1. **Verify Partner Mapping:**

   ```bash
   php v2/scripts/hubspot/test-partner-attribution.php gastroberatung
   ```

   This will show if the partner slug is correctly mapped and if the value exists in HubSpot.

2. **Check Partner Configuration:**
   - Verify `hubspot_value` field exists in `v2/config/partner-config.php`
   - Ensure `hubspot_value` matches HubSpot accepted value exactly (case-sensitive, spaces matter)

3. **Check HubSpot Accepted Values:**

   ```bash
   php v2/scripts/hubspot/sync-partner-values.php --dry-run
   ```

   This will show which partner values are missing from HubSpot.

4. **Add Missing Values to HubSpot:**
   - Option A: Use sync script (requires API permissions):
     ```bash
     php v2/scripts/hubspot/sync-partner-values.php --add-missing
     ```
   - Option B: Add manually in HubSpot:
     - Go to Settings → Properties → Contacts → `partner__c`
     - Add new option with exact value from partner config

5. **Verify API Endpoint Uses Mapping:**
   - Check that API endpoint includes: `require_once __DIR__ . '/../config/partner-hubspot-mapping.php';`
   - Verify partner slug is mapped: `$partner = mapPartnerToHubSpotValue($partnerSlug);`
   - Check logs for mapping warnings: `grep "Partner slug" v2/logs/*.log`

6. **Test Form Submission:**
   - Test URL: `https://www.ordio.com/v2/?partner=gastroberatung&title=Gastro%20Beratung%20empfiehlt%20Ordio&leadSource=Partner%20recommendation`
   - Submit form and verify HubSpot contact has `partner__c = "Gastro Beratung"`
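
When testing several partners, the test URL from step 6 can be generated instead of typed by hand. The Python sketch below uses only the standard library; `build_partner_test_url` is a hypothetical helper, and the parameter names are taken from the test URL above.

```python
from urllib.parse import urlencode, quote


def build_partner_test_url(partner_slug, title, lead_source,
                           base="https://www.ordio.com/v2/"):
    """Build a partner test URL with spaces encoded as %20."""
    query = urlencode(
        {"partner": partner_slug, "title": title, "leadSource": lead_source},
        quote_via=quote,  # quote() yields %20; the default quote_plus() would yield '+'
    )
    return f"{base}?{query}"
```

Calling `build_partner_test_url("gastroberatung", "Gastro Beratung empfiehlt Ordio", "Partner recommendation")` reproduces the test URL shown in step 6.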

**Prevention:**

- Always include `hubspot_value` field when adding new partners
- Run test script before deploying new partners
- Use sync script to verify all partners are in HubSpot
- Logging automatically warns when unmapped partners are encountered

**Example Fix:**

**Before:**

- URL: `?partner=gastroberatung`
- HubSpot `partner__c`: (empty) ✗
- Log: "Partner slug has no HubSpot mapping"

**After:**

- URL: `?partner=gastroberatung`
- Mapping: `gastroberatung` → `Gastro Beratung`
- HubSpot `partner__c`: "Gastro Beratung" ✓
- Log: "Partner slug mapped to HubSpot value: gastroberatung → Gastro Beratung"
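
The mapping step in the example above can be illustrated with a short Python sketch. It is not the PHP `mapPartnerToHubSpotValue()` function; the dictionary is a stand-in for the `hubspot_value` entries in `v2/config/partner-config.php`, and lowercasing the slug before lookup is an assumption.

```python
import logging

logger = logging.getLogger("partner_attribution")

# Stand-in for the hubspot_value entries in the partner config.
PARTNER_HUBSPOT_VALUES = {"gastroberatung": "Gastro Beratung"}


def map_partner_to_hubspot_value(partner_slug, mapping=PARTNER_HUBSPOT_VALUES):
    """Return the HubSpot-accepted value for a partner slug, or None if unmapped."""
    slug = (partner_slug or "").strip().lower()
    if not slug:
        return None
    value = mapping.get(slug)
    if value is None:
        # Unmapped slugs are logged so missing config entries are noticed.
        logger.warning("Partner slug has no HubSpot mapping: %s", slug)
        return None
    logger.info("Partner slug mapped to HubSpot value: %s -> %s", slug, value)
    return value
```

Returning `None` for unmapped slugs, rather than passing the raw slug through, avoids sending HubSpot a value that is not among the `partner__c` options.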

## Support

For questions or issues with attribution debugging, refer to:

- This guide
- `docs/systems/shiftops/SHIFTOPS_TROUBLESHOOTING.md` for ShiftOps-specific issues
- `docs/systems/partner-pages/PARTNER_PAGES_GUIDE.md` for partner attribution details
- HubSpot API documentation for field definitions
