# UTM Parameter Cleaning Research Findings

**Last Updated:** 2026-01-29

## Executive Summary

This document summarizes research into industry best practices and an analysis of the current implementation, to inform the decision on whether to keep, improve, or remove UTM parameter cleaning (stripping `utm_*` parameters from the address-bar URL after page load).

## Industry Best Practices Research

### When to Remove UTMs: Industry Consensus

**YES - Remove UTMs AFTER Analytics Capture**

Industry best practices consistently recommend removing UTM parameters from URLs **after** analytics tools have captured the data. Key reasons:

1. **Prevent Misattribution from Link Sharing**
   - When users copy/share URLs with UTMs, they carry tracking codes to different channels
   - Example: a link tagged `utm_source=instagram` that is forwarded by email still attributes the resulting visit to Instagram
   - This creates data quality issues in analytics reports

2. **User Experience**
   - Clean URLs look more professional
   - Reduces clutter in browser address bar
   - Improves perceived privacy (less visible tracking data)

3. **Privacy Considerations**
   - UTM parameters expose tracking information to anyone who sees the URL
   - Visible tracking data can be concerning to privacy-conscious users
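   - GDPR/CCPA: cleaning the URL does not by itself change compliance obligations; storing UTM values in cookies/localStorage is likely the more relevant consent question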

### Timing Requirements: Critical

**CRITICAL:** UTMs must ONLY be removed AFTER analytics tools have captured the data.

- **Google Analytics/GA4:** Captures on page load via `gtag()` or GTM
- **HubSpot:** Captures via tracking script on page load
- **Google Tag Manager:** Processes tags on page load
- **Form Tracking:** Reads from cookies/localStorage (set before cleanup)

**Current Implementation:** 1.5 second delay before cleanup
- ✅ Allows time for analytics scripts to load and capture
- ✅ Cookies are set immediately (before cleanup)
- ✅ Forms read from cookies/localStorage (not URL)

### When NOT to Remove UTMs

- **Internal Links:** Never use UTMs on internal links (they corrupt source/medium attribution; in Universal Analytics they also started a new session)
- **Before Analytics Capture:** Never remove before analytics tools have fired
- **Ad Tracking Parameters:** Preserve `gclid` and `gad_source` (Google Ads) and `hsa_*` (added to ad URLs by HubSpot's ads tracking); these are needed for conversion attribution
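The remove/preserve rule above can be expressed as a small URL filter. This is a minimal sketch, assuming cleanup should drop only keys that begin with `utm_` and keep everything else (including `hsa_*`, `gclid`, `gad_source`, and the `#hash`). `stripUtmParams` is a hypothetical name, not the existing `cleanUTMParametersFromURL()`.

```typescript
// Hypothetical helper (not the existing cleanUTMParametersFromURL()).
// Removes every query parameter whose key starts with "utm_" and keeps all
// others (hsa_*, gclid, gad_source, ...) plus the path and #hash.
function stripUtmParams(href: string): string {
  const url = new URL(href);

  // Snapshot the parameters before filtering.
  const entries: Array<[string, string]> = [];
  url.searchParams.forEach((value, key) => {
    entries.push([key, value]);
  });

  const kept = entries.filter(([key]) => !key.startsWith("utm_"));

  // Nothing to remove: return the input as-is so its encoding is not normalized.
  if (kept.length === entries.length) {
    return href;
  }

  // Re-serializing may normalize encoding (e.g. "%20" becomes "+").
  // Assigning "" to url.search removes the "?" entirely.
  url.search = new URLSearchParams(kept).toString();
  return url.toString();
}
```

Filtering on the `utm_` prefix, rather than listing the parameters to keep, means any new ad-platform parameter survives cleanup by default.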

## Current Implementation Analysis

### Cleanup Flow

1. **Page Load:**
   - UTM parameters extracted from URL → stored in instance variables
   - Cookies set immediately (`setUTMCookies()`)
   - localStorage fallback set
   - Original URL stored (`this.originalUrl`)
   - Cleanup scheduled (1.5s delay)

2. **After 1.5 Seconds:**
   - `setupUTMCleanup()` re-checks URL parameters
   - `cleanUTMParametersFromURL()` removes UTMs via `history.replaceState()`
   - Preserves `hsa_*` parameters (HubSpot ads tracking)
   - Dispatches `utmCleanupComplete` event

3. **Form Submission:**
   - Forms read from `window.utmTracker.getUTMDataForAPI()`
   - Uses instance variables (persist after cleanup)
   - Falls back to cookies/localStorage
   - Backend reads from form fields, then `page_url`, then cookies
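The client-side fallback order in step 3 can be sketched as a per-key lookup across sources. This assumes the cookies are named after the parameters (`utm_source`, ...) and that the localStorage copy parses to a flat object; `UTM_KEYS`, `parseCookieString`, and `resolveUtmData` are hypothetical names, not the actual `getUTMDataForAPI()` internals.

```typescript
// Assumed list of tracked keys; adjust to whatever utmTracker actually stores.
const UTM_KEYS = ["utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"] as const;

type UtmKey = (typeof UTM_KEYS)[number];
type UtmData = Partial<Record<UtmKey, string>>;
type ValueSource = Partial<Record<string, string | null | undefined>>;

// Parses a document.cookie-style string ("a=1; b=2") into name/value pairs.
function parseCookieString(cookieString: string): Record<string, string> {
  const cookies: Record<string, string> = {};
  for (const part of cookieString.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip fragments without a name=value pair
    const name = part.slice(0, eq).trim();
    const raw = part.slice(eq + 1).trim();
    try {
      cookies[name] = decodeURIComponent(raw);
    } catch {
      cookies[name] = raw; // malformed %-escape: keep the raw value
    }
  }
  return cookies;
}

// For each key, the first source holding a non-empty value wins. Sources are
// passed in priority order: instance variables, cookies, then localStorage.
function resolveUtmData(...sources: ValueSource[]): UtmData {
  const resolved: UtmData = {};
  for (const key of UTM_KEYS) {
    for (const source of sources) {
      const value = source[key];
      if (value) {
        resolved[key] = value;
        break;
      }
    }
  }
  return resolved;
}
```

In the browser the three sources might be the tracker's instance fields, `parseCookieString(document.cookie)`, and the parsed localStorage entry (its storage key is not documented here). Treating empty strings as missing lets a blank instance value fall through to the cookie copy.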

### Dependencies on Cleanup

**Code Dependencies:**
- `setupUTMCleanup()` - Called in `init()` method
- `cleanUTMParametersFromURL()` - Called by `setupUTMCleanup()`
- `originalUrl` storage - Used in `getUTMDataForAPI()` for backend fallback
- `refreshUTMData()` - Preserves instance variables when no URL params (cleanup happened)
- Form field population - Relies on instance variables/cookies after cleanup

**Documentation Dependencies:**
- `docs/development/UTM_CLEANUP_DISCREPANCY_FIX.md`
- `docs/development/UTM_FORM_TRACKING_VERIFICATION.md`
- `docs/development/UTM_TRACKING_DEBUGGING.md`
- Multiple references in Cursor rules

### Recent Bug Fixes

**Issues Fixed:**
1. Cleanup timing discrepancy across pages (`/v3` vs `/gastro`/`/schichtbetriebe`)
2. Instance variables being overwritten after cleanup
3. Form fields not populated correctly after cleanup
4. `page_url` field not preserving original URL

**Root Causes:**
- Cleanup condition evaluated when the timer was scheduled rather than when it ran
- `history.replaceState()` failing silently
- `refreshUTMData()` overwriting instance variables
- Missing fallback logic in form population
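The first three root causes suggest a pattern for the scheduled callback: re-read the URL when the timer fires, confirm the replacement actually took effect, and let a URL overwrite stored values only when it contains UTMs. This is a hedged sketch; `UrlEnv`, `runScheduledCleanup`, and `refreshUtmData` are hypothetical, and URL access is injected so the logic can run outside a browser.

```typescript
type UtmValues = Record<string, string>;

// Reads the non-empty utm_* values present in a URL.
function readUtmFromUrl(href: string): UtmValues {
  const values: UtmValues = {};
  new URL(href).searchParams.forEach((value, key) => {
    if (key.startsWith("utm_") && value) {
      values[key] = value;
    }
  });
  return values;
}

const hasUtm = (href: string): boolean => Object.keys(readUtmFromUrl(href)).length > 0;

// Guards against "refreshUTMData() overwriting instance variables": a URL
// without UTMs (for example after cleanup) never replaces stored values.
function refreshUtmData(stored: UtmValues, currentHref: string): UtmValues {
  const fromUrl = readUtmFromUrl(currentHref);
  return Object.keys(fromUrl).length > 0 ? fromUrl : stored;
}

// Injected URL access so the logic runs outside a browser.
interface UrlEnv {
  getHref(): string;
  replaceHref(href: string): void;
}

type CleanupResult = "skipped" | "cleaned" | "failed";

// Guards against "condition checked at setup time" and "replaceState()
// failing silently": the check happens when the timer fires, and success is
// confirmed by re-reading the URL afterwards.
function runScheduledCleanup(env: UrlEnv, clean: (href: string) => string): CleanupResult {
  const before = env.getHref();
  if (!hasUtm(before)) {
    return "skipped";
  }
  try {
    env.replaceHref(clean(before));
  } catch {
    // history.replaceState() throws (e.g. SecurityError) for a cross-origin URL.
    return "failed";
  }
  return hasUtm(env.getHref()) ? "failed" : "cleaned";
}
```

In a browser, `getHref` would return `location.href`, `replaceHref` would call `history.replaceState(history.state, "", href)`, and the call would be scheduled with `setTimeout(..., 1500)`. Returning a result instead of `void` gives the `utmCleanupComplete` event something concrete to report.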

**Complexity Added:**
- ~150 lines of cleanup-related code
- Multiple fallback mechanisms
- Complex timing logic
- Extensive debugging and testing required

## Analytics Tool Verification

### Google Tag Manager (GTM)

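- **Loading:** Via `<script>` tag in `v2/base/footer.php`
- **Timing:** Loads asynchronously; fires tags according to their triggers on page load
- **UTM Capture:** GTM does not read UTMs itself; the tags it fires (for example, the GA4 tag) read the page URL at the moment they execute, so a tag whose trigger fires after cleanup sees the cleaned URL
- **Status:** ✅ Should capture before 1.5s cleanup delay (for tags fired on page load)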

### Google Analytics 4 (GA4)

- **Loading:** Via GTM or direct `gtag()` calls
- **Timing:** Fires on page load, before cleanup
- **UTM Capture:** Reads UTMs from `page_location` (the full page URL) when the `page_view` event is sent
- **Status:** ✅ Should capture before 1.5s cleanup delay

### HubSpot Tracking

- **Loading:** Via `<script>` tag in `v2/base/footer.php` (async defer)
- **Timing:** Loads asynchronously, may fire after cleanup
- **UTM Capture:** HubSpot script reads from URL on load
- **Status:** ⚠️ May not capture if script loads slowly (async defer)

### Form Submission Tracking

- **Data Source:** Cookies/localStorage (set before cleanup)
- **Timing:** Forms read from `window.utmTracker.getUTMDataForAPI()`
- **Status:** ✅ Works correctly (uses instance variables/cookies)

## Pros and Cons Analysis

### Keeping Cleanup (Current Implementation)

**Pros:**
- ✅ Prevents misattribution from link sharing
- ✅ Cleaner URLs (better UX)
- ✅ Privacy benefits (less visible tracking)
- ✅ Follows industry best practices
- ✅ Prevents internal link UTM carryover

**Cons:**
- ❌ Complex implementation (~150 lines)
- ❌ Recent bugs required fixes
- ❌ Timing-dependent (1.5s delay)
- ❌ Browser compatibility issues (`history.replaceState()` failures)
- ❌ Maintenance burden
- ❌ Debugging complexity
- ⚠️ Potential HubSpot tracking timing issue (async defer)

### Removing Cleanup

**Pros:**
- ✅ Simpler codebase (~150 lines removed)
- ✅ No timing dependencies
- ✅ No browser compatibility issues
- ✅ Easier debugging
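- ✅ Forms can read directly from the URL (landing page only; later pages still need cookies/localStorage)
- ✅ Less maintenance burden
- ✅ No cleanup-related capture risk (UTMs stay in the landing URL for late-loading scripts)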

**Cons:**
- ❌ Misattribution risk from link sharing
- ❌ Cluttered URLs (worse UX)
- ❌ Privacy concerns (visible tracking data)
- ❌ Doesn't follow industry best practices
- ❌ Internal links may carry UTMs

## Risk Assessment

### Risks if Cleanup Removed

1. **Misattribution Risk: HIGH**
   - Users sharing links with UTMs will cause incorrect attribution
   - Example: Instagram link shared via email → attributed to Instagram
   - Impact: Data quality issues in analytics reports

2. **UX Impact: MEDIUM**
   - Cluttered URLs in address bar
   - Less professional appearance
   - Privacy-conscious users may be concerned

3. **Internal Link Risk: MEDIUM**
   - Internal navigation may carry UTMs
   - Corrupts source/medium attribution in GA4
   - Can be mitigated with `preventInternalUTMCarryover()` (keep this)
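To audit the internal-link risk (for example, when verifying that `preventInternalUTMCarryover()` still covers every template), a small check can flag same-origin links that carry `utm_*`. `isInternalLinkWithUtm` is a hypothetical helper, not part of the current tracker; in a browser it could be run over `document.links` with `location.href` as the page URL.

```typescript
// Hypothetical audit helper: true when linkHref, resolved against the page URL,
// stays on the same origin and carries at least one utm_* parameter.
function isInternalLinkWithUtm(linkHref: string, pageHref: string): boolean {
  const page = new URL(pageHref);
  let link: URL;
  try {
    link = new URL(linkHref, page);
  } catch {
    return false; // unparsable href: nothing to flag
  }
  if (link.origin !== page.origin) {
    return false; // external links may legitimately carry UTMs
  }
  const keys: string[] = [];
  link.searchParams.forEach((_value, key) => {
    keys.push(key);
  });
  return keys.some((key) => key.startsWith("utm_"));
}
```

Resolving against the page URL means relative links such as `/gastro?utm_source=footer` are caught, while `mailto:` and other non-HTTP links (whose origin is opaque) are ignored.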

### Risks if Cleanup Kept

1. **Complexity Risk: HIGH**
   - Recent bugs prove complexity
   - Maintenance burden
   - Debugging difficulty

2. **Timing Risk: MEDIUM**
   - HubSpot async defer may miss UTMs
   - 1.5s delay may not be enough on slow networks
   - Browser compatibility issues

3. **Data Loss Risk: LOW**
   - Forms use cookies/localStorage (mitigated)
   - Backend has fallback logic (mitigated)

## Alternative Solutions

### Option 1: Server-Side Cleanup

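- **Approach:** Capture UTMs on the server (for example, store them in a cookie or session in PHP), then redirect to the clean URL via PHP or an Apache rewrite
- **Pros:** More reliable timing, no browser compatibility issues
- **Cons:** Requires server-side changes; the redirect happens before any client-side script runs, so GTM/GA4 and HubSpot would never see the UTMs unless they are passed on some other way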

### Option 2: Analytics-First Approach

- **Approach:** Let analytics tools handle UTM removal (some tools do this)
- **Pros:** No custom code needed
- **Cons:** Not all tools support this, less control

### Option 3: Hybrid Approach

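- **Approach:** Clean only on share (detect share events), not on view
- **Pros:** Addresses misattribution at the point of sharing, without a timed cleanup
- **Cons:** Requires share event detection; copies from the address bar and the browser's own share menu are invisible to page scripts, so many shares would be missed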

### Option 4: Improved Client-Side

- **Approach:** Keep cleanup but improve implementation
- **Pros:** Maintains benefits, reduces complexity
- **Cons:** Still has timing dependencies
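One possible improvement under Option 4 is to start the grace period at the window `load` event instead of a fixed 1.5s after init. Parser-inserted `async`/`defer` scripts, such as the HubSpot embed, delay `load`, so by then those scripts have at least executed; this does not guarantee that a tool has already read the URL, and GTM tags on later triggers can still run after cleanup. The sketch injects the DOM hooks so it can be tested; `scheduleCleanupAfterLoad` and its parameters are hypothetical.

```typescript
// Timer function shaped like setTimeout (callback, delay).
type Scheduler = (callback: () => void, delayMs: number) => unknown;

// Hypothetical improvement: start the grace period at the window load event,
// which waits for parser-inserted async/defer scripts, instead of a fixed
// delay counted from init.
function scheduleCleanupAfterLoad(
  readyState: () => string,
  onLoad: (listener: () => void) => void,
  setTimer: Scheduler,
  cleanup: () => void,
  graceMs = 1500,
): void {
  const startGracePeriod = (): void => {
    setTimer(cleanup, graceMs);
  };
  if (readyState() === "complete") {
    // load has already fired (e.g. this code ran late): start right away.
    startGracePeriod();
  } else {
    onLoad(startGracePeriod);
  }
}

// Browser wiring (the cleanup target below is an assumption):
// scheduleCleanupAfterLoad(
//   () => document.readyState,
//   (listener) => window.addEventListener("load", listener, { once: true }),
//   (callback, ms) => window.setTimeout(callback, ms),
//   () => window.utmTracker.cleanUTMParametersFromURL(),
// );
```

Anchoring on `load` makes the delay adapt to slow networks automatically, at the cost of leaving UTMs visible longer on heavy pages.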

## Recommendation

**PRELIMINARY RECOMMENDATION:** Keep cleanup but improve implementation

**Rationale:**
1. Industry best practices strongly recommend cleanup
2. Misattribution risk is significant
3. Current implementation works but has complexity issues
4. Improvements can reduce complexity while maintaining benefits

**Next Steps:**
1. Verify HubSpot tracking timing (may need adjustment)
2. Test analytics capture timing thoroughly
3. Consider server-side cleanup as alternative
4. Create A/B test to measure impact

## Testing Requirements

Before making a final decision, we need to:

1. **Verify Analytics Capture:**
   - Test GTM/GA4 capture timing
   - Test HubSpot capture timing
   - Measure actual capture times vs cleanup delay

2. **Test Link Sharing Scenarios:**
   - Simulate link sharing
   - Measure misattribution incidents
   - Test with/without cleanup

3. **Performance Testing:**
   - Measure cleanup impact on page load
   - Test browser compatibility
   - Test edge cases

4. **A/B Testing:**
   - Test cleanup enabled vs disabled
   - Monitor analytics data quality
   - Monitor form attribution
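For the "Measure actual capture times vs cleanup delay" item, Resource Timing entries give a first approximation: an analytics script whose download finished after cleanup cannot have read the UTMs from the original URL. This is a sketch; the host list is an assumption based on the usual GTM and HubSpot embed URLs, and a download that finished before cleanup does not prove the script read the URL in time.

```typescript
// The two PerformanceResourceTiming fields this check needs.
interface ResourceEntry {
  name: string; // resource URL
  responseEnd: number; // ms since navigation start
}

// Assumed host fragments for the GTM/GA4 and HubSpot scripts; verify against
// the markup in v2/base/footer.php.
const ANALYTICS_HOSTS = ["googletagmanager.com", "js.hs-scripts.com", "js.hs-analytics.net"];

// Returns analytics script downloads that completed after cleanup ran.
// Both times use the performance.now() time origin.
function lateAnalyticsResources(
  entries: ResourceEntry[],
  cleanupAtMs: number,
  hosts: string[] = ANALYTICS_HOSTS,
): ResourceEntry[] {
  return entries.filter(
    (entry) => hosts.some((host) => entry.name.includes(host)) && entry.responseEnd > cleanupAtMs,
  );
}
```

In the browser, `cleanupAtMs` could be `performance.now()` recorded in a `utmCleanupComplete` listener (assuming the event is dispatched on `window`), and `entries` could come from `performance.getEntriesByType("resource")`; `responseEnd` is populated even for cross-origin scripts.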
