# HubSpot lead source attribution policy (Ordio)

**Last Updated:** 2026-03-29

This document is the **business and operations source of truth** for how Ordio uses **`leadsource`** (dropdown), **UTM custom properties**, **Google Click ID (`gclid__c`)**, and **HubSpot analytics** (`hs_analytics_source`, drill-downs). It complements the technical audit in [HUBSPOT_LEADSOURCE_UTM_AUDIT.md](./HUBSPOT_LEADSOURCE_UTM_AUDIT.md).

## Reporting: what is “source of truth”?

| Question | Recommended answer |
|----------|-------------------|
| **Marketing channel mix (first touch)** | HubSpot **Original source** / analytics properties — session-based, can differ from form fields. |
| **Campaign / last-touch context on the form** | **`source__c`**, **`utm_*`**, **`gclid__c`** — what the browser sent when the form was submitted. |
| **Sales / ops taxonomy** | **`leadsource`** dropdown: business buckets, including SDR-specific values such as **Freelancesdr** when intentional [^lead-source-practices]. |

These sources are **allowed to diverge** by design: analytics can reflect an earlier session while the form reflects a later visit. See [^hubspot-original-source] and [^multi-touch].

## Group decisions (remediation 2026-03-29)

| Group | Decision | CRM bulk change |
|-------|----------|------------------|
| **1** | Accept as-is | No — document analytics vs UTM divergence |
| **2** | CRM fix + trace integration | Yes — align `leadsource` to **meta** where UTM shows Meta paid; trace [`lead-capture.php`](../../../v2/api/lead-capture.php) Step 2 / [`collect-lead.php`](../../../v2/api/collect-lead.php) / [`html/form-hs.php`](../../../html/form-hs.php) |
| **3** | Optional CRM fix | Only if policy says **Organic Search** for those rows (taxonomy: “Google” may mean paid in your portal) |
| **4** | CRM fix if UTM trusted | Align `leadsource` to **Organic Search** when `utm_medium__c` = organic |
| **5** | Case-by-case | See [GROUP5_CASE_DECISIONS.md](./GROUP5_CASE_DECISIONS.md); do **not** auto-overwrite **Freelancesdr** if that is the agreed SDR attribution [^gclid-diagnostics] |
| **6** | Prefer reporting; narrow CRM fix if proven | Default **no** bulk UTM rewrite; use **evidence-based** multi-property PATCH only when form/submission proves ads [^multi-touch] |
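
The Group 2 and Group 4 decisions can be read as a pure function that either proposes a new `leadsource` or returns `None`. This is a hypothetical Python sketch, not the audit logic in `hubspot-leadsource-audit-rules.php`: the Meta source set and paid-medium set are assumptions, `utm_source__c` is an assumed member of `utm_*`, and target values must be confirmed against the live enum.

```python
from typing import Dict, Optional

# Assumed value sets (hypothetical); replace with the portal's real conventions.
META_SOURCES = {"facebook", "fb", "instagram", "ig", "meta"}
PAID_MEDIUMS = {"cpc", "ppc", "paid", "paid_social"}


def propose_leadsource(contact: Dict[str, str]) -> Optional[str]:
    """Propose a Group 2/4 leadsource correction, or None to leave the CRM value alone."""
    source = (contact.get("utm_source__c") or "").strip().lower()
    medium = (contact.get("utm_medium__c") or "").strip().lower()
    current = contact.get("leadsource") or ""

    # Intentional SDR attribution is never auto-overwritten (see the Group 5 rule).
    if "freelancesdr" in current.lower():
        return None

    if source in META_SOURCES and medium in PAID_MEDIUMS:
        proposal = "meta"  # Group 2: UTM shows Meta paid
    elif medium == "organic":
        proposal = "Organic Search"  # Group 4: apply only if the UTM is trusted
    else:
        return None

    # No-op when the CRM already holds the proposed value.
    return None if proposal == current else proposal
```

Returning `None` for "no change" keeps the output directly usable as a draft PATCH CSV: rows with `None` are simply dropped.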

## `gclid__c` vs `leadsource` (Group 5 rule)

- **Default (marketing attribution):** If policy says the contact should reflect **Google Ads**, set **`leadsource`** to your portal’s paid-Google option (often **`Google`** in Ordio code — confirm against live enum).
- **Exception (SDR / process):** If **Freelancesdr** (or similar) **must** win for routing or reporting, **keep** that `leadsource` and treat **`gclid__c`** as **diagnostic** (proof of an ad click), not as a forced overwrite in workflows.
- **Merge artifacts:** Manual HubSpot **merges** can combine **`gclid__c`** from one record with **`leadsource`** from the primary record [^hubspot-merge]. Classify these as **M_mergeArtifact** in the Group 5 workbook, and fix them in the CRM only after choosing the surviving story.
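
The three Group 5 branches can be sketched as one decision function. This is a hypothetical illustration: only `M_mergeArtifact` comes from this policy; the other classification labels, the `SDR_VALUES` set and the `paid_google_value` default (`Google`, to be confirmed against the live enum) are assumptions.

```python
from typing import Optional, Tuple

# Assumed set of SDR values that must win over gclid (extend as agreed by ops).
SDR_VALUES = {"freelancesdr"}


def group5_decision(
    leadsource: str,
    gclid: str,
    paid_google_value: str = "Google",
    sdr_exception_active: bool = True,
    merged: bool = False,
) -> Tuple[str, Optional[str]]:
    """Return (classification, proposed_leadsource) for one contact."""
    if merged:
        # Merge artifacts are fixed by hand after choosing the surviving story.
        return ("M_mergeArtifact", None)
    if not gclid.strip():
        return ("no_gclid", None)
    if sdr_exception_active and leadsource.strip().lower() in SDR_VALUES:
        # Keep the SDR value; gclid__c is diagnostic proof of an ad click only.
        return ("gclid_diagnostic", None)
    if leadsource == paid_google_value:
        return ("already_paid_google", None)
    return ("set_paid_google", paid_google_value)
```

Setting `sdr_exception_active=False` models the case where policy removes the **Freelancesdr** exception; only then does a gclid force the paid-Google value.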

## HubSpot Operations Hub / workflows (manual checklist)

Complete in the HubSpot UI (not automatable from this repo):

1. Export or list **workflows** that **set or clear** `leadsource`, `source__c`, or `utm_*` on create or update.
2. Prefer **fill-only** rules: e.g. set `leadsource` from UTM **only when empty**, to avoid fighting Ordio API or intentional SDR values.
3. Avoid a global rule **“if gclid present then always set Google”** unless policy above removes the **Freelancesdr** exception.
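
Workflows are configured in the HubSpot UI rather than in code, but the fill-only rule can be stated precisely. A minimal sketch (the `fill_only` helper is hypothetical, not a HubSpot API):

```python
from typing import Optional


def fill_only(current: Optional[str], proposed: Optional[str]) -> Optional[str]:
    """Return the value a fill-only workflow would write, or None to leave the property untouched."""
    if current is not None and current.strip():
        # Already set by the Ordio API, an SDR, or an earlier touch: do not fight it.
        return None
    if proposed is None or not proposed.strip():
        # Nothing useful to write; a fill-only rule never clears a property.
        return None
    return proposed.strip()
```

A workflow built this way can never overwrite **Freelancesdr** or any other value that is already present.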

Record workflow names and dates reviewed in a spreadsheet or internal wiki; optionally paste a short summary (no secrets) into the **Decision log** in [HUBSPOT_LEADSOURCE_UTM_AUDIT_PATTERNS_SUMMARY.md](./HUBSPOT_LEADSOURCE_UTM_AUDIT_PATTERNS_SUMMARY.md).

## Portal enum snapshot (how to refresh)

Run (requires `HUBSPOT_API_TOKEN`):

```bash
php v2/scripts/hubspot/fetch-leadsource-property-options.php
```

Paste relevant options into an internal note or ticket when changing dropdown labels. Audit logic maps canonical names via [`v2/helpers/hubspot-leadsource-audit-rules.php`](../../../v2/helpers/hubspot-leadsource-audit-rules.php).
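
Before a PATCH CSV is applied, its `leadsource` values can be checked against the enum snapshot. The sketch below assumes the input is a property definition shaped like a HubSpot CRM v3 properties response (`GET /crm/v3/properties/contacts/leadsource`, with an `options` array of `label` / `value` objects); it does not assume anything about the output format of `fetch-leadsource-property-options.php`.

```python
import json
from typing import Iterable, List, Set


def enum_values(property_json: str) -> Set[str]:
    """Collect option values from a HubSpot property definition (CRM v3 shape assumed)."""
    prop = json.loads(property_json)
    return {opt["value"] for opt in prop.get("options", [])}


def unknown_values(candidates: Iterable[str], allowed: Set[str]) -> List[str]:
    """Values (e.g. from a patch CSV) that the live dropdown would not accept."""
    return sorted({c for c in candidates if c and c not in allowed})
```

Note that the check compares internal option **values**, not display labels; relabeling a dropdown option in the UI does not change its value.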

## Retroactive CRM patches

Use **`patch-leadsource-from-audit.php`** with a signed-off CSV (see [patch-leadsource-template.csv](./patch-leadsource-template.csv)). The default mode is **`--dry-run`**. Output files and logs are written under **`var/hubspot-audits/`** (gitignored).

**Narrow attribution (Group 6 / evidence-based):** When you intentionally change **`leadsource` plus** `source__c` / `utm_*` (only with proof, e.g. a form submission URL that still shows ad parameters), use [`v2/scripts/hubspot/patch-contact-attribution-from-csv.php`](../../../v2/scripts/hubspot/patch-contact-attribution-from-csv.php) with [patch-attribution-narrow-template.csv](./patch-attribution-narrow-template.csv). It has the same dry-run default. Do **not** bulk “fix” UTMs from analytics alone.
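
The dry-run-first pattern can be sketched as a planner that turns CSV rows into HubSpot contact update requests (`PATCH /crm/v3/objects/contacts/{contactId}` with a `{"properties": {...}}` body) without sending anything. The column names `contact_id` and `utm_source__c` and the whitelist are hypothetical; check the real template headers.

```python
import csv
import io
from typing import Any, Dict, List

# Hypothetical column names; check the real template CSV headers.
ID_COLUMN = "contact_id"
PATCHABLE = {"leadsource", "source__c", "utm_source__c", "utm_medium__c", "utm_campaign__c"}


def plan_patches(csv_text: str, apply: bool = False) -> List[Dict[str, Any]]:
    """Turn signed-off CSV rows into contact PATCH requests (nothing is sent)."""
    plans = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Only non-empty, whitelisted columns become properties; blanks never clear CRM values.
        props = {k: (v or "").strip() for k, v in row.items() if k in PATCHABLE and (v or "").strip()}
        if not props:
            continue
        plans.append({
            "method": "PATCH",
            "path": f"/crm/v3/objects/contacts/{row[ID_COLUMN]}",
            "body": {"properties": props},
            "mode": "apply" if apply else "dry-run",
        })
    return plans
```

Non-whitelisted columns such as `notes` or `reason_code` stay in the CSV for review but never reach the request body.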

## Paid search UTM backfill (hybrid evidence ladder)

HubSpot **Ads activity** in the UI is **not** exposed as a reliable bulk CRM API. Automation uses **proxy evidence**: `hs_analytics_source` / drill-downs, `gclid__c`, and **engagements** (form submission URLs, page views), consistent with [`v2/scripts/hubspot/google-ads-attribution-audit.php`](../../../v2/scripts/hubspot/google-ads-attribution-audit.php).

### Cohort (export)

Contacts match **all** of:

1. `hubspot_audit_bucket_hs_analytics_source(hs_analytics_source)` → **`paid_search`** (see [`hubspot-leadsource-audit-rules.php`](../../../v2/helpers/hubspot-leadsource-audit-rules.php)).
2. **UTM gap:** `utm_medium__c` is empty, **`organic`**, or **`direct`**, **or** `utm_campaign__c` is empty (either condition qualifies).
3. **Exclusions:** Same hard exclusions as the main audit (trade fair, Cello, affiliate, etc.) and **no** `leadsource` containing **Freelancesdr** (case-insensitive).

**Narrow cohort** (pragmatic mapping allowed only here): `email` ends with **`@temp.ordio.com`** **or** `sign_up_type__c` contains **`lead capture`** (case-insensitive).
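
The cohort and narrow-cohort tests can be sketched per exported row. Assumptions: the `paid_search` bucket is approximated by HubSpot's raw `PAID_SEARCH` value (the real mapping is `hubspot_audit_bucket_hs_analytics_source()`), and the hard exclusions are computed elsewhere and passed in as a flag.

```python
from typing import Dict

GAP_MEDIUMS = {"", "organic", "direct"}


def in_paid_gap_cohort(c: Dict[str, str], hard_excluded: bool = False) -> bool:
    """Apply the three cohort conditions to one exported contact row."""
    # 1. Assumption: the paid_search bucket corresponds to HubSpot's PAID_SEARCH value.
    if (c.get("hs_analytics_source") or "").strip().upper() != "PAID_SEARCH":
        return False
    # 2. UTM gap: weak or empty medium OR empty campaign.
    medium = (c.get("utm_medium__c") or "").strip().lower()
    campaign = (c.get("utm_campaign__c") or "").strip()
    if medium not in GAP_MEDIUMS and campaign:
        return False
    # 3. Exclusions: hard exclusions (computed elsewhere) and any Freelancesdr value.
    if hard_excluded or "freelancesdr" in (c.get("leadsource") or "").lower():
        return False
    return True


def is_narrow_cohort(c: Dict[str, str]) -> bool:
    """Narrow cohort: temp Ordio email or a lead-capture sign-up type."""
    email = (c.get("email") or "").strip().lower()
    sign_up = (c.get("sign_up_type__c") or "").lower()
    return email.endswith("@temp.ordio.com") or "lead capture" in sign_up
```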

### Evidence ladder

1. **Strict (preferred):** PATCH `source__c` / `utm_*` only when a **form submission** or **page view** engagement URL contains paid signals (`gclid`, `hsa_*`, `utm_medium` in paid set, `utm_source` adwords/google with paid medium). Implementations: [`hubspot_paid_search_url_has_paid_signal()`](../../../v2/helpers/hubspot-paid-search-utm-gap.php), batch [`paid-search-utm-gap-engagement-batch.php`](../../../v2/scripts/hubspot/paid-search-utm-gap-engagement-batch.php), patch builder [`build-patch-csv-paid-search-strict.php`](../../../v2/scripts/hubspot/build-patch-csv-paid-search-strict.php).
2. **Pragmatic (narrow only):** If **no** URL proof **and** **narrow_cohort** = yes, map `hs_analytics_source_data_1` / `_2` text through the reviewed JSON table [`v2/data/hubspot/paid-search-analytics-campaign-map.json`](../../../v2/data/hubspot/paid-search-analytics-campaign-map.json). Builder: [`build-patch-csv-paid-search-pragmatic.php`](../../../v2/scripts/hubspot/build-patch-csv-paid-search-pragmatic.php). **Human-review** the CSV before `--apply` (drill-down text may not equal live Ads `utm_campaign`).
3. **Operator case review (residual cohort):** Use this tier when **strict** and **narrow pragmatic** still leave rows, e.g. a contact outside `narrow_cohort` whose CRM `source__c` / `utm_*` are **blank** while `hs_analytics_source_data_1` clearly matches the map. Review **per contact**: re-run the engagements batch, extend [`paid-search-analytics-campaign-map.json`](../../../v2/data/hubspot/paid-search-analytics-campaign-map.json) if new drill-down slugs appear, then build a **signed CSV** with `reason_code` and `notes` for every row and run [`patch-contact-attribution-from-csv.php`](../../../v2/scripts/hubspot/patch-contact-attribution-from-csv.php). **Do not** use this tier when the contact already has **contradictory** intentional UTMs or falls under the **Freelancesdr** / partner exclusions. For contacts with **adwords + ppc** but an **empty** `utm_campaign__c`, PATCH only the **campaign** (optionally inferred from `hs_analytics_last_url` or term), and only after checking the HubSpot properties; document the inference in `notes`.
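
The ladder can be sketched as a URL check plus a tier selector. This is a hypothetical Python sketch, not `hubspot_paid_search_url_has_paid_signal()`: the paid-medium set is an assumption, and `map_hit` stands for "drill-down text found in the campaign map JSON".

```python
from typing import Iterable, Optional
from urllib.parse import parse_qs, urlsplit

# Assumed paid-medium set; the real set lives in hubspot-paid-search-utm-gap.php.
PAID_MEDIUMS = {"cpc", "ppc", "paid", "paidsearch", "paid_search"}


def url_has_paid_signal(url: str) -> bool:
    """True if an engagement URL carries gclid, hsa_* or a paid utm_medium."""
    params = {k.lower(): v for k, v in parse_qs(urlsplit(url).query).items()}
    if "gclid" in params or any(k.startswith("hsa_") for k in params):
        return True
    medium = params.get("utm_medium", [""])[0].strip().lower()
    # With a single paid-medium set, "utm_source adwords/google with paid medium"
    # is already covered by this check.
    return medium in PAID_MEDIUMS


def evidence_tier(
    urls: Iterable[str],
    narrow: bool,
    map_hit: bool,
    crm_utms_blank: bool = False,
    blocked: bool = False,
) -> Optional[str]:
    """Pick the highest evidence tier that applies, or None (no PATCH)."""
    if any(url_has_paid_signal(u) for u in urls):
        return "strict"
    if narrow and map_hit:
        return "pragmatic"  # still needs human review before --apply
    if map_hit and crm_utms_blank and not blocked:
        return "case_review"  # per contact, signed CSV with reason_code + notes
    return None
```

Tiers are tried strongest first, so a contact with URL proof never falls through to drill-down mapping.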

### CLI recipe

```bash
# Use one date stamp for the whole run so file names stay consistent across midnight.
D=$(date +%F)
OUT=var/hubspot-audits

php v2/scripts/hubspot/audit-leadsource-utm-discrepancies.php --days=90 \
  --paid-utm-gap-output="$OUT/paid-search-utm-gap-$D.csv"
php v2/scripts/hubspot/paid-search-utm-gap-engagement-batch.php \
  --input="$OUT/paid-search-utm-gap-$D.csv" \
  --output="$OUT/paid-search-utm-gap-engagements-$D.csv"
php v2/scripts/hubspot/build-patch-csv-paid-search-strict.php \
  --engagements="$OUT/paid-search-utm-gap-engagements-$D.csv" \
  --output="$OUT/patch-paid-search-strict-$D.csv"
php v2/scripts/hubspot/build-patch-csv-paid-search-pragmatic.php \
  --engagements="$OUT/paid-search-utm-gap-engagements-$D.csv" \
  --output="$OUT/patch-paid-search-pragmatic-$D.csv"

# Review CSVs, then dry-run (default):
php v2/scripts/hubspot/patch-contact-attribution-from-csv.php --input="$OUT/patch-paid-search-strict-$D.csv"
php v2/scripts/hubspot/patch-contact-attribution-from-csv.php --input="$OUT/patch-paid-search-pragmatic-$D.csv"

# After sign-off:
php v2/scripts/hubspot/patch-contact-attribution-from-csv.php --input=… --apply
```

Logs: JSONL under **`var/hubspot-audits/`** (gitignored).

### Forward fix (integration)

In the two-step Lead Capture flow, Step 2 can overwrite **paid** CRM UTMs with **weaker** browser UTMs. To prevent this, **`updateHubSpotContact`** prefetches the CRM `source__c` / `utm_*` / `gclid__c` values and **merges** them with the Step 2 values when the CRM attribution is stronger ([`hubspot_lead_capture_merge_step2_utms_with_crm()`](../../../v2/helpers/hubspot-paid-search-utm-gap.php)). See [HUBSPOT_LEADSOURCE_UTM_AUDIT.md](./HUBSPOT_LEADSOURCE_UTM_AUDIT.md) (Lead Capture trace).
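
One way to express "keep the stronger attribution" is a strength ranking plus a guarded update. This is a minimal hypothetical sketch, not the real `hubspot_lead_capture_merge_step2_utms_with_crm()`: the three-level ranking, the paid-medium set and `utm_source__c` are assumptions.

```python
from typing import Dict

FIELDS = ("source__c", "utm_source__c", "utm_medium__c", "utm_campaign__c", "gclid__c")
PAID_MEDIUMS = {"cpc", "ppc", "paid"}  # assumption


def attribution_strength(a: Dict[str, str]) -> int:
    """2 = paid evidence, 1 = some attribution, 0 = nothing (hypothetical ranking)."""
    gclid = (a.get("gclid__c") or "").strip()
    medium = (a.get("utm_medium__c") or "").strip().lower()
    if gclid or medium in PAID_MEDIUMS:
        return 2
    if any((a.get(f) or "").strip() for f in FIELDS):
        return 1
    return 0


def step2_attribution_update(crm: Dict[str, str], step2: Dict[str, str]) -> Dict[str, str]:
    """Return only the attribution properties Step 2 should write."""
    if attribution_strength(crm) > attribution_strength(step2):
        return {}  # keep the stronger CRM attribution untouched
    # Never write blanks: an empty Step 2 value must not clear a CRM value.
    return {f: step2[f].strip() for f in FIELDS if (step2.get(f) or "").strip()}
```

Ties go to Step 2, so a genuinely newer paid click can still update the record.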

### Decision log (paid UTM gap apply)

| Date (UTC) | Operator | Ticket / note | Strict rows applied | Pragmatic rows applied |
|------------|----------|---------------|---------------------|-------------------------|
| 2026-03-29 | Ops (repo scripts) | Paid UTM gap pipeline: `paid-search-utm-gap-20260329.csv` → engagements → combined patch; JSONL `patch-attribution-log-2026-03-29T100917Z.jsonl` | 1 | 2 |
| 2026-03-29 | Ops (case review) | **14** residual non-narrow contacts: engagements had **no** URL proof; drill-down–driven PATCH via `patch-paid-gap-case-review-20260329.csv` (map extended for `de_search_b_brand_kombi`); JSONL `patch-attribution-log-2026-03-29T101407Z.jsonl`. Re-audit: **0** `paid_search_utm_gap` rows (90d). | — | — |

## Next steps checklist (operations)

1. **Refresh audit** — `php v2/scripts/hubspot/audit-leadsource-utm-discrepancies.php --days=90 --output=var/hubspot-audits/leadsource-utm-audit-$(date +%F).csv` (optional: `--paid-utm-gap-output=…` for the paid-search UTM gap cohort)
2. **Slice by reason** — e.g. `filter-audit-csv-by-reason.php --reason=gclid_present` or `meta_paid_medium`, `utm_medium_organic`, `analytics_paid_search_vs_leadsource_organic`
3. **Group 5** — Fill [GROUP5_CASE_DECISIONS.md](./GROUP5_CASE_DECISIONS.md) sub-buckets after HubSpot UI review; run **`contact-attribution-dossier.php`** on all five IDs
4. **Build PATCH CSV** — Draft from audit: **`build-patch-csv-from-audit.php`** (`--reason=…`); edit out exceptions; then **`patch-leadsource-from-audit.php`**. Optional UTMs: **`patch-contact-attribution-from-csv.php`**
5. **Dry-run → apply** — Review JSONL logs under `var/hubspot-audits/`
6. **Re-audit** — Confirm tier-A counts dropped
7. **HubSpot workflows** — Complete the manual checklist above; avoid global gclid→Google if **Freelancesdr** exception applies
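
Steps 2 and 6 (slice by reason, confirm counts dropped) can be sketched over an audit CSV. The column name `reasons` and the `;` separator are hypothetical; adjust them to the real audit output.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Set


def _codes(row: Dict[str, str], column: str, sep: str) -> Set[str]:
    """Split one row's reason cell into a set of non-empty codes."""
    return {c.strip() for c in (row.get(column) or "").split(sep) if c.strip()}


def rows_with_reason(csv_text: str, reason: str, column: str = "reasons", sep: str = ";") -> List[Dict[str, str]]:
    """Rows whose reason cell contains the given code."""
    return [row for row in csv.DictReader(io.StringIO(csv_text)) if reason in _codes(row, column, sep)]


def reason_counts(csv_text: str, column: str = "reasons", sep: str = ";") -> Counter:
    """Count rows per reason code, e.g. to compare audits before and after a patch."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts.update(_codes(row, column, sep))
    return counts
```

Comparing `reason_counts()` of the pre-patch and re-audit CSVs gives a quick per-reason check that the targeted counts dropped.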

## References (footnotes)

[^hubspot-original-source]: HubSpot — [Original source / source data properties](https://knowledge.hubspot.com/reports/what-do-the-properties-original-source-data-1-and-2-mean) (analytics vs form context).

[^hubspot-merge]: HubSpot — merging contacts combines records; primary record drives many surviving field values — confirm in UI property history after merges ([HubSpot knowledge base: merge contacts](https://knowledge.hubspot.com/contacts/merge-contacts)).

[^gclid-diagnostics]: Google Ads — `gclid` identifies a **click** on an ad for measurement; it does not by itself define your internal **leadsource** taxonomy for every business process ([Google Ads click identifiers](https://support.google.com/google-ads/answer/6305348)).

[^multi-touch]: Industry practice: **first-touch** vs **last-touch** attribution; mismatches between session analytics and form UTMs are common when users use multiple tabs or return later ([HubSpot attribution overview](https://knowledge.hubspot.com/reports/attribution-reports)).

[^lead-source-practices]: HubSpot community — **Lead Source** as business taxonomy vs syncing from native sources ([Community discussion](https://community.hubspot.com/t5/CRM/Lead-Source-property-best-practices/m-p/1008713)).
