# Lead Capture Duplicate Contact Prevention

**Last Updated:** 2026-03-29

## Overview

This document describes the root cause of duplicate contacts in the lead capture two-step flow, the fix implemented, and how to manually merge duplicates in HubSpot.

## Root Cause (Fixed 2026-02-24)

### What Happened (Martin Klima Case)

1. **Contact A** – Created in Step 1 with temp email `lead-{leadId}@temp.ordio.com`
2. **Contact B** – Pre-existing contact (same phone, real email)
3. **Step 2** – User submitted email + callback notes
4. **Bug** – Step 2 used `checkExistingContact($phone, $tempEmail)`, which runs a HubSpot search with **OR logic** (phone OR email). When multiple contacts matched, the code used `results[0]`. HubSpot's result order was unspecified, so the pre-existing contact was sometimes returned first and updated instead of the Step 1 contact.

### Technical Details

- `checkExistingContact()` used `filterGroups` with two groups: one for phone, one for email
- HubSpot CRM Search: multiple filter groups = OR logic (filters inside one group are ANDed)
- Code took `$response['results'][0]` – an arbitrary first result, since the result order was unspecified
- When the user had a pre-existing contact with the same phone, the wrong contact could be updated
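
The failure mode can be reproduced without calling HubSpot. The sketch below builds a two-group search body and applies the old "first result" selection to a simulated response. `buildOrSearchBody()` and `pickFirstResult()` are hypothetical names, the contact IDs are made up, and the use of the `phone` property with the `EQ` operator is an assumption about how the original search was written.

```php
<?php
// Sketch only: hypothetical helpers, not code from v2/api/lead-capture.php.

/** Build a CRM Search body with two filter groups (phone OR email). */
function buildOrSearchBody(string $phone, string $email): array
{
    return [
        'filterGroups' => [
            // Group 1: phone matches (property name is an assumption).
            ['filters' => [['propertyName' => 'phone', 'operator' => 'EQ', 'value' => $phone]]],
            // Group 2: email matches. Separate groups are ORed together.
            ['filters' => [['propertyName' => 'email', 'operator' => 'EQ', 'value' => $email]]],
        ],
        'properties' => ['email', 'phone', 'createdate'],
    ];
}

/** The buggy selection: take whatever came back first. */
function pickFirstResult(array $response): ?array
{
    return $response['results'][0] ?? null;
}

// Simulated response in which the pre-existing contact happens to come first.
$response = [
    'results' => [
        ['id' => '1001', 'properties' => ['email' => 'real@example.com', 'phone' => '+49123']],
        ['id' => '2002', 'properties' => ['email' => 'lead-42@temp.ordio.com', 'phone' => '+49123']],
    ],
];

$picked = pickFirstResult($response);
echo $picked['id'], PHP_EOL; // the pre-existing contact, not the Step 1 contact
```

Both contacts satisfy the OR query, so any selection that ignores which filter matched depends entirely on the response order.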

## Fix Implemented

### Step 2 Contact Lookup Order

1. **Search by temp email ONLY first** – `findContactByTempEmail($tempEmail)`
   - Unambiguous: only the Step 1 contact has this exact email
   - If found → update that contact

2. **Fallback: search by phone with temp email preference** – `findContactByPhonePreferringTempEmail($phone, $tempEmail)`
   - Used when temp email search returns nothing (e.g. CRM API fallback in Step 1)
   - Returns all contacts with matching phone, sorted by `createdate` DESC
   - Prefers contact with exact temp email match
   - Else prefers any contact whose email matches the temp pattern `lead-.*@temp.ordio.com`
   - Else uses most recently created
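
The fallback preference order can be sketched as a pure function over the phone-search results. This is a minimal illustration, not the body of `findContactByPhonePreferringTempEmail()`: `isTempEmail()` and `pickPreferredContact()` are hypothetical names, and the input is assumed to be already sorted by `createdate` DESC.

```php
<?php
// Sketch only: hypothetical helpers illustrating the preference order.

/** True when the email looks like a Step 1 placeholder address. */
function isTempEmail(string $email): bool
{
    return preg_match('/^lead-.*@temp\.ordio\.com$/i', $email) === 1;
}

/**
 * Choose one contact from phone-search results.
 * Assumes $contacts is sorted by createdate DESC (newest first).
 */
function pickPreferredContact(array $contacts, string $tempEmail): ?array
{
    // 1. An exact temp email match wins.
    foreach ($contacts as $contact) {
        $email = $contact['properties']['email'] ?? '';
        if ($tempEmail !== '' && strcasecmp($email, $tempEmail) === 0) {
            return $contact;
        }
    }
    // 2. Otherwise take the first contact with any temp-pattern email.
    foreach ($contacts as $contact) {
        if (isTempEmail($contact['properties']['email'] ?? '')) {
            return $contact;
        }
    }
    // 3. Otherwise the most recently created contact (first, given DESC order).
    return $contacts[0] ?? null;
}

$contacts = [
    ['id' => '3', 'properties' => ['email' => 'real@example.com']],
    ['id' => '2', 'properties' => ['email' => 'lead-7@temp.ordio.com']],
    ['id' => '1', 'properties' => ['email' => 'lead-42@temp.ordio.com']],
];
echo pickPreferredContact($contacts, 'lead-42@temp.ordio.com')['id'], PHP_EOL;
```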

### Functions Added

- `findContactByTempEmail($tempEmail)` – Search by email only
- `findContactByPhonePreferringTempEmail($phone, $preferTempEmail)` – Search by phone, prefer temp email

### Files Modified

- `v2/api/lead-capture.php` – Step 2 lookup logic, new helper functions

### Follow-up (2026-03-29): `updateHubSpotContact` must not overwrite resolved ID

Step 2 resolves the contact with **`findContactByTempEmail()` first**, then calls **`updateHubSpotContact(..., $contactIdToUpdate)`**. A legacy block inside `updateHubSpotContact` still ran a **phone-based CRM search** and overwrote `$contactId` with `results[0]`, undoing the temp-email fix. That path is now **skipped when `$contactIdToUpdate` is set**; the function loads `email` via a single GET for the known ID instead. See `updateHubSpotContact` in `v2/api/lead-capture.php`.
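
The guard described above amounts to "a caller-supplied ID always wins". A minimal sketch, assuming the legacy phone search can be represented as a callable: `resolveContactId()` is a hypothetical helper, and the real logic sits inline in `updateHubSpotContact`.

```php
<?php
// Sketch only: hypothetical helper illustrating the skip-when-resolved guard.

/**
 * Return the contact ID to update.
 * $searchByPhone stands in for the legacy phone-based CRM search.
 */
function resolveContactId(?string $contactIdToUpdate, string $phone, callable $searchByPhone): ?string
{
    if ($contactIdToUpdate !== null && $contactIdToUpdate !== '') {
        // The caller already resolved the contact: never run the phone search.
        return $contactIdToUpdate;
    }
    // Legacy path, used only when no ID was passed in.
    $response = $searchByPhone($phone);
    return $response['results'][0]['id'] ?? null;
}

$calls = 0;
$search = function (string $phone) use (&$calls): array {
    $calls++;
    return ['results' => [['id' => '1001']]];
};

echo resolveContactId('2002', '+49123', $search), PHP_EOL; // resolved ID is kept
echo $calls, PHP_EOL;                                      // search was not called
```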

## Manual Merge in HubSpot

When duplicates already exist:

1. In HubSpot: **Contacts** → search by name
2. Open both duplicate records
3. Identify which has:
   - **Temp email** (`lead-...@temp.ordio.com`) = Step 1 duplicate
   - **Real email** = primary record (may have callback notes from Step 2)
4. **Merge:** Use **Merge** (or **Combine**). Keep the record with the most complete data (real email + callback notes). Merge the other into it.
5. **Note:** Merges are irreversible. Choose the primary record carefully.

## Diagnostic Scripts

### Investigate Duplicates

To investigate duplicate contacts:

```bash
php v2/scripts/hubspot/investigate-lead-capture-contacts.php 711086299372 338722964676
```

Or with comma-separated IDs:

```bash
php v2/scripts/hubspot/investigate-lead-capture-contacts.php --contacts=711086299372,338722964676
```

Output: side-by-side comparison, creation dates, temp email detection, merge recommendation.

### Test Merge (Manual)

To test the merge API manually (e.g. after auto-merge failed or for manual consolidation):

```bash
# Dry-run: fetch both contacts, show comparison, no merge
php v2/scripts/hubspot/test-lead-capture-merge.php 338722964676 711086299372 --dry-run

# Execute merge (IRREVERSIBLE)
php v2/scripts/hubspot/test-lead-capture-merge.php 338722964676 711086299372
```

Primary = pre-existing contact (real email, older). Secondary = Step 1 contact (temp email).

## Auto-Merge (Step 2)

When Step 2 successfully updates the Step 1 contact and detects two contacts with the same phone (the Step 1 contact plus a pre-existing contact with a real email), the system automatically merges them via the HubSpot v1 merge API.

### When Merge Runs

- After `updateHubSpotContact` succeeds
- Only when: `email` is not empty AND (`notes` OR `callPreference`) is present (new data worth preserving)
- Exactly 2 contacts with same phone: one Step 1 (temp email), one pre-existing (real email)
- Skipped if 3+ contacts (ambiguous)
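
These conditions can be collected into a single predicate. The sketch below is illustrative only: `shouldAttemptAutoMerge()` is a hypothetical name, and treating whitespace-only values as empty is an assumption.

```php
<?php
// Sketch only: hypothetical predicate for the merge preconditions.

function shouldAttemptAutoMerge(
    bool $updateSucceeded,
    string $email,
    string $notes,
    string $callPreference,
    int $contactsWithSamePhone
): bool {
    if (!$updateSucceeded) {
        return false; // merge runs only after a successful update
    }
    if (trim($email) === '') {
        return false; // no real email submitted
    }
    if (trim($notes) === '' && trim($callPreference) === '') {
        return false; // no new data worth preserving
    }
    // Exactly two contacts: 0 or 1 means no duplicate, 3+ is ambiguous.
    return $contactsWithSamePhone === 2;
}

var_dump(shouldAttemptAutoMerge(true, 'user@example.com', 'Call after 3pm', '', 2));
```

Whether the two contacts really are one temp-email and one real-email record is a separate check on the contacts themselves.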

### Primary vs Secondary

- **Primary** = Pre-existing contact (older `createdate`, real email, company, history) – stays as merged record
- **Secondary** = Step 1 contact (just updated with notes, call preference) – merged into primary
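
One way to assign the two roles is to use the known Step 1 contact ID as the secondary and treat the other contact as primary. This is a sketch under that assumption: `assignMergeRoles()` is a hypothetical helper, and the extra check that the primary has a real (non-temp) email is an added safeguard, not confirmed behavior.

```php
<?php
// Sketch only: hypothetical helper that assigns primary/secondary roles.

/**
 * Split exactly two same-phone contacts into primary and secondary.
 * Returns null when the pair does not fit the expected shape.
 */
function assignMergeRoles(array $contacts, string $step1ContactId): ?array
{
    if (count($contacts) !== 2) {
        return null; // nothing to merge, or ambiguous
    }
    $primary = null;
    $secondary = null;
    foreach ($contacts as $contact) {
        if ((string) ($contact['id'] ?? '') === $step1ContactId) {
            $secondary = $contact;
        } else {
            $primary = $contact;
        }
    }
    if ($primary === null || $secondary === null) {
        return null; // the Step 1 contact is not part of the pair
    }
    $primaryEmail = $primary['properties']['email'] ?? '';
    if ($primaryEmail === '' || preg_match('/^lead-.*@temp\.ordio\.com$/i', $primaryEmail) === 1) {
        return null; // primary must carry a real email
    }
    return ['primary' => $primary, 'secondary' => $secondary];
}

$pair = [
    ['id' => '1001', 'properties' => ['email' => 'real@example.com']],
    ['id' => '2002', 'properties' => ['email' => 'lead-42@temp.ordio.com']],
];
$roles = assignMergeRoles($pair, '2002');
echo $roles['primary']['id'], ' <- ', $roles['secondary']['id'], PHP_EOL;
```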

### Data Preservation

- Before merge: PATCH primary with `description` (notes + call preference), `hs_analytics_source_data_1` (source page), `sign_up_type__c` = "Lead Capture Form"
- Merge: HubSpot combines activities/associations; primary's property values win; if primary is empty, secondary's value is used
- **Irreversible:** Merge cannot be undone
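
The pre-merge PATCH body might look roughly like the sketch below. The three property names come from the list above. The rest is assumption: `buildPrimaryPatchBody()` is a hypothetical helper, the layout of the `description` text is invented for illustration, and the `{"properties": {...}}` wrapper assumes the CRM v3 contacts endpoint is used for the PATCH.

```php
<?php
// Sketch only: hypothetical builder for the pre-merge PATCH payload.

function buildPrimaryPatchBody(string $notes, string $callPreference, string $sourcePage): array
{
    // Assumed layout: one labelled line per non-empty value.
    $lines = [];
    if (trim($notes) !== '') {
        $lines[] = 'Notes: ' . trim($notes);
    }
    if (trim($callPreference) !== '') {
        $lines[] = 'Call preference: ' . trim($callPreference);
    }

    $properties = ['sign_up_type__c' => 'Lead Capture Form'];
    if ($lines !== []) {
        $properties['description'] = implode("\n", $lines);
    }
    if (trim($sourcePage) !== '') {
        $properties['hs_analytics_source_data_1'] = trim($sourcePage);
    }

    return ['properties' => $properties];
}

$body = buildPrimaryPatchBody('Call after 3pm', 'afternoon', '/pricing');
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

Writing these values to the primary before the merge matters because the primary's non-empty property values win during the merge.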

### Merge API Reference

- **Endpoint:** `POST https://api.hubapi.com/contacts/v1/contact/merge-vids/{primaryId}`
- **Body:** `{"vidToMerge": "{secondaryId}"}`
- **Scope:** `crm.objects.contacts.write` (already used for lead capture)
- **Non-blocking:** Merge failure is logged but does not fail Step 2 response
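
A minimal sketch of the request shape and the non-blocking behavior follows. It is not the body of `mergeHubSpotContacts()`: `buildMergeRequest()` and `mergeNonBlocking()` are hypothetical names, and the HTTP transport and logger are injected as callables so the example runs without network access or credentials.

```php
<?php
// Sketch only: hypothetical helpers; transport and logging are injected.

/** Describe the v1 merge request for a primary/secondary pair. */
function buildMergeRequest(string $primaryId, string $secondaryId): array
{
    return [
        'method' => 'POST',
        'url'    => 'https://api.hubapi.com/contacts/v1/contact/merge-vids/' . rawurlencode($primaryId),
        'body'   => json_encode(['vidToMerge' => $secondaryId]),
    ];
}

/**
 * Attempt the merge; log and return false on any failure instead of throwing.
 * $send takes the request array and returns an HTTP status code.
 */
function mergeNonBlocking(string $primaryId, string $secondaryId, callable $send, callable $log): bool
{
    try {
        $status = $send(buildMergeRequest($primaryId, $secondaryId));
        if ($status >= 200 && $status < 300) {
            return true;
        }
        $log("merge failed with HTTP $status");
    } catch (Throwable $e) {
        $log('merge failed: ' . $e->getMessage());
    }
    return false; // the caller continues with the Step 2 response either way
}

$logs = [];
$log = function (string $message) use (&$logs): void {
    $logs[] = $message;
};
$ok = mergeNonBlocking('1001', '2002', function (array $request): int {
    return 200; // simulated success
}, $log);
var_dump($ok);
```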

### Functions

- `findAllContactsByPhone($phone)` – Return all contacts with that phone, sorted by createdate ASC
- `mergeHubSpotContacts($primaryId, $secondaryId)` – Call HubSpot v1 merge API
- `detectAndMergeDuplicates($phone, $step1ContactId, $notes, $callPreference, $sourcePage, $correlationId)` – Orchestrates detection and merge
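
For the phone lookup, a deterministic order can be requested from CRM Search through the `sorts` field rather than by sorting client-side. The sketch below shows one possible request body. `buildPhoneSearchBody()` is a hypothetical name, and the `phone` property with the `EQ` operator is an assumption about how `findAllContactsByPhone()` queries HubSpot.

```php
<?php
// Sketch only: hypothetical builder for a phone-only, sorted search body.

function buildPhoneSearchBody(string $phone, string $direction = 'ASCENDING'): array
{
    if (!in_array($direction, ['ASCENDING', 'DESCENDING'], true)) {
        throw new InvalidArgumentException("Unsupported sort direction: $direction");
    }
    return [
        // A single filter group: no OR across phone and email.
        'filterGroups' => [
            ['filters' => [['propertyName' => 'phone', 'operator' => 'EQ', 'value' => $phone]]],
        ],
        // Explicit sort so the first result is well defined.
        'sorts' => [['propertyName' => 'createdate', 'direction' => $direction]],
        'properties' => ['email', 'phone', 'createdate'],
    ];
}

echo json_encode(buildPhoneSearchBody('+49123'), JSON_PRETTY_PRINT), PHP_EOL;
```

With ascending order the oldest contact comes first, which fits the rule that the primary is the one with the older `createdate`.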

## HubSpot Duplicate Prevention

- HubSpot uses email as the primary unique identifier for contacts
- Forms API v3: submitting with the same email updates the existing contact
- Our Step 1 uses a temp email (unique per lead), so it always creates a new contact
- Step 2 now correctly finds the Step 1 contact by temp email before updating

## Related Documentation

- [ARCHITECTURE.md](ARCHITECTURE.md) – Duplicate Contact Prevention section
- [.cursor/rules/lead-capture.mdc](../../../.cursor/rules/lead-capture.mdc) – Step 2 lookup rules
