# seo-strategy-2026 Full Instructions

## Overview

This rule file provides guidance for working with the SEO Strategy 2026 data collection, analysis, and strategy development system.

## Directory Structure

All SEO strategy work is located in `docs/seo-strategy-2026/`:

- **`data/`** - Raw collected data (gitignored, sensitive)
- **`scripts/`** - Data collection and analysis scripts
- **`guides/`** - Process documentation and guides
- **`analysis/`** - Analysis reports and findings
- **`competitive/`** - Competitive intelligence
- **`keywords/`** - Keyword research data
- **`content/`** - Content audit data
- **`technical/`** - Technical SEO audit data
- **`conversions/`** - Conversion analysis
- **`strategy/`** - Final strategy documents

## Data Collection Scripts

### Running Scripts

All scripts are in `docs/seo-strategy-2026/scripts/`:

```bash
# Test API access
php docs/seo-strategy-2026/scripts/test-api-access.php

# Content audit
php docs/seo-strategy-2026/scripts/content-audit-crawler.php --base-url=https://www.ordio.com

# Technical SEO audit
php docs/seo-strategy-2026/scripts/technical-seo-audit.php --base-url=https://www.ordio.com

# GA4 data collection (needs Property ID)
php docs/seo-strategy-2026/scripts/fetch-ga4-data.php --property-id=YOUR_ID

# Search Console data collection (needs site URL)
php docs/seo-strategy-2026/scripts/fetch-search-console-data.php --site-url=sc_domain:ordio.com
```

### Script Help

All scripts support the `--help` flag:

```bash
php script-name.php --help
```

## Analysis Scripts

### Running Analysis

Analysis scripts process collected data:

```bash
# Performance trends
php docs/seo-strategy-2026/scripts/analyze-performance-trends.php

# Keyword opportunities
php docs/seo-strategy-2026/scripts/score-keyword-opportunities.php

# Content gaps
php docs/seo-strategy-2026/scripts/analyze-content-gaps.php

# Conversion funnel
php docs/seo-strategy-2026/scripts/analyze-conversion-funnel.php

# Opportunity identification
php docs/seo-strategy-2026/scripts/identify-opportunities.php
```

## Performance Optimization

### Standardized Resource Limits

All scripts use standardized memory and time limits, grouped by filename prefix:

- **Collection Scripts** (fetch-, collect-, crawl-, parse-): 600s timeout, 512M memory
- **Analysis Scripts** (analyze-, score-, identify-, extract-): 300s timeout, 512M memory
- **Generation Scripts** (generate-, create-, synthesize-): 300s timeout, 256M memory

See `OPTIMIZATION_GUIDE.md` for complete optimization patterns.
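
As a rough illustration of the table above, the limits could be resolved from a script's filename prefix. This is only a sketch: `resourceLimitsFor()` and `applyResourceLimits()` are hypothetical names, not helpers that are known to exist in the project, and `OPTIMIZATION_GUIDE.md` remains the reference for the real pattern.

```php
<?php
// Hypothetical helper: map a script filename to the limits listed above.
// Returns null when the filename matches none of the documented prefixes.
function resourceLimitsFor(string $scriptName): ?array {
    $groups = [
        // type => [filename prefixes, timeout in seconds, memory limit]
        'collection' => [['fetch-', 'collect-', 'crawl-', 'parse-'], 600, '512M'],
        'analysis'   => [['analyze-', 'score-', 'identify-', 'extract-'], 300, '512M'],
        'generation' => [['generate-', 'create-', 'synthesize-'], 300, '256M'],
    ];
    $base = basename($scriptName);
    foreach ($groups as $type => [$prefixes, $timeout, $memory]) {
        foreach ($prefixes as $prefix) {
            if (strncmp($base, $prefix, strlen($prefix)) === 0) {
                return ['type' => $type, 'timeout' => $timeout, 'memory' => $memory];
            }
        }
    }
    return null;
}

// Hypothetical helper: apply the resolved limits to the current process.
function applyResourceLimits(string $scriptName): ?array {
    $limits = resourceLimitsFor($scriptName);
    if ($limits !== null) {
        set_time_limit($limits['timeout']);
        ini_set('memory_limit', $limits['memory']);
    }
    return $limits;
}
```

A script would call `applyResourceLimits(__FILE__)` near the top. Filenames that match none of the listed prefixes, such as `test-api-access.php`, return `null` in this sketch and would need explicit limits.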

### Progress Tracking

Long-running scripts (>300s) should implement checkpoint/resume using `lib/progress-persistence.php`. See `scripts/add-progress-tracking.php` for analysis of which scripts need tracking.
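
The checkpoint/resume idea can be sketched as follows. The API of `lib/progress-persistence.php` is not shown in this file, so `loadCheckpoint()`, `saveCheckpoint()`, and `processWithCheckpoint()` are hypothetical stand-ins, and the checkpoint layout (`last_index`, `results`) is an assumption.

```php
<?php
// Hypothetical helpers; the real API of lib/progress-persistence.php may differ.
function loadCheckpoint(string $file): array {
    $empty = ['last_index' => -1, 'results' => []];
    if (!is_file($file)) {
        return $empty;
    }
    $state = json_decode((string) file_get_contents($file), true);
    // Fall back to a fresh state if the file is empty or malformed.
    return (is_array($state) && isset($state['last_index'], $state['results'])) ? $state : $empty;
}

function saveCheckpoint(string $file, array $state): void {
    // Write to a temp file first so an interrupted run never leaves
    // a half-written checkpoint behind.
    $tmp = $file . '.tmp';
    file_put_contents($tmp, json_encode($state));
    rename($tmp, $file);
}

function processWithCheckpoint(array $items, callable $process, string $file, int $every = 50): array {
    $state = loadCheckpoint($file);
    // Resume right after the last item recorded in the checkpoint.
    for ($i = $state['last_index'] + 1; $i < count($items); $i++) {
        $state['results'][] = $process($items[$i]);
        $state['last_index'] = $i;
        if (($i + 1) % $every === 0) {
            saveCheckpoint($file, $state);
        }
    }
    saveCheckpoint($file, $state);
    return $state['results'];
}
```

If a run dies between checkpoints, the items processed since the last save are processed again on resume, so `$process` should be safe to repeat.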

## HubSpot Data Collection Optimization

### Activities Collection Best Practices

**CRITICAL**: Always check what's already collected before making API calls.

1. **Property-Based Extraction First**:

   - Most activity data is available in contact properties (instant, no API calls)
   - Use `hs_analytics_num_visits`, `num_conversion_events`, `hs_analytics_num_event_completions`
   - Extracting from properties is roughly 60,000x faster than making API calls (see the timings in item 5)

2. **Avoid Redundant Collection**:

   - Page views: Already collected by `fetch-hubspot-page-views.php` via `hs_analytics_num_page_views`
   - Email opt-out: Already collected by `fetch-hubspot-email-activity.php` via `hs_email_optout`
   - Form submissions: Already collected by `fetch-hubspot-data.php` via Forms API

3. **Rate Limit Tracking**:

   - Always implement rate limit tracking (190 req/10s burst limit)
   - Use same pattern as `fetch-hubspot-data.php`
   - Track requests per 10-second window
   - Automatically wait when approaching limit

4. **Connection Timeouts**:

   - Always set `CURLOPT_CONNECTTIMEOUT` (10s) and `CURLOPT_TIMEOUT` (30s)
   - Prevents hanging on connection issues

5. **Performance Optimization**:

   - Property-based extraction: 0.01s for 993 contacts
   - API-based extraction: 595.8s (9.9 minutes) for 993 contacts
   - Always prefer properties over API calls when possible
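
The rate limit tracking described in item 3 could look roughly like this. It is a sketch under assumptions, not the code from `fetch-hubspot-data.php`: the `RateLimitWindow` class and its method names are hypothetical, and the defaults simply mirror the 190 requests per 10 seconds figure above.

```php
<?php
// Hypothetical sliding-window tracker; not the project's actual implementation.
final class RateLimitWindow {
    private array $timestamps = [];
    private int $maxRequests;
    private float $windowSeconds;

    public function __construct(int $maxRequests = 190, float $windowSeconds = 10.0) {
        $this->maxRequests = $maxRequests;
        $this->windowSeconds = $windowSeconds;
    }

    // Seconds to wait before another request may be sent at time $now.
    public function secondsUntilAllowed(float $now): float {
        // Drop requests that have fallen out of the window.
        $cutoff = $now - $this->windowSeconds;
        $this->timestamps = array_values(array_filter(
            $this->timestamps,
            fn (float $t): bool => $t > $cutoff
        ));
        if (count($this->timestamps) < $this->maxRequests) {
            return 0.0;
        }
        // The window frees up when its oldest request expires.
        return $this->timestamps[0] + $this->windowSeconds - $now;
    }

    public function record(float $now): void {
        $this->timestamps[] = $now;
    }

    // Call before each request: sleeps if the window is full, then records the request.
    public function throttle(): void {
        $wait = $this->secondsUntilAllowed(microtime(true));
        if ($wait > 0) {
            usleep((int) ceil($wait * 1000000));
        }
        $this->record(microtime(true));
    }
}
```

To wait when approaching the limit rather than exactly at it, construct the tracker with a lower `$maxRequests` than the real limit as a safety margin.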

### Activities Collection Script Pattern

```php
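// Extract activities from contact properties (fast, no API calls).
// Note: $startDate and $endDate are accepted but not applied in this excerpt.
function extractActivitiesFromProperties($contact, $startDate, $endDate) {
    $activities = [];
    $props = $contact['properties'] ?? [];

    // Visits from properties
    if (intval($props['hs_analytics_num_visits'] ?? 0) > 0) {
        // The first-visit timestamp can be missing or unparseable; use null in
        // that case instead of emitting a bogus epoch value.
        $firstVisit = strtotime($props['hs_analytics_first_visit_timestamp'] ?? '');
        $activities[] = [
            'activity_type' => 'VISIT',
            'timestamp' => $firstVisit !== false ? $firstVisit * 1000 : null,
            'source' => $props['hs_analytics_source'] ?? null,
            'data_source' => 'property'
        ];
    }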

    // Conversions from properties
    if (intval($props['num_conversion_events'] ?? 0) > 0) {
        $activities[] = [
            'activity_type' => 'CONVERSION',
            'count' => intval($props['num_conversion_events']),
            'data_source' => 'property'
        ];
    }

    return $activities;
}
```

## Data Files

### Security

- All data files in the `data/` directory are gitignored
- The credentials file (`v2/config/google-api-credentials.json`) is gitignored
- Never commit sensitive data

### Data Formats

All scripts output JSON files:

- Structured data format
- Human-readable (pretty-printed)
- Includes metadata (analysis date, source, etc.)
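
A minimal sketch of an output payload that follows these conventions is shown below. The `buildReport()` name and the metadata keys (`analysis_date`, `source`, `record_count`) are assumptions for illustration; existing scripts may use different field names.

```php
<?php
// Hypothetical report builder; metadata field names are assumptions.
function buildReport(array $data, string $source, ?string $analysisDate = null): string {
    $payload = [
        'metadata' => [
            'analysis_date' => $analysisDate ?? date('Y-m-d'),
            'source' => $source,
            'record_count' => count($data),
        ],
        'data' => $data,
    ];
    // Pretty-print for readability and fail loudly on encoding errors.
    return json_encode(
        $payload,
        JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
    );
}
```

The returned string can be written with `file_put_contents()` to a path under `data/`.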

## Documentation

### Guides

The `guides/` directory contains the following guides:

- **`API_SETUP_GUIDE.md`** - Google API setup
- **`DATA_COLLECTION_GUIDE.md`** - How to use scripts
- **`ANALYSIS_METHODOLOGY.md`** - Analysis frameworks
- **`KEYWORD_RESEARCH_GUIDE.md`** - Keyword research process
- **`COMPETITIVE_ANALYSIS_GUIDE.md`** - Competitive analysis
- **`CONTENT_STRATEGY_GUIDE.md`** - Content planning

### Status Documents

- **`STRATEGY_STATUS.md`** - Current status and progress
- **`NEXT_STEPS.md`** - Action items and next steps
- **`FINAL_STATUS.md`** - Final implementation status

## Best Practices

### When Adding New Scripts

1. Include comprehensive help (`--help` flag)
2. Use structured logging (`ordio_log()`)
3. Output JSON format for data files
4. Include error handling
5. Document in `DATA_COLLECTION_GUIDE.md`
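
The `--help` and error handling points can be sketched with a small argument parser. This is an assumption-laden skeleton: `parseScriptArgs()` and `usageText()` are hypothetical names, `--base-url` is only an example option, and logging via `ordio_log()` is left out because its signature is not shown in this file.

```php
<?php
// Hypothetical skeleton for a new script; option names are placeholders.
function parseScriptArgs(array $argv): array {
    $options = ['help' => false];
    // Skip $argv[0], which is the script name.
    foreach (array_slice($argv, 1) as $arg) {
        if ($arg === '--help') {
            $options['help'] = true;
        } elseif (preg_match('/^--([a-z0-9-]+)=(.*)$/', $arg, $m)) {
            $options[$m[1]] = $m[2];
        } else {
            throw new InvalidArgumentException("Unrecognized argument: $arg");
        }
    }
    return $options;
}

function usageText(string $script): string {
    return "Usage: php $script --base-url=URL [--help]\n";
}
```

A script would typically call `parseScriptArgs($argv)` inside a `try` block, print `usageText()` and exit when `help` is set or an exception is thrown, and only then start collecting data.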

### When Analyzing Data

1. Use existing analysis scripts as templates
2. Follow analysis methodology in `ANALYSIS_METHODOLOGY.md`
3. Generate insights, not just data
4. Include recommendations
5. Save results in appropriate directory

### When Updating Strategy

1. Use `strategy/STRATEGY_TEMPLATE.md` as the starting point
2. Reference analysis findings
3. Include specific metrics and targets
4. Create actionable recommendations
5. Define success criteria

## Common Tasks

### Collecting New Data

1. Check if script exists in `scripts/`
2. Review `DATA_COLLECTION_GUIDE.md` for usage
3. Run script with appropriate parameters
4. Verify that the output files were created
5. Check for errors in logs

### Running Analysis

1. Ensure data files exist
2. Run appropriate analysis script
3. Review generated insights
4. Update strategy documents if needed

### Updating Documentation

1. Update relevant guide in `guides/`
2. Update `STRATEGY_STATUS.md` if status changed
3. Update `NEXT_STEPS.md` if new actions needed
4. Update the date in the "Last Updated" field

## Troubleshooting

### Script Errors

1. Check the script's help output: `php script-name.php --help`
2. Verify data files exist
3. Check logs for detailed errors
4. Review guide documentation
5. Check API credentials if API-related

### Data Issues

1. Verify data file format (valid JSON)
2. Check data file location
3. Verify data collection completed successfully
4. Check for missing required fields
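
These checks can be automated with a small validator. The sketch below assumes the data file is a JSON object at the top level; `validateDataFile()` is a hypothetical helper, and the required field names are supplied by the caller because they differ per script.

```php
<?php
// Hypothetical validator; returns a list of problems (empty means the file looks fine).
function validateDataFile(string $path, array $requiredFields = []): array {
    if (!is_file($path)) {
        return ["File not found: $path"];
    }
    $decoded = json_decode((string) file_get_contents($path), true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return ['Invalid JSON: ' . json_last_error_msg()];
    }
    if (!is_array($decoded)) {
        return ['Top-level JSON value is not an object or array'];
    }
    $problems = [];
    foreach ($requiredFields as $field) {
        if (!array_key_exists($field, $decoded)) {
            $problems[] = "Missing required field: $field";
        }
    }
    return $problems;
}
```

An empty result does not prove that collection completed, only that the file parses and contains the expected top-level keys.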

### API Issues

1. Test API access: `php docs/seo-strategy-2026/scripts/test-api-access.php`
2. Verify credentials file exists
3. Check service account permissions
4. Verify that the required APIs are enabled in Google Cloud Console

## HubSpot Data Collection & Analysis

### Collection Scripts

**Comprehensive Collection (Recommended):**

```bash
php docs/seo-strategy-2026/scripts/fetch-hubspot-comprehensive-2025.php --year=2025
```

**Individual Collection:**

```bash
# Contacts
php docs/seo-strategy-2026/scripts/fetch-hubspot-data.php --year=2025

# Activities
php docs/seo-strategy-2026/scripts/fetch-hubspot-activities.php --year=2025 --contacts-file=data/hubspot/contacts-2025.json

# Page Views
php docs/seo-strategy-2026/scripts/fetch-hubspot-page-views.php --year=2025 --contacts-file=data/hubspot/contacts-2025.json

# Email Activity
php docs/seo-strategy-2026/scripts/fetch-hubspot-email-activity.php --year=2025 --contacts-file=data/hubspot/contacts-2025.json
```

### Analysis Scripts

```bash
# Demographics
php docs/seo-strategy-2026/scripts/analyze-hubspot-contact-demographics.php

# Lead Sources
php docs/seo-strategy-2026/scripts/analyze-hubspot-lead-sources.php

# Activities
php docs/seo-strategy-2026/scripts/analyze-hubspot-activities.php

# Page Views
php docs/seo-strategy-2026/scripts/analyze-hubspot-page-views.php

# Emails
php docs/seo-strategy-2026/scripts/analyze-hubspot-emails.php

# Forms
php docs/seo-strategy-2026/scripts/analyze-hubspot-forms.php

# Conversion Paths
php docs/seo-strategy-2026/scripts/analyze-hubspot-conversion-paths.php

# Lifecycle Stages
php docs/seo-strategy-2026/scripts/analyze-hubspot-lifecycle-stages.php

# Comprehensive Analysis
php docs/seo-strategy-2026/scripts/generate-hubspot-comprehensive-analysis.php
```

### HubSpot API Patterns

**Rate Limiting:**

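- Burst limit: 100 requests per 10 seconds per app on Free and Starter accounts, 190 on Professional and Enterprise (190 is the figure used in the rate limit tracking guidance above)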
- Use exponential backoff for 429 errors
- Implement retry logic with `makeHubSpotRequest()` function
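
Exponential backoff for 429 responses can be sketched as below. This is not the project's `makeHubSpotRequest()`, whose signature is not shown here: `requestWithBackoff()` and `backoffDelaySeconds()` are hypothetical names, `$send` stands in for the real HTTP call and is assumed to return an array with a `status` key, and the 1s base and 30s cap are illustrative values.

```php
<?php
// Delay doubles with each attempt and is capped: 1s, 2s, 4s, 8s, ...
function backoffDelaySeconds(int $attempt, float $base = 1.0, float $cap = 30.0): float {
    return min($cap, $base * (2 ** $attempt));
}

// Hypothetical retry wrapper; $send performs one request and returns ['status' => int, ...].
function requestWithBackoff(callable $send, int $maxRetries = 5, ?callable $sleep = null): array {
    // The sleeper is injectable so the logic can be exercised without real waiting.
    $sleep = $sleep ?? static function (float $seconds): void {
        usleep((int) round($seconds * 1000000));
    };
    for ($attempt = 0; ; $attempt++) {
        $response = $send();
        // Return on any non-429 status, or once the retry budget is spent.
        if ($response['status'] !== 429 || $attempt >= $maxRetries) {
            return $response;
        }
        $sleep(backoffDelaySeconds($attempt));
    }
}
```

When retries are exhausted the last 429 response is returned unchanged, so the caller still has to decide whether to log and abort.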

**Pagination:**

- Use `limit=100` (maximum per request)
- Use `after` parameter from `paging.next.after` for next page
- Continue until the response no longer includes `paging.next`
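
The cursor loop above can be sketched like this. `fetchAllPages()` is a hypothetical helper, and `$fetchPage` is a stand-in for the real HTTP call: it is assumed to take the page size and the `after` cursor and return the decoded response.

```php
<?php
// Hypothetical pager; $fetchPage(int $limit, ?string $after) returns one decoded page.
function fetchAllPages(callable $fetchPage, int $limit = 100): array {
    $results = [];
    $after = null;
    do {
        $page = $fetchPage($limit, $after);
        foreach ($page['results'] ?? [] as $row) {
            $results[] = $row;
        }
        // A missing paging.next.after means this was the last page.
        $after = $page['paging']['next']['after'] ?? null;
    } while ($after !== null);
    return $results;
}
```

Passing the fetch step in as a callable keeps the loop free of HTTP details, so rate limiting and retries can be wrapped around `$fetchPage` separately.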

**Date Filtering:**

- Use Search API for efficient date filtering: `POST /crm/v3/objects/contacts/search`
- Filter by `createdate` property with GTE/LTE operators
- Convert dates to Unix timestamps in milliseconds before sending them to the HubSpot API
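
A request body for this kind of date filter could be built as follows. The helper names `toHubSpotMillis()` and `buildCreatedateSearchBody()` are hypothetical, dates are interpreted as UTC by assumption, and a 64-bit PHP build is assumed so millisecond timestamps fit in an integer.

```php
<?php
// Convert a date string to a Unix timestamp in milliseconds (UTC assumed).
function toHubSpotMillis(string $date): int {
    $dt = new DateTimeImmutable($date, new DateTimeZone('UTC'));
    return $dt->getTimestamp() * 1000;
}

// Hypothetical builder for the body of POST /crm/v3/objects/contacts/search.
function buildCreatedateSearchBody(string $start, string $end, array $properties, ?string $after = null): array {
    $body = [
        'filterGroups' => [[
            // Filters inside one group are combined with AND.
            'filters' => [
                ['propertyName' => 'createdate', 'operator' => 'GTE', 'value' => (string) toHubSpotMillis($start)],
                ['propertyName' => 'createdate', 'operator' => 'LTE', 'value' => (string) toHubSpotMillis($end)],
            ],
        ]],
        'properties' => $properties,
        'limit' => 100,
    ];
    if ($after !== null) {
        $body['after'] = $after;
    }
    return $body;
}
```

The array would be passed through `json_encode()` and sent as the request body; the `after` cursor comes from the previous response when paginating.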

**Property Collection:**

- Specify comprehensive property list in `properties` parameter
- Include: lifecycle_stage, sign_up_type, UTMs, engagement metrics
- See `analysis/HUBSPOT_API_RESEARCH.md` for complete property list

**Error Handling:**

- Always use `ordio_log()` for structured logging
- Handle 429 (rate limit) with exponential backoff
- Handle 404 (not found) gracefully
- Never expose internal errors to users

### HubSpot Data Structure

**Contacts:**

- Properties include: email, lifecycle_stage, hs_analytics_source, sign_up_type, UTMs
- Engagement metrics: hs_analytics_num_page_views, hs_analytics_num_visits
- Email properties: hs_email_optout, hs_email_optout_reason

**Form Submissions:**

- Linked to contacts via `contact_id`
- Include UTM data in `utm_data` object
- Form fields in `form_fields` array

**Activities:**

- Page views, email opens/clicks, form submissions
- Linked to contacts via `contact_id`
- Timestamps in milliseconds

### HubSpot Analysis Patterns

**Lead Source Analysis:**

- Analyze `hs_analytics_source` for source distribution
- Parse UTM parameters (utm_source, utm_medium, utm_campaign)
- Map to keywords via `hs_analytics_source_data_2`
- Calculate conversion rates by source
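
A minimal sketch of the source distribution and per-source conversion rate is shown below. `summarizeLeadSources()` is a hypothetical helper, and treating a contact as converted when `lifecycle_stage` equals `customer` is an assumption; the analysis methodology may define conversion differently.

```php
<?php
// Hypothetical summary of contacts per hs_analytics_source.
function summarizeLeadSources(array $contacts, string $convertedStage = 'customer'): array {
    $summary = [];
    foreach ($contacts as $contact) {
        $props = $contact['properties'] ?? [];
        $source = $props['hs_analytics_source'] ?? 'UNKNOWN';
        $summary[$source] = $summary[$source] ?? ['contacts' => 0, 'converted' => 0];
        $summary[$source]['contacts']++;
        // Assumption: a contact counts as converted once it reaches $convertedStage.
        if (($props['lifecycle_stage'] ?? null) === $convertedStage) {
            $summary[$source]['converted']++;
        }
    }
    foreach ($summary as $source => $row) {
        $summary[$source]['conversion_rate'] = round($row['converted'] / $row['contacts'], 4);
    }
    return $summary;
}
```

UTM parsing and the keyword mapping via `hs_analytics_source_data_2` are left out of this sketch.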

**Lifecycle Analysis:**

- Track `lifecycle_stage` progression
- Calculate time in each stage
- Analyze conversion rates by stage
- Identify transition patterns

**Conversion Path Analysis:**

- Build touchpoint sequences from form submissions and activities
- Calculate time-to-conversion
- Analyze multi-touch attribution
- Map conversion paths
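
Touchpoint sequences and time-to-conversion can be sketched as below. `buildConversionPaths()` is a hypothetical helper; it assumes each activity has `contact_id`, `activity_type`, and a millisecond `timestamp`, and it measures time-to-conversion from the first touchpoint to the first `CONVERSION` activity, which is only one possible definition. Multi-touch attribution is not covered.

```php
<?php
// Hypothetical path builder: groups activities per contact and orders them by time.
function buildConversionPaths(array $activities): array {
    $byContact = [];
    foreach ($activities as $activity) {
        $byContact[$activity['contact_id']][] = $activity;
    }

    $paths = [];
    foreach ($byContact as $contactId => $events) {
        usort($events, fn ($a, $b) => $a['timestamp'] <=> $b['timestamp']);

        // Find the first conversion, if any.
        $conversionAt = null;
        foreach ($events as $event) {
            if ($event['activity_type'] === 'CONVERSION') {
                $conversionAt = $event['timestamp'];
                break;
            }
        }

        $paths[$contactId] = [
            'sequence' => array_column($events, 'activity_type'),
            // Milliseconds from first touchpoint to first conversion; null if never converted.
            'ms_to_conversion' => $conversionAt === null ? null : $conversionAt - $events[0]['timestamp'],
        ];
    }
    return $paths;
}
```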

## Related Rules

- See `.cursor/rules/global.mdc` for general project rules
- See `.cursor/rules/shared-patterns.mdc` for content guidelines
- See `.cursor/rules/performance.mdc` for performance considerations

## Quick Reference

**Main Directory:** `docs/seo-strategy-2026/`

**Key Files:**

- Status: `STRATEGY_STATUS.md`
- Next Steps: `NEXT_STEPS.md`
- Guides: `guides/` directory
- Scripts: `scripts/` directory

**Configuration:**

- API Credentials: `v2/config/google-api-credentials.json` (gitignored)
- API Config: `v2/config/google-api-credentials.php`
