# Ahrefs CSV Processing Workflow - Quick Start Guide

**Last Updated:** 2026-01-27

## Overview

This guide provides a streamlined workflow for processing Ahrefs internal linking CSV exports. The process runs end to end from a single command; the implementation step pauses for a confirmation prompt unless you pass `--auto-approve`.

## Quick Start

### Option 1: Python Script (Recommended)

```bash
cd /Users/hadyelhady/Documents/GitHub/landingpage
python3 v2/scripts/blog/process-ahrefs-csv.py /path/to/ahrefs-export.csv
```

**With auto-approve (skip confirmation):**
```bash
python3 v2/scripts/blog/process-ahrefs-csv.py /path/to/ahrefs-export.csv --auto-approve
```

**Dry-run only (no implementation):**
```bash
python3 v2/scripts/blog/process-ahrefs-csv.py /path/to/ahrefs-export.csv --dry-run-only
```

### Option 2: Shell Script

```bash
cd /Users/hadyelhady/Documents/GitHub/landingpage
./v2/scripts/blog/process-ahrefs-csv.sh /path/to/ahrefs-export.csv
```

## What the Workflow Does

The automated workflow performs these steps:

1. **Analysis** - Parses CSV, validates opportunities, checks duplicates
2. **Classification** - Classifies each opportunity as APPROVED, REVIEW, or REJECTED
3. **Filtering** - Creates filtered opportunities file with only approved links
4. **Dry-Run** - Tests implementation without modifying files
5. **Implementation** - Adds approved links to blog posts (with confirmation)

## Workflow Steps Explained

### Step 1: Analysis

The analysis script (`analyze-ahrefs-opportunities-jan-2026.py`) performs:

- ✅ Parses CSV with UTF-16 LE encoding
- ✅ Normalizes URLs for comparison
- ✅ Checks for existing links (content.html and FAQ answers)
- ✅ Checks carousel duplicates
- ✅ Validates safe placement (not in headers, FAQ questions, etc.)
- ✅ Validates target pages exist
- ✅ Checks SEO quality (anchor text, German word boundaries)
- ✅ Classifies opportunities (APPROVED/REJECTED/REVIEW)
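
The URL normalization step can be sketched roughly as follows. This is an illustrative comparison key, not the script's exact logic: it lowercases the host, drops the query string and fragment, and strips the trailing slash so variants of the same page compare equal.

```python
from urllib.parse import urlsplit

def normalize_url(url: str) -> str:
    # Lowercase scheme/host, drop query and fragment, strip the
    # trailing slash so /blog/post/ and /blog/post compare equal.
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"

print(normalize_url("HTTPS://Example.com/blog/post/?utm_source=x#faq"))
# → https://example.com/blog/post
```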

**Output:**
- `analysis-report-{timestamp}.md` - Human-readable report
- `analysis-results-{timestamp}.json` - Detailed JSON results
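
The German word-boundary check matters because compounds are common: a naive substring match for "schichtplan" would also hit inside "Schichtplanung". A minimal sketch of the idea (the script's actual rules may be stricter):

```python
import re

def keyword_positions(text: str, keyword: str) -> list[int]:
    # \b only matches at word edges, and Python's re treats umlauts
    # as word characters, so compounds like "Schichtplanung" are skipped.
    pattern = rf"\b{re.escape(keyword)}\b"
    return [m.start() for m in re.finditer(pattern, text, re.IGNORECASE)]

text = "Der Schichtplan hilft. Die Schichtplanung macht die Software."
print(keyword_positions(text, "schichtplan"))
# → [4]
```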

### Step 2: Filtering

Creates `filtered-opportunities-enhanced.json` with only approved opportunities, including:
- Source file paths
- Normalized URLs
- Keyword context
- All validation metadata
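
The exact schema of `filtered-opportunities-enhanced.json` is defined by the analysis script; the field names below are hypothetical, but a quick inspection pass before implementation might look like this:

```python
import json

# Hypothetical entry -- the field names ("source_file", "keyword",
# "target_url", "classification") are illustrative, not the script's
# actual schema.
sample = json.loads("""
[
  {"source_file": "v2/content/blog/schichtplan/content.html",
   "keyword": "dienstplan",
   "target_url": "https://example.com/blog/dienstplan-erstellen",
   "classification": "APPROVED"}
]
""")

# Group approved opportunities by source file to see which posts change.
by_source: dict[str, int] = {}
for opp in sample:
    by_source[opp["source_file"]] = by_source.get(opp["source_file"], 0) + 1
print(by_source)
```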

### Step 3: Dry-Run

Tests implementation without modifying files:
- Shows what links would be added
- Validates safe placement
- Checks for conflicts
- No files are modified

### Step 4: Implementation

If approved, implements links:
- Creates backups before modification
- Adds links to content.html
- Updates internal_links arrays
- Logs all changes
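
The backup step can be sketched as below, mirroring the `backups/{slug}-{timestamp}.json` naming convention; the workflow's actual implementation may differ in details such as the timestamp format.

```python
import shutil
import time
from pathlib import Path

def backup_file(path: Path, backup_dir: Path) -> Path:
    # Copy the file to backups/{slug}-{timestamp}.json before any edit,
    # so a failed run can always be rolled back.
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = backup_dir / f"{path.stem}-{stamp}{path.suffix}"
    shutil.copy2(path, dest)
    return dest
```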

## Output Files

All files are saved in `v2/scripts/blog/ahrefs-analysis/`:

### Analysis Files
- `analysis-report-{timestamp}.md` - Complete analysis report
- `analysis-results-{timestamp}.json` - Detailed JSON results
- `filtered-opportunities-enhanced.json` - Approved opportunities only

### Implementation Files
- `implementation-results-enhanced.json` - Implementation results
- `implementation-results-enhanced-dry-run.json` - Dry-run results
- `logs/implementation-{timestamp}.log` - Detailed log

### Backups
- `backups/{slug}-{timestamp}.json` - Backup of each modified file
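
Because the timestamp is part of the filename, the most recent backup for a post can be found by sorting. This assumes a sortable timestamp format like `20260127-090000`; adapt to whatever the implementation script actually writes.

```python
from pathlib import Path

def latest_backup(backup_dir: Path, slug: str) -> Path:
    # Timestamps of the form YYYYMMDD-HHMMSS sort lexicographically,
    # so the lexicographic maximum is the most recent backup.
    candidates = sorted(backup_dir.glob(f"{slug}-*.json"))
    if not candidates:
        raise FileNotFoundError(f"no backups found for {slug}")
    return candidates[-1]
```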

## Understanding the Results

### Classification Meanings

**APPROVED:**
- Meets all safety rules
- No duplicate links
- Safe placement (not in headers, FAQ questions)
- Target page exists
- Good SEO quality

**REJECTED:**
- Link already exists
- Unsafe placement (header, FAQ question, etc.)
- Source page not found
- Low SEO quality
- Target page doesn't exist

**REVIEW:**
- Edge cases needing manual review
- Target page verification needed
- Close calls on placement
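
The decision order can be summarized as a sketch. The real classifier applies more rules, and the flag names here are illustrative: hard failures reject outright, ambiguity goes to manual review, and everything that passes all checks is approved.

```python
def classify(opp: dict) -> str:
    # Hypothetical flags -- stand-ins for the analysis script's checks.
    if opp.get("link_exists") or opp.get("unsafe_placement"):
        return "REJECTED"
    if not opp.get("target_exists", False):
        return "REJECTED"
    if opp.get("needs_verification"):
        return "REVIEW"
    return "APPROVED"

print(classify({"target_exists": True}))
# → APPROVED
```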

### Common Rejection Reasons

1. **"Link already exists"** - Good! Content is already well-linked
2. **"Unsafe placement"** - Keyword only in headers or existing links
3. **"Source page not found"** - Pillar pages (not blog posts) or invalid URLs
4. **"Keyword in FAQ question"** - Links belong in answers, not questions

## Safety Features

### Automatic Protections

- ✅ Never adds links to headers (h1-h6)
- ✅ Never adds links to FAQ questions
- ✅ Checks for existing links before adding
- ✅ Validates minimum distance from headers (50+ chars)
- ✅ Validates minimum distance from other links (200+ chars)
- ✅ Respects German word boundaries
- ✅ Creates backups before modifications
- ✅ Validates HTML structure
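
The distance rules above can be sketched as a position check. This is a simplified text-window scan; the real validator presumably inspects the HTML structure rather than raw character offsets.

```python
import re

MIN_HEADER_DISTANCE = 50   # chars required from the nearest <h1>-<h6>
MIN_LINK_DISTANCE = 200    # chars required from the nearest <a> tag

def is_safe_position(html: str, pos: int) -> bool:
    # Reject a placement if a header or an existing link appears
    # within the configured window around the candidate position.
    link_window = html[max(0, pos - MIN_LINK_DISTANCE):pos + MIN_LINK_DISTANCE]
    if "<a " in link_window or "</a>" in link_window:
        return False
    header_window = html[max(0, pos - MIN_HEADER_DISTANCE):pos + MIN_HEADER_DISTANCE]
    return re.search(r"</?h[1-6]", header_window) is None

html = "<h2>Titel</h2>" + "x" * 300 + "dienstplan" + "x" * 300
print(is_safe_position(html, html.index("dienstplan")))
# → True
```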

### Manual Review Triggers

The workflow will flag opportunities for review if:
- The target page's existence needs verification
- The placement is an edge case
- A high-value opportunity needs explicit confirmation

## Best Practices

### Before Running

1. **Backup your data** (automatic backups are created, but an extra copy never hurts)
2. **Review CSV file** - Check it's the latest export from Ahrefs
3. **Check file encoding** - Should be UTF-16 LE (handled automatically)

### After Running

1. **Review analysis report** - Understand why opportunities were rejected
2. **Check dry-run results** - Verify links look correct
3. **Validate on staging** - Test links before production
4. **Monitor performance** - Track link performance in Google Search Console

### For Future Imports

1. **Use the workflow script** - Don't run scripts manually
2. **Review rejection reasons** - Learn what's already linked
3. **Track patterns** - Note common rejection reasons
4. **Update content** - Consider expanding content if many opportunities rejected due to headers

## Troubleshooting

### CSV Encoding Issues

If CSV parsing fails:
```bash
# Check file encoding
file /path/to/ahrefs-export.csv

# Should show: UTF-16, little-endian
```
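
To verify the decode in Python, the `utf-16` codec reads the BOM and picks the byte order automatically, whereas `utf-16-le` would leave the BOM in the first field name. The comma delimiter below is an assumption; your export may use tabs.

```python
import csv
import io

def parse_ahrefs_bytes(raw: bytes) -> list[dict]:
    # "utf-16" consumes the BOM and selects the right byte order.
    return list(csv.DictReader(io.StringIO(raw.decode("utf-16"))))

# Simulated UTF-16 export with BOM (column names are illustrative).
raw = "Source URL,Keyword\nhttps://example.com/blog/a,schichtplan\n".encode("utf-16")
rows = parse_ahrefs_bytes(raw)
print(rows[0]["Keyword"])
# → schichtplan
```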

### Analysis Script Not Found

Make sure you're in the project root:
```bash
cd /Users/hadyelhady/Documents/GitHub/landingpage
```

### No Approved Opportunities

This is normal if:
- Content is already well-linked
- Many opportunities are duplicates
- Keywords are in headers (not linkable)

Review the analysis report to understand rejections.

### Implementation Fails

Check:
1. File permissions (should be writable)
2. JSON syntax (should be valid)
3. Backup directory exists
4. Log file for detailed errors
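
For item 2, a quick pre-flight check can report whether a file parses as JSON without raising (the helper name and usage here are illustrative):

```python
import json
from pathlib import Path

def json_status(path: Path) -> str:
    # Return "ok" or the parse error (with line/column) as a string.
    try:
        json.loads(path.read_text())
        return "ok"
    except json.JSONDecodeError as err:
        return f"invalid JSON: {err}"
```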

## Example Usage

```bash
# Basic usage
python3 v2/scripts/blog/process-ahrefs-csv.py \
  ~/Desktop/Ordio/Internal\ Linking/2026/1.\ January/ordio_24-jan-2026_link-opportunities.csv

# Auto-approve (for trusted imports)
python3 v2/scripts/blog/process-ahrefs-csv.py \
  ~/Desktop/Ordio/Internal\ Linking/2026/1.\ January/ordio_24-jan-2026_link-opportunities.csv \
  --auto-approve

# Dry-run only (review without implementing)
python3 v2/scripts/blog/process-ahrefs-csv.py \
  ~/Desktop/Ordio/Internal\ Linking/2026/1.\ January/ordio_24-jan-2026_link-opportunities.csv \
  --dry-run-only
```

## Related Documentation

- `docs/seo/ahrefs-link-opportunities-process.md` - Detailed process documentation
- `docs/content/blog/SAFE_LINK_PLACEMENT_GUIDE.md` - Safe link placement rules
- `docs/content/blog/guides/INTERNAL_LINKING_GUIDE.md` - Internal linking best practices

## Support

For issues or questions:
1. Check the analysis report for detailed reasons
2. Review implementation logs in `ahrefs-analysis/logs/`
3. Check backups if rollback needed
4. Review rejection reasons to understand patterns
