# Ahrefs Internal Link Opportunities - Process Documentation

**Last Updated:** 2026-01-27

## Quick Start (For Recurring Imports)

**For future Ahrefs CSV imports, use the automated workflow:**

```bash
python3 v2/scripts/blog/process-ahrefs-csv.py /path/to/ahrefs-export.csv
```

See `docs/seo/AHREFS_RECURRING_WORKFLOW.md` for complete recurring workflow guide.

## Overview

This document outlines the complete process for analyzing, filtering, and implementing internal link opportunities from Ahrefs CSV exports. This process ensures high-quality, contextual internal links that improve SEO while maintaining natural content flow.

**Enhanced Version:** This documentation covers the improved filtering system with enhanced priority scoring, context quality assessment, topical relevance scoring, and automated classification.

**Automated Workflow:** The process is now fully automated via `process-ahrefs-csv.py` script. See Quick Start above.

## Process Workflow

### Phase 1: Data Import and Initial Analysis

**Input:** Ahrefs CSV export (`ordio_*.csv`)

**Scripts:**

- `v2/scripts/blog/analyze-ahrefs-opportunities.py` - Original analysis script
- `v2/scripts/blog/analyze-ahrefs-opportunities-enhanced.py` - Enhanced analysis with comprehensive statistics
- `v2/scripts/blog/analyze-ahrefs-opportunities-jan-2026.py` - January 2026 CSV analysis script (UTF-16 LE handling)
- `v2/scripts/blog/audit-existing-links.py` - Audit existing internal links

**Steps:**

1. **Parse CSV file**
   - Handle UTF-16 LE encoding
   - Tab-separated values
   - Extract: Source page, Keyword, Keyword context, Target page, PR, Traffic data
2. **Normalize URLs**
   - Remove protocol and domain
   - Remove trailing slashes
   - Remove query parameters and fragments
   - Standardize for comparison
3. **Validate opportunities**
   - Check source page is not `noindex`
   - Verify target page exists
   - Check if link already exists
   - Identify blog post JSON files
4. **Calculate priority score**
   - Source PR (weight: 0.3)
   - Source traffic (weight: 0.2)
   - Keyword search volume (weight: 0.2)
   - Target traffic (weight: 0.3)

**Output:** `ahrefs-analysis/full-analysis.json`, `valid-opportunities.json`, `prioritized-opportunities.json`

### Phase 2: Quality Filtering (Enhanced)

**Scripts:**

- `v2/scripts/blog/filter-ahrefs-opportunities.py` - Original filtering script
- `v2/scripts/blog/filter-ahrefs-opportunities-enhanced.py` - Enhanced filtering with all improvements

**Enhanced Filter Criteria:**

1. **Enhanced Priority Scoring**
   - Source PR (30%)
   - Source URL Rating (20%)
   - Source traffic (10%, capped at 1000)
   - Keyword search volume (15%, capped at 50K)
   - Keyword difficulty (inverse, 10%)
   - Target traffic potential (10%, capped at 1000)
   - Content placement bonus (5% for first 30% of content)
   - Pillar/cluster boost multiplier (1.5x for pillar pages)
   - Target ranking position factor (boost for top 10/50)
2. **Enhanced Context Quality Assessment**
   - Minimum context length: 50 characters
   - Keyword must appear in context
   - **Protected Areas (Never Add Links):**
     - Headers (h1-h6) - Links in headers appear spammy and don't improve SEO
     - FAQ questions - Questions are structural elements, not content
     - Related content carousel - Already has links (check for duplicates)
     - Script/style tags - Already protected
     - HTML tag attributes - Already protected
     - Existing links - Already checked
   - **Safe Areas (Can Add Links):**
     - Paragraphs (`<p>`) in `content.html` with natural context
     - List items (`<li>`)

**Paragraphs (`<p>`) in `content.html`:**

- Natural paragraph content with sufficient context (minimum 20 chars)
- Must be actual paragraph tag, not header/list fragment
- Detection: `is_in_safe_paragraph()` function

**List Items (`<li>`) in `content.html`:**

- List items with sufficient context (minimum 15 chars)
- Must have natural sentence context, not just keyword
- Detection: `is_in_safe_paragraph()` function

**FAQ Answers (`faqs[].answer`):**

- FAQ answers are HTML content separate from questions
- Answers can contain contextual links
- Processing: `process_faq_answer_link()` function
- Structure preserved: question/answer separation maintained

**Table Cells (`<td>`):**

- Only if natural sentence context (rare)
- Must have sufficient text (minimum 20 chars)

### Best Practices

**Always Add:**

- Pillar page links (even if in carousel)
- High PR (35+) to high-traffic targets
- Contextual mentions in paragraphs

**Review Before Adding:**

- Links to posts already in carousel
- Links close to headers (<50 chars)
- Links in list items without paragraph context

**Never Add:**

- Links in headers (h1-h6)
- Links in FAQ questions
- Links in carousel component HTML
- Duplicate links with same anchor text
- Links too close to headers (<50 chars)
- Links too close to other links (<200 chars)

## Common Issues and Solutions

### Issue: Keyword not found

**Cause:** Keyword form mismatch (plural vs singular, capitalization)

**Solution:** Script uses German word boundary detection and preserves original word form. Check if keyword appears in different form in content.

### Issue: Link already exists

**Cause:** Link was added previously or exists in different form

**Solution:** Script checks for existing links before adding. Review `internal_links` array in JSON file.

### Issue: Link density exceeded

**Cause:** Page already has 20+ internal links

**Solution:** Prioritize highest-value opportunities. Consider removing lower-value existing links if needed.

### Issue: Link skipped - Keyword in header

**Cause:** Keyword found in header tag (h1-h6)

**Solution:** This is expected behavior. Headers should never contain links. Check if keyword appears elsewhere in content (paragraphs, FAQ answers).
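
When a link is skipped because the keyword sits in a header, it can help to list every place the keyword occurs and see whether a paragraph or list item is available instead. The following is a minimal sketch of such a check using only the Python standard library. It is not the project's `is_in_safe_paragraph()` implementation: the names `KeywordLocator`, `keyword_placements`, and `alternative_placement_exists` are hypothetical, and it uses a plain case-insensitive substring match rather than the German word boundary detection the scripts apply.

```python
from html.parser import HTMLParser

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SAFE_TAGS = {"p", "li"}  # assumption: only paragraphs and list items count as safe here


class KeywordLocator(HTMLParser):
    """Record the nearest enclosing header/paragraph/list tag for each keyword hit."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.stack = []  # currently open tags, innermost last
        self.hits = []   # enclosing block tag (or None) per text node containing the keyword

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this tolerates unclosed inner tags such as <br>.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.keyword in data.lower():
            tracked = HEADER_TAGS | SAFE_TAGS
            block = next((t for t in reversed(self.stack) if t in tracked), None)
            self.hits.append(block)


def keyword_placements(html, keyword):
    """Return the enclosing block tag for every text node that contains the keyword."""
    locator = KeywordLocator(keyword)
    locator.feed(html)
    locator.close()
    return locator.hits


def alternative_placement_exists(html, keyword):
    """True if the keyword also occurs inside a paragraph or list item."""
    return any(tag in SAFE_TAGS for tag in keyword_placements(html, keyword))


sample = "<h2>Dienstplan erstellen</h2><p>Ein guter Dienstplan spart Zeit.</p>"
print(keyword_placements(sample, "Dienstplan"))
print(alternative_placement_exists(sample, "Dienstplan"))
```

If `alternative_placement_exists` returns `False`, the keyword only occurs in protected or untracked areas, and the skip is the expected outcome described above.
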
### Issue: Link skipped - Keyword in FAQ question

**Cause:** Keyword found in FAQ question field

**Solution:** This is expected behavior. FAQ questions should never contain links. Check if keyword appears in FAQ answer instead.

### Issue: Link skipped - Target in carousel

**Cause:** Target URL already in `related_posts` carousel

**Solution:** Check decision logic. Link will be added if high-value (pillar, high PR+volume, different anchor). Otherwise, carousel link is sufficient.

### Issue: Link skipped - Too close to header

**Cause:** Keyword position is less than 50 characters from header

**Solution:** This is expected behavior. Links should maintain minimum distance from headers for natural flow. Check if keyword appears elsewhere with more distance.

### Issue: HTML validation errors

**Cause:** Pre-existing HTML issues (external links without nofollow, etc.)

**Solution:** These are false positives if they relate to existing external links. Focus on validating newly added internal links.

## Best Practices

1. **Always run dry-run first** before implementing links
2. **Review filtered opportunities** before implementation
3. **Test link functionality** after implementation
4. **Validate HTML structure** to ensure no broken tags
5. **Monitor link density** to stay within limits
6. **Preserve natural content flow** - links should enhance, not disrupt readability
7. **Use varied anchor text** - avoid exact match keyword stuffing
8. **Support pillar-cluster model** - link from detailed posts to pillar pages
9. **Maintain topical relevance** - only link related content
10. **Document reasoning** - include "Ahrefs opportunity" in link metadata

## Future Improvements

- [ ] Automated link health monitoring (broken links, 404s)
- [ ] Performance tracking (click-through rates, engagement)
- [ ] Periodic review of link quality
- [ ] Integration with SEO dashboard for metrics
- [ ] Automated link density alerts
- [ ] Anchor text variation analysis

## Recent Implementation (January 2026)

**CSV Source:** `ordio_24-jan-2026_link-opportunities_2026-01-27_07-39-05.csv`

**Results:**

- Total opportunities analyzed: 43
- Approved & implemented: 8 (18.6%)
- Rejected: 35 (81.4%)
  - 17 already linked
  - 13 unsafe placement
  - 3 source pages not found (pillar pages)
  - 2 keywords in FAQ questions

**Implementation Report:** `v2/scripts/blog/ahrefs-analysis/implementation-report-jan-2026.md`

**Key Learnings:**

1. Many opportunities rejected due to existing links (good - shows content already well-linked)
2. Pillar pages (`/insights/dienstplan`, `/insights/zeiterfassung`) are not blog posts - handle separately
3. Keywords in headers are common - consider expanding content or finding alternative placements
4. All implemented links validated for safe placement, duplicates, and SEO quality

## Related Documentation

- `docs/seo/ahrefs-link-opportunities-implementation-report.md` - Implementation report template
- `v2/scripts/blog/ahrefs-analysis/implementation-report-jan-2026.md` - January 2026 implementation report
- `v2/scripts/blog/link_utils.py` - Utility functions for link operations
- `.cursor/rules/shared-patterns.mdc` - Universal validation checklist
- `docs/content/blog/SAFE_LINK_PLACEMENT_GUIDE.md` - Safe link placement guidelines
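
For readers comparing source and target pages by hand, the URL normalization step from Phase 1 (remove protocol and domain, trailing slashes, query parameters and fragments) can be approximated as below. This is a sketch under stated assumptions, not the code in `link_utils.py`: the function name `normalize_url` is hypothetical, `example.com` is a placeholder domain, and inputs are assumed to be absolute URLs, scheme-less URLs that start with a host, or root-relative paths.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to a bare path so two references to the same page compare equal."""
    url = url.strip()
    # Assumption: a value without a scheme or leading slash starts with a host name,
    # so prefix "//" to make urlsplit treat that first segment as the domain.
    if "://" not in url and not url.startswith("/"):
        url = "//" + url
    path = urlsplit(url).path  # keeps only the path: no scheme, domain, query, or fragment
    path = path.rstrip("/")    # remove trailing slashes
    return path or "/"         # the site root normalizes to "/"


print(normalize_url("https://example.com/insights/dienstplan/?utm_source=x#top"))
print(normalize_url("example.com/insights/zeiterfassung/"))
print(normalize_url("/insights/dienstplan"))
```

All three calls reduce to a root-relative path, which is the form in which the pillar pages are written in this document.
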