# Documentation scale and scope

**Last Updated:** 2026-04-01

This document explains why the repository **looks** enormous, and which part of that size is actual “documentation debt” versus **intentional data volume**.

## Rough file counts (workspace, not all tracked)

From the repo root (includes gitignored trees on disk):

| Area | Approx. files | Notes |
|------|----------------|--------|
| `node_modules/` | ~23k | Dependencies — not Ordio-authored docs |
| `v2/` | ~19k | App, assets, blog JSON SSOT under `v2/data/`, images, scripts |
| `docs/` on disk | ~49k | Includes **`docs/backups/` (~31k)** — **gitignored** blog snapshots |
| `docs/` **tracked in Git** | ~14.2k | Re-check with `git ls-files docs \| wc -l` after pulling |

So: **tens of thousands of paths** (roughly 90k on disk across the three on-disk areas above) are normal for a JS + PHP + content monorepo. Meta-audit markdown is a **tiny** fraction.
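The counts in the table go stale quickly, so it can help to regenerate them. The following is a minimal sketch, not a script that exists in this repository: the function names `count_files_on_disk` and `count_tracked` are hypothetical, and it assumes Python 3.9+ and a `git` executable on `PATH`.

```python
import os
import subprocess
from pathlib import Path


def count_files_on_disk(root: Path) -> dict[str, int]:
    """Count files under each top-level directory of root.

    Gitignored trees (node_modules/, docs/backups/, ...) are included,
    because this walks the disk and never consults Git. The .git
    directory itself is skipped.
    """
    counts: dict[str, int] = {}
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and entry.name != ".git":
            counts[entry.name] = sum(len(files) for _, _, files in os.walk(entry))
    return counts


def count_tracked(root: Path, subdir: str) -> int:
    """Count paths Git tracks under subdir (same idea as `git ls-files <subdir> | wc -l`)."""
    out = subprocess.run(
        ["git", "ls-files", "-z", "--", subdir],  # -z: NUL-separated, unquoted paths
        cwd=root,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return len([p for p in out.split("\0") if p])
```

Run from the repo root, `count_files_on_disk(Path("."))` gives the "on disk" column and `count_tracked(Path("."), "docs")` gives the tracked figure. The gap between the two for `docs/` is the gitignored volume.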

## Where tracked `docs/` volume really is

Roughly **12k+** tracked paths under **`docs/content/blog/`** alone — mostly **per-post pipeline JSON**, outlines, research, and companion markdown. That is the **content operating system**, not “random reports.”

Other sizeable buckets:

- `docs/content/helpdesk/` — large extract / mapping artifact set from a past integration project (candidate for a future consolidation pass if you decide it is no longer read).
- `docs/systems/`, `docs/guides/` — legitimate long-lived references.
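To see where tracked volume sits, the tracked paths can be grouped by their leading directories. This is a rough sketch under stated assumptions: `bucket_counts` and `tracked_paths` are hypothetical helper names, not existing repo tooling, and a grouping depth of 3 is chosen only because it separates `docs/content/blog/` from `docs/content/helpdesk/`.

```python
import subprocess
from collections import Counter
from pathlib import PurePosixPath


def bucket_counts(paths, depth=3):
    """Count paths by their first `depth` directory components."""
    counts = Counter()
    for p in paths:
        dirs = PurePosixPath(p).parts[:-1]  # drop the file name
        key = "/".join(dirs[:depth]) + "/" if dirs else "(root)"
        counts[key] += 1
    return counts


def tracked_paths(subdir="docs"):
    """Return the paths Git tracks under subdir (run from the repo root)."""
    out = subprocess.run(
        ["git", "ls-files", "-z", "--", subdir],  # -z: NUL-separated, unquoted paths
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [p for p in out.split("\0") if p]
```

Calling `bucket_counts(tracked_paths()).most_common(10)` from the repo root lists the ten largest buckets, which is usually enough to confirm whether the blog pipeline still dominates.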

## What we optimize for in cleanup

1. **Few entry files at `docs/` root** — pointers + standards + generated inventories (see [DOCUMENTATION_HYGIENE_LOG.md](DOCUMENTATION_HYGIENE_LOG.md)).
2. **No duplicate generated link reports** — one `validate-links.py` output ([LINK_VALIDATION_REPORT.md](LINK_VALIDATION_REPORT.md)); old splits live under [archive/2026-04-01-docs-cleanup/link-validation-2026-01/](archive/2026-04-01-docs-cleanup/link-validation-2026-01/).
3. **Completed analysis packs** move to `docs/archive/…` with a README (e.g. UTM cleanup analyses).
4. **Do not conflate** “delete JSON in blog pipeline” with “documentation hygiene”: deleting pipeline JSON is a **product/content** decision with migration risk.

## If the goal is fewer tracked files overall

That requires **policy changes**, for example:

- Stop committing certain large JSON exports (use gitignore + CI artifacts) — needs owner agreement per surface.
- Prune `docs/content/helpdesk/` after confirming no legal/compliance need — **human decision**.
- Keep `docs/seo-strategy-2026/` gitignored, as it is today (see `.gitignore`), unless you explicitly want it in Git.

## Related

- [DOCUMENTATION_HYGIENE_LOG.md](DOCUMENTATION_HYGIENE_LOG.md)
- [archive/2026-04-01-docs-cleanup/README.md](archive/2026-04-01-docs-cleanup/README.md)
