# Cursor token & credit efficiency (Ordio)

**Last Updated:** 2026-04-04

This guide complements [`global-guidelines.md`](global-guidelines.md) and [`cursor-playbook.md`](cursor-playbook.md). **Quality gates (validators, reviews) stay the same**—we reduce wasted context and tool rounds.

## Align with current Cursor behavior (2026)

These product directions support the same habits we already document; use them when choosing *how* to run work in Cursor:

1. **MCP on demand** — Cursor loads MCP server definitions **only when needed** ([changelog 2.4](https://cursor.com/changelog/2-4)), which reduces baseline context. Still: prefer **`make` / local scripts** when they satisfy the check ([`MCP_INTEGRATION.md`](../development/MCP_INTEGRATION.md)).
2. **Skills vs always-on rules** — [Agent Skills](https://cursor.com/docs/context/skills) are for procedural, discoverable workflows; keep **narrow `alwaysApply`** (we only use [`global.mdc`](../../.cursor/rules/global.mdc)). See [dynamic context discovery](https://cursor.com/blog/dynamic-context-discovery).
3. **Subagents** — Delegate wide search or noisy shell to **Explore / Bash** so the parent thread stays smaller ([subagents](https://cursor.com/docs/context/subagents)).
4. **Composer 2** — Cursor’s direction for a **more token-efficient** default coding model ([blog](https://cursor.com/blog/composer-2)); keep matching model tier to task ([`CURSOR_MODEL_CONFIGURATION.md`](CURSOR_MODEL_CONFIGURATION.md)).
5. **Cursor 3** — **Parallel agents** and **local ↔ cloud handoff** for long jobs can avoid one overloaded conversation ([blog](https://cursor.com/blog/cursor-3), [Agents Window](https://cursor.com/docs/agent/agents-window)).

Full link table: [`CURSOR_OFFICIAL_DOCS_REFERENCE.md`](CURSOR_OFFICIAL_DOCS_REFERENCE.md). Agent workflow habits: [Cursor — Agent best practices](https://www.cursor.com/blog/agent-best-practices).

## Principles

1. **Smallest context that answers the question** — Prefer `grep`, open one file, or `@` a single path over loading long guides whole.
2. **Route before you read** — Use the [Quick route table in `agent-workflows.md`](agent-workflows.md#quick-route-task--one-hub--rule-hint).
3. **Right-sized planning** — Match plan depth to task tier (see [`global-guidelines.md`](global-guidelines.md) §1). Validation stays mandatory; verbosity does not.
4. **Cheap checks first** — Scripts and `make` targets validate faster than multi-turn LLM self-review.
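The "cheap checks first" principle can be sketched with a minimal local validator. This is a hypothetical example (not a script that exists in this repo) of a check that costs zero tokens because it runs outside the chat context:

```python
# Hypothetical "cheap check": validate relative markdown links locally
# before spending any LLM turns on self-review.
import re

def find_relative_links(markdown_text):
    """Return relative link targets from [text](target) pairs,
    skipping absolute URLs."""
    targets = re.findall(r"\[[^\]]*\]\(([^)#]+)", markdown_text)
    return [t for t in targets if not t.startswith(("http://", "https://"))]

doc = "See [guide](docs/guide.md) and [site](https://example.com)."
print(find_relative_links(doc))  # ['docs/guide.md']
```

A script like this can run as a `make` target in milliseconds; the same check done as an LLM self-review round costs a full-history request.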

## Context: `@` references and chat scope

- Use **`@Files`** / **`@Folder`** for exactly what you will edit or review; avoid attaching the whole `docs/` tree.
- **Start a new chat** for each new, unrelated task so prior tool output and long threads are not re-sent (and re-billed) with every message.
- **User Rules** (Cursor Settings — not in git): keep them **minimal**. Duplicated instructions in User Rules + workspace rules + `docs/` multiply token use on every message.
  - **Checklist:** One or two lines that say “Follow project [`AGENTS.md`](../../AGENTS.md) and hub [`docs/ai/agent-workflows.md`](agent-workflows.md).” Do **not** paste bullets from [`.cursor/rules/global.mdc`](../../.cursor/rules/global.mdc) or long Ordio policy here.
  - Optional link for Cursor product behavior: [`CURSOR_OFFICIAL_DOCS_REFERENCE.md`](CURSOR_OFFICIAL_DOCS_REFERENCE.md).

## Models, Agent, and Max Mode

- **Model choice affects usage rate** on subscription plans (see [Cursor usage limits](https://cursor.com/help/models-and-usage/usage-limits)).
- **Max Mode** uses the largest context window the model supports—use for large refactors or when many files must stay in view; avoid for single-file typo fixes ([Max Mode help](https://cursor.com/help/ai-features/max-mode)).
- **Ask mode** (read-only): use for exploration and questions when you should not edit files.
- **Agent mode**: use when you need edits, terminal, or multi-step automation; scope the task tightly.

Details and IDE settings: [`CURSOR_MODEL_CONFIGURATION.md`](CURSOR_MODEL_CONFIGURATION.md).

## Why large Agent tasks burn millions of tokens (The N² Problem)

If a single Agent task consumes millions of input tokens, it is usually **not** because the codebase is too large, but because of how LLM context accumulation works.

1. **Stateless API:** Every single message in an Agent chat resends the **entire previous conversation history** (including all file reads, shell command outputs, and the Agent's internal reasoning) back to the LLM.
2. **Cumulative Cost:** If an Agent reads 5 large files (~20k tokens) and runs 20 shell commands over 25 steps to complete a task, the context window grows continuously. Turn 1 might send 50k tokens, Turn 10 sends 150k, Turn 25 sends 300k. **The sum of these inputs across a long chat session easily reaches 4-5 million tokens.**
3. **Max Mode Expansion:** If Max Mode is enabled, Cursor uses the model's largest context window, so far more files and tool output stay in the history that is re-sent each turn, multiplying this cumulative cost drastically.
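The growth in step 2 is quadratic in the number of turns, which is where the "N²" in the heading comes from. A minimal sketch using illustrative numbers (assumed for the demonstration, not measured Cursor values):

```python
# Illustrative model: each request re-sends the full history, which
# grows by a roughly fixed amount of tool output per turn.
def cumulative_input_tokens(turns, initial=50_000, per_turn_growth=10_000):
    total = 0
    context = initial
    for _ in range(turns):
        total += context            # whole history re-sent this turn
        context += per_turn_growth  # new file reads / shell output appended
    return total

print(cumulative_input_tokens(1))   # 50,000: one turn is cheap
print(cumulative_input_tokens(25))  # 4,250,000: the long-chat blow-up
```

With these assumptions, 25 turns sum to about 4.25M input tokens even though no single request exceeds 300k, which matches the range above.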

### How to avoid this

- **Scope Tasks Tightly:** Break massive "do everything" tasks ("Conduct a deep review and fix everything") into smaller, discrete steps.
- **Start New Chats:** When an Agent finishes a sub-task, open a new Composer/Agent chat for the next phase. This drops the accumulated history, so the next phase starts from an empty context.
- **Use Ask Mode for Discovery:** If you only need to explore the codebase or understand how something works, Ask mode uses fewer tokens and avoids iterative tool-call loops.
- **Disable Max Mode for standard edits:** Only use Max Mode when a change spans dozens of coupled files and the model genuinely needs to see them all at once.
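The "start new chats" advice can be quantified with the same illustrative numbers used earlier in this section (assumed, not measured): splitting 25 turns across five fresh chats cuts the total by more than half, because each chat's history restarts small.

```python
# Illustrative only: total input tokens for an n-turn chat where each
# request re-sends a history that grows by `growth` tokens per turn.
def chat_total(turns, start=50_000, growth=10_000):
    return sum(start + growth * i for i in range(turns))

one_long_chat = chat_total(25)        # 4,250,000 tokens
five_short_chats = 5 * chat_total(5)  # 1,750,000 tokens for the same 25 turns
print(one_long_chat, five_short_chats)
```

The savings come entirely from never letting any single history grow large; the work done per turn is unchanged.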

## MCP and tools budget

- Follow [`.cursor/rules/mcp-usage.mdc`](../../.cursor/rules/mcp-usage.mdc) and [`docs/development/MCP_INTEGRATION.md`](../development/MCP_INTEGRATION.md).
- Prefer **local scripts and `make` targets** over repeated web fetch when either can satisfy the check.
- **Firecrawl / search / browser MCP**: use when necessary; batch related lookups instead of many single-call loops.
- **Integrated browser** (Cursor agent): for local URLs, the [built-in browser tool](https://cursor.com/docs/agent/tools/browser) is often less overhead than chaining external fetch/scrape MCPs for simple verification.
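Batching related lookups matters because every extra tool round adds another full-history request. A hypothetical sketch (placeholder URLs and function names, not a real MCP call):

```python
# Hypothetical: resolve several related lookups in one local run so the
# agent pays for one tool round instead of len(urls) separate rounds.
urls = [
    "https://example.com/a",  # placeholders; real lookups would go here
    "https://example.com/b",
    "https://example.com/c",
]

def batch_fetch(urls):
    # Stand-in for real I/O (curl, scrape, etc.); kept offline on purpose.
    return {u: "would fetch " + u for u in urls}

results = batch_fetch(urls)
print(len(results), "lookups, 1 tool round")
```

The same idea applies to any MCP: collect the questions first, then answer them in one invocation.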

## Blog and content workflows

Efficiency without skipping validators: [`docs/content/blog/BLOG_WORKFLOW_EFFICIENCY.md`](../content/blog/BLOG_WORKFLOW_EFFICIENCY.md) (Make ladders, `blog-apply-validate` vs strict).

## Rule footprint (maintenance)

- Enumerate rules and glob overlap:  
  `python3 scripts/ai/rule-footprint.py`
- Architecture map: [`rule-hierarchy.md`](rule-hierarchy.md)
- Overlap notes: [`RULE_GLOB_OVERLAP_AUDIT.md`](RULE_GLOB_OVERLAP_AUDIT.md)

## Baseline snapshot

Historical metrics: [`CURSOR_AI_EFFICIENCY_BASELINE.md`](CURSOR_AI_EFFICIENCY_BASELINE.md)
