Model Selection & Cost

Choosing the right model and effort level, and managing costs effectively.

Most AI coding agents support multiple models with different strengths, speeds, and costs. Picking the right model for the task avoids overpaying for simple work and underperforming on complex work.

Capability tiers

| Tier | Best for | Speed | Cost |
| --- | --- | --- | --- |
| Most capable | Complex architecture, subtle bugs, multi-file reasoning | Slower | Highest |
| Standard (default) | Daily coding, feature work, code review, most tasks | Fast | Moderate |
| Fast/lightweight | Quick questions, simple generation, boilerplate | Fastest | Lowest |

The standard model is the default and the right choice for most work. Use the most-capable tier when you need the agent to hold more complexity in its head simultaneously. Use the fast/lightweight tier for tasks where speed matters more than depth.

Switching models

# During a session
/model standard
/model max

# At launch
agent --model standard
agent --model max

You can also press Option+P (macOS) or Alt+P (Linux/Windows) to switch models without clearing your prompt.

Effort levels

Effort controls how much reasoning the agent does before responding. Lower effort means faster, cheaper responses. Higher effort means deeper analysis.

/effort low       # Quick, surface-level responses
/effort medium    # Balanced (default for most modes)
/effort high      # Thorough analysis
/effort max       # Maximum reasoning depth
/effort auto      # Reset to the model's default effort level

When to change effort:

| Task | Recommended effort |
| --- | --- |
| Quick lookups and explanations | low |
| Writing tests, feature implementation | medium |
| Security review, complex debugging | high |
| Architecture design, multi-file reasoning | high or max |
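
The table can also be encoded as a small lookup, for example in a team's shared shell helpers. This is only a sketch: `recommended_effort` and its task labels are hypothetical and not part of any agent CLI; the effort levels themselves are the ones listed above.

```shell
# Hypothetical helper: prints the effort level the table above suggests.
# The task labels (lookup, tests, ...) are illustrative assumptions.
recommended_effort() {
  case "$1" in
    lookup|explain)          echo "low" ;;
    tests|feature)           echo "medium" ;;
    security|debug)          echo "high" ;;
    architecture|multi-file) echo "high" ;;    # the table allows "high or max"; start at high
    *)                       echo "medium" ;;  # unknown label: fall back to the balanced level
  esac
}

# Print the slash command to type into an interactive session
echo "/effort $(recommended_effort security)"
```

Starting architecture work at high rather than max is a design choice in this sketch; raise it by hand if the result is not thorough enough.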

Fast mode

/fast on     # Turn fast mode on
/fast off    # Turn fast mode off

Or press Option+O (macOS) / Alt+O (Linux/Windows) to toggle mid-session.

Fast mode is specific to the most-capable model tier. It raises the per-token cost in exchange for speed, making iterative work roughly 2.5x faster. Use it for implementation loops where you are going back and forth rapidly: writing code, running tests, fixing errors. Turn it off when you want to save cost or need the agent to reason more carefully.

Tracking costs

Token usage in this session

/cost

Shows how many tokens you have used, the cost so far, and a breakdown by input vs. output tokens.

Plan limits and rate status

/usage

Shows your current plan’s limits and how much you have used. Useful for knowing if you are approaching rate limits.

Context window usage

/context

Visualizes how full your context window is. A full context window costs more per message because the agent processes the entire context with every response. See Sessions & Context for strategies to keep this manageable.

Budget guardrails

For automated or unattended runs, set hard limits:

# Stop after spending $5
agent -p "refactor the billing module" --max-budget-usd 5

# Stop after 10 agentic turns (prevents runaway loops)
agent -p "fix all test failures" --max-turns 10

Both flags work only in non-interactive (print/batch) mode, that is, with agent -p; they are not available in interactive sessions. They are especially important in CI/CD pipelines, where a bug in the prompt could cause the agent to loop indefinitely.
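
One way to make the limits hard to forget is a small wrapper that refuses to start an unattended run unless both are supplied. A minimal sketch: `run_guarded` is a hypothetical helper name, and it relies only on the `-p`, `--max-turns`, and `--max-budget-usd` flags shown above.

```shell
# Hypothetical wrapper for unattended runs; not part of the agent CLI.
# Usage: run_guarded PROMPT MAX_TURNS MAX_BUDGET_USD
run_guarded() {
  # Refuse to launch unless the prompt and both limits are present and non-empty
  if [ "$#" -ne 3 ] || [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: run_guarded PROMPT MAX_TURNS MAX_BUDGET_USD" >&2
    return 2
  fi
  agent -p "$1" --max-turns "$2" --max-budget-usd "$3"
}
```

For example, `run_guarded "fix all test failures" 10 5` runs the prompt with a 10-turn cap and a $5 budget, while calling it with a missing limit prints the usage line and returns status 2 without starting the agent.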

Cost-saving patterns

Right-size the model

Do not use the most-capable tier for writing boilerplate. Do not use the fast/lightweight tier for architecture review. Match the model to the task.

# Quick: generate a test file (standard is fine)
agent -p "write tests for src/utils.ts" --model standard

# Deep: find a subtle concurrency bug (worth the most-capable tier)
agent -p "find race conditions in src/services/" --model max
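
The same rule can be scripted when prompts are launched from tooling. The sketch below assumes two hypothetical helpers, `model_for_task` and `run_task`, with made-up task labels (`quick`, `deep`); the model names `standard` and `max` are the ones used in the commands above.

```shell
# Hypothetical helper: choose a model name from a coarse task label.
model_for_task() {
  case "$1" in
    deep) echo "max" ;;       # subtle bugs, architecture, multi-file reasoning
    *)    echo "standard" ;;  # everything else stays on the default tier
  esac
}

# Hypothetical launcher. Usage: run_task LABEL PROMPT
run_task() {
  agent -p "$2" --model "$(model_for_task "$1")"
}
```

With these definitions, `run_task deep "find race conditions in src/services/"` expands to the second command above, and any other label falls back to the standard model.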

Compact regularly

A conversation with the context window 80% full costs roughly 4x as much per message as one at 20%, because the whole context is processed on every turn. Use /compact to keep the window lean.
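
The 4x figure follows from simple proportionality if per-message input cost is assumed to scale linearly with the amount of context processed. That is an assumption: actual pricing, caching, and output tokens will shift the number. A quick check, with `context_cost_ratio` as a hypothetical helper:

```shell
# Ratio of per-message input cost at two context fill levels (in percent),
# assuming cost is proportional to the amount of context processed.
context_cost_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

context_cost_ratio 80 20   # prints 4.0
```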

Clear between tasks

/clear is free and prevents paying to process irrelevant context from a previous task.

Use effort levels

/effort low for quick lookups saves significant tokens compared to the default. The response will be shorter and less thorough, but that is often exactly what you want.

Batch mode for CI

In CI/CD, always set --max-turns and --max-budget-usd to prevent surprise costs:

agent -p "review staged changes" \
  --max-turns 5 \
  --max-budget-usd 2 \
  --output-format json

Tips

  • Start most sessions with the standard model. Upgrade to the most-capable tier only when you notice the agent struggling with complexity.
  • /cost is instant and non-disruptive — check it periodically during long sessions.
  • Effort levels have a bigger impact on cost than most people expect. /effort low for a quick question can be 5-10x cheaper than /effort high.
  • Fast mode on the most-capable tier trades cost for speed. Use it liberally during implementation loops when speed matters more than cost.
  • --max-budget-usd is your safety net for any automated run. Set it even if you think the task is cheap.
  • When demonstrating AI coding agents to leadership, /cost at the end of a session gives concrete numbers for ROI discussions.