Most AI coding agents support multiple models with different strengths, speeds, and costs. Picking the right model for the task avoids overpaying for simple work and underperforming on complex work.
Capability tiers
| Tier | Best for | Speed | Cost |
|---|---|---|---|
| Most capable | Complex architecture, subtle bugs, multi-file reasoning | Slower | Highest |
| Standard (default) | Daily coding, feature work, code review, most tasks | Fast | Moderate |
| Fast/lightweight | Quick questions, simple generation, boilerplate | Fastest | Lowest |
The standard model is the default and the right choice for most work. Use the most-capable tier when you need the agent to hold more complexity in its head simultaneously. Use the fast/lightweight tier for tasks where speed matters more than depth.
Switching models
# During a session
/model standard
/model max
# At launch
agent --model standard
agent --model max
You can also press Option+P (macOS) or Alt+P (Linux/Windows) to switch models
without clearing your prompt.
Effort levels
Effort controls how much reasoning the agent does before responding. Lower effort means faster, cheaper responses. Higher effort means deeper analysis.
/effort low # Quick, surface-level responses
/effort medium # Balanced (default for most modes)
/effort high # Thorough analysis
/effort max # Maximum reasoning depth
/effort auto # Reset to the model's default effort level
When to change effort:
| Task | Recommended effort |
|---|---|
| Quick lookups and explanations | low |
| Writing tests, feature implementation | medium |
| Security review, complex debugging | high |
| Architecture design, multi-file reasoning | high or max |
Fast mode
/fast on # Toggle fast mode on
/fast off # Toggle fast mode off
Or press Option+O (macOS) / Alt+O (Linux/Windows) to toggle mid-session.
Fast mode is specific to the most-capable model tier — it trades per-token cost for speed, making iterative work roughly 2.5x faster. Use it for implementation loops where you are going back and forth rapidly — writing code, running tests, fixing errors. Turn it off when you want to save cost or need the agent to reason more carefully.
Tracking costs
Token usage in this session
/cost
Shows how many tokens you have used, the cost so far, and a breakdown by input vs. output tokens.
Plan limits and rate status
/usage
Shows your current plan’s limits and how much you have used. Useful for knowing if you are approaching rate limits.
Context window usage
/context
Visualizes how full your context window is. A full context window costs more per message because the agent processes the entire context with every response. See Sessions & Context for strategies to keep this manageable.
Budget guardrails
For automated or unattended runs, set hard limits:
# Stop after spending $5
agent -p "refactor the billing module" --max-budget-usd 5
# Stop after 10 agentic turns (prevents runaway loops)
agent -p "fix all test failures" --max-turns 10
Both flags only work in non-interactive (print/batch) mode (agent -p).
They are not available in interactive sessions. They are especially important
for CI/CD pipelines where a bug in the prompt could cause the agent to loop
indefinitely.
Cost-saving patterns
Right-size the model
Do not use the most-capable tier for writing boilerplate. Do not use the fast/lightweight tier for architecture review. Match the model to the task.
# Quick: generate a test file (standard is fine)
agent -p "write tests for src/utils.ts" --model standard
# Deep: find a subtle concurrency bug (worth the most-capable tier)
agent -p "find race conditions in src/services/" --model max
Compact regularly
A conversation at 80% context costs roughly 4x per message compared to 20%.
Use /compact to keep the window lean.
Clear between tasks
/clear is free and prevents paying to process irrelevant context from a
previous task.
Use effort levels
/effort low for quick lookups saves significant tokens compared to the
default. The response will be shorter and less thorough, but that is often
exactly what you want.
Batch mode for CI
In CI/CD, always set --max-turns and --max-budget-usd to prevent surprise costs:
agent -p "review staged changes" \
--max-turns 5 \
--max-budget-usd 2 \
--output-format json
Tips
- Start most sessions with the standard model. Upgrade to the most-capable tier only when you notice the agent struggling with complexity.
/costis instant and non-disruptive — check it periodically during long sessions.- Effort levels have a bigger impact on cost than most people expect.
/effort lowfor a quick question can be 5-10x cheaper than/effort high. - Fast mode on the most-capable tier trades cost for speed. Use it liberally during implementation loops when speed matters more than cost.
--max-budget-usdis your safety net for any automated run. Set it even if you think the task is cheap.- When demonstrating AI coding agents to leadership,
/costat the end of a session gives concrete numbers for ROI discussions.