Skip to content

Extended Thinking

Controlling your AI coding agent's reasoning depth with effort levels and extended thinking.

Extended thinking is a built-in model capability where the model reasons internally before responding. It is enabled by default on the most capable and standard models. You do not need to prompt for it — you control it through effort levels.

How effort levels work

Effort levels control how much thinking the model allocates to each response. Lower effort is faster and cheaper. Higher effort provides deeper reasoning at the cost of speed and tokens.

LevelBehavior
lowMinimal thinking. Fast responses for straightforward tasks.
mediumDefault. Balanced reasoning depth for most coding work.
highDeep reasoning. Best for complex debugging and architecture.
maxDeepest reasoning with no token constraint. Most capable model only.

Medium is the default on both the most capable and standard models. This is the recommended level for most coding tasks — higher levels can cause the model to overthink routine work.

Setting effort

Interactive command:

/effort high

Run /effort low, /effort medium, /effort high, or /effort max to change the level for the current session. Run /effort auto to reset to the model default.

Keyboard shortcut:

Press Option+T (macOS) or Alt+T (other platforms) to toggle extended thinking on or off. This may require terminal configuration to enable Option key shortcuts (iTerm2: Profiles → Keys → set Option key to “Esc+”).

At startup:

agent --effort high

Pass the effort level when launching your AI coding agent. This sets it for that session only.

Environment variable:

export AGENT_EFFORT_LEVEL=high

This takes precedence over all other methods. Valid values: low, medium, high, max, auto.

In the model picker:

Use left/right arrow keys to adjust the effort slider when selecting a model with /model. The current effort level is displayed next to the logo and spinner so you can confirm the active setting.

One-off deep reasoning with “ultrathink”

Include the word “ultrathink” in your prompt to trigger high effort for a single turn without changing your session setting:

ultrathink about the race condition in the connection pool.
What happens when two goroutines call Close() simultaneously?

This is useful when you are generally working at medium effort but hit a problem that needs deeper reasoning.

Viewing thinking output

By default, the model’s internal thinking is not displayed. Press Ctrl+O to toggle verbose mode, which shows detailed tool usage and thinking output. You can also launch with --verbose for the same effect.

When higher effort helps

  • Complex architecture decisions: Evaluating tradeoffs across multiple components and their interactions
  • Multi-step refactoring: Changes that ripple across many files where ordering and side effects matter
  • Tricky debugging: Tracing race conditions, concurrency bugs, or subtle logic errors through multiple layers
  • Security review: Reasoning about attack vectors, edge cases, and interactions between trust boundaries
  • Algorithm design: Problems where the first intuitive approach is likely wrong or suboptimal

When it is overkill

  • Simple boilerplate and CRUD operations
  • Renaming variables, fixing typos, adding log lines
  • Tasks with one obvious approach
  • Quick lookups, explanations, or codebase questions

For these, medium or low effort produces the same result faster and cheaper.

Common misconception: prompt-based thinking

Phrases like “think step by step” or “think carefully” do not allocate additional thinking tokens. They are regular prompt text. To actually increase reasoning depth, use /effort or include “ultrathink” in your prompt.

Programmatic control

For CI/CD pipelines and scripts, two environment variables give direct control:

  • AGENT_EFFORT_LEVEL — Sets the effort level. Takes precedence over /effort and the effortLevel setting.
  • MAX_THINKING_TOKENS — Overrides the thinking token budget directly. On models with adaptive reasoning, this is ignored unless you also set AGENT_DISABLE_ADAPTIVE_THINKING=1 to revert to fixed budgets. Set to 0 to disable thinking entirely.

Practical examples

Default workflow — leave it at medium:

Most of the time, you do not need to change anything. Medium effort handles everyday coding, test writing, and code review well.

Bump effort for a hard problem:

/effort high

The payment webhook handler sometimes processes the same event twice.
Trace through WebhookController, PaymentService, and IdempotencyStore
to find where the deduplication logic fails under concurrent requests.

Drop effort for bulk work:

/effort low

Add JSDoc comments to every exported function in src/utils/.

One-off deep dive without changing session effort:

ultrathink about whether this migration can run without downtime.
Consider what happens to in-flight queries during the ALTER TABLE.

Tips

  • Start at medium. Only increase effort when you notice the agent missing subtleties or making shallow analysis.
  • Use /effort low for repetitive, mechanical tasks to save tokens and time.
  • The max level is reserved for the most capable model and does not persist across sessions unless set via AGENT_EFFORT_LEVEL.
  • Effort can be set per-skill or per-subagent using the effort field in frontmatter, useful for giving a security review skill higher effort than a formatting skill.