Anthropic describes Sonnet 4.6 as "the best combination of speed and intelligence," and for once the marketing line maps cleanly onto how you should use it. This is the balanced tier. It's fast, it's cheap enough to run at volume, and it's smart enough that you rarely feel the gap to Opus on everyday work. The whole pricing structure assumes you'll live here and visit the other tiers when you need them.
The specs back the positioning, per the Claude model documentation: a 1M-token context window matching Opus 4.8, up to 64k output tokens, vision, function calling, structured output, prompt caching, and both extended and adaptive thinking. The feature set isn't a cut-down version of the flagship. You give up the very top of the reasoning curve and almost nothing else.
So the real question isn't whether Sonnet 4.6 is good. It is. The question is which tasks belong here, which should drop a tier to save money, and which should pay up for Opus. That's a routing decision, and getting it right is worth more than picking any single model.
When Sonnet is the right default
Most of what a production app does, Sonnet handles without breaking a sweat. Chat and Q&A. Drafting and rewriting. Routine coding, the ship-this-feature kind rather than the redesign-this-architecture kind. Tool calling and function dispatch, where instruction-following matters more than raw brainpower. Structured extraction where the schema does the heavy lifting. Put all of that on Sonnet and you're paying a fair rate for output you don't need to second-guess.
The reason to default here rather than to Opus is pure economics. Sonnet's input is 60% of Opus's and its output is 60% as well, so every call you route down instead of up saves you real money at scale. The common mistake is defaulting to the flagship and barely touching the middle tier, then wondering why the bill climbs. The price-per-use-case breakdown shows how fast that adds up across chat, RAG, and agent workloads.
| Workload | Best tier | Why |
|---|---|---|
| Bulk classification, request routing, field extraction | Haiku 4.5 | Simple, high-volume; Sonnet's reasoning goes unused |
| Chat, drafts, routine code, tool use, most production traffic | Sonnet 4.6 | The balanced default: smart enough, priced for volume |
| Complex production code, long-horizon agents, dense docs, hard reasoning | Opus 4.8 | Worth the premium only where a mistake is costly to undo |
When to drop down to Haiku
Plenty of workloads never need what Sonnet brings. If you're tagging support tickets, routing requests to the right queue, pulling fields out of a document, or summarizing in bulk, the reasoning you'd pay Sonnet for sits idle. That's Haiku 4.5 territory, at a third of Sonnet's price.
The trick is to be honest about how many of your tasks are simple. A lot of "AI features" are classification wearing a trench coat, and they run fine on the cheap tier. The Haiku 4.5 review walks through where the cheapest model holds up and where the quality drop starts to cost you more than the savings.
When to climb to Opus
Sonnet's ceiling shows up on the hard stuff. Untangling a bug that spans a dozen files. A long agent run where one early wrong turn derails everything after it. Dense legal or financial analysis where a missed detail has a price tag. That's where Opus 4.8's reasoning, and its newer honesty gains around catching its own mistakes, earn the roughly 1.67× input premium.
Escalate to the expensive model one task at a time, not one stack at a time.
The right pattern is per-task escalation. Keep Sonnet as the floor, detect the hard cases, and route just those up to Opus. The Opus 4.8 review covers what the flagship buys you and, just as usefully, the one benchmark where it still trails a rival. You don't need to commit your whole pipeline to one tier. The cheaper play is almost always a mix.
The price math, plainly
Here's the shape of it. Sonnet output costs five times its input, the same ratio Anthropic runs on every tier, so the fastest way to cut a Sonnet bill is to shorten responses, not inputs. Cached input drops to about a tenth of the standard rate, and the Batch API takes 50% off both directions for anything you don't need answered in real time. Stack caching and batching on a repetitive workload and the effective rate falls hard.
For the levers that cut a token bill regardless of tier, benchr's guide to cutting your token spend goes deeper on caching, batching, and routing. The headline for Sonnet specifically: it's the tier where those savings compound, because it's the one you should be running most of your traffic through.
The verdict
Claude Sonnet 4.6 is the model to build on. It's the right default for the overwhelming majority of what a production system asks of an LLM, and the pricing is set up to reward you for keeping it that way. The skill isn't picking Sonnet. It's knowing the handful of tasks that should drop to Haiku for the savings, and the handful that should pay up for Opus because the cost of a miss there dwarfs the model fee.
Set Sonnet as your floor, route deliberately in both directions, and you'll spend less than a team that defaults everything to the flagship while getting output that's just as good on the work that fills your days.
Frequently asked
What does Claude Sonnet 4.6 cost?
Per Anthropic's pricing page, $3 per million input tokens and $15 per million output. Cached input drops to roughly a tenth of the input rate, and the Batch API takes 50% off both directions for asynchronous jobs.
Is Claude Sonnet 4.6 good enough for production?
For most production workloads, yes. It's Anthropic's balanced tier, built for coding, tool use, and instruction-following at 60% of Opus's input price. It supports a 1M-token context window, vision, structured output, and both extended and adaptive thinking. Reserve Opus for the hardest reasoning.
When should you use Haiku 4.5 instead of Sonnet 4.6?
Drop to Haiku 4.5 for simple, high-volume tasks: tagging tickets, routing requests, and pulling fields out of forms. Haiku is a third of Sonnet's price, and on work that doesn't need Sonnet's reasoning, the quality difference disappears while the savings don't.
When is Opus 4.8 worth the jump from Sonnet 4.6?
Move up when the downside of a mistake is concrete: production code on a complex codebase, long-horizon agent runs, dense document analysis, or hard reasoning. Opus input is about 1.67× Sonnet's, so escalate per task rather than defaulting the whole stack to it.
What is the context window for Claude Sonnet 4.6?
1M tokens, the same ceiling as Opus 4.8, with up to 64k output tokens per response on the standard API. Treat the long window as a retrieval surface for pulling facts back out rather than a one-shot summarization buffer.
Changelog
- May 30, 2026 — Originally published. Pricing, context window, output limit, and feature support verified against Anthropic's pricing page and the Claude model documentation.
References
- Anthropic, "Models overview," platform.claude.com, accessed May 2026.
- Anthropic, "Pricing," platform.claude.com, accessed May 2026.
- Anthropic, "Claude Pricing," claude.com/pricing, accessed May 2026.
- Anthropic, "Prompt caching," platform.claude.com, accessed May 2026.