AI Model Pricing Comparison 2026: Cost per Million Tokens

Complete comparison of API token pricing (input, output, caching) across OpenAI, Anthropic Claude, Google Gemini, DeepSeek, and open-weights. Sourced directly from official docs.

· Figures verified against official sources, June 6, 2026

API pricing for large language models has commoditized rapidly, but the pricing structures have become more complex. Between input, output, prompt caching, batch execution, and self-hosted instances, developers must calculate pricing carefully to estimate their monthly workloads. Below is our dynamic pricing comparison table, kept in exact sync with the main model index.

Model Provider Input / 1M Output / 1M Cached Input / 1M
Loading pricing data...

Pricing Tiers: Frontier vs. Mid-Tier vs. Small

When analyzing costs, models generally fall into three tiers:

  • Frontier Tier ($5.00+ Input / $15.00+ Output): Reserved for absolute top-tier intelligence like Claude Opus 4.8 and GPT-5.5. These models are ideal for complex architectural decisions and high-stakes agent loops, but are too expensive for daily high-volume tasks.
  • Mid-Tier ($1.25 - $3.00 Input / $2.50 - $15.00 Output): Models like Claude Sonnet 4.6, GPT-5, and Gemini 3.5 Flash represent the "sweet spot" for production applications, blending high capability with moderate pricing.
  • Small & Open Coder Tier (Sub-$1.00 Input): Models like DeepSeek V4-Flash, GPT-5 Mini, and self-hosted Llama 4 Scout. These provide fast responses and negligible costs, making them perfect for routing, classification, or high-volume summarization.

For more details on overall performance rankings, see our AI Model Rankings or compute your exact monthly cost with the Cost Calculator.

Frequently asked

How does Prompt Caching save money?

Prompt caching allows API providers (such as Anthropic, OpenAI, and DeepSeek) to reuse parts of the context window that remain static (like large system prompts or system contexts). This results in discounts of up to 90% off standard input token rates, which is crucial for agentic architectures.

Are self-hosted open-weight models completely free?

While models like Llama 4 Scout, Maverick, or Phi-4 cost $0 to license under open licenses, you must pay for the cloud GPU infrastructure (such as AWS, GCP, RunPod, or Lambda Labs) to host them. If your throughput is low, managed APIs are often cheaper than keeping a GPU active 24/7.