Which AI API model is the cheapest per million tokens?

DeepSeek V4-Flash is currently the cheapest API provider, charging $0.14 per million input tokens and $0.28 per million output tokens. For self-hosted open-weights, models like Llama 4 Scout and Phi-4 cost $0 to license, but require cloud GPU infrastructure.

What is prompt caching and does it save money?

Prompt caching allows API providers to store frequently used system instructions or prefix texts. Providers like Anthropic, OpenAI, and DeepSeek offer up to a 90% discount on cached input tokens, reducing costs drastically for long-context agentic applications.

Comparison·June 2026

AI Model Pricing Comparison 2026: Cost per Million Tokens

Complete comparison of API token pricing (input, output, caching) across OpenAI, Anthropic Claude, Google Gemini, DeepSeek, and open-weights. Sourced directly from official docs.

Reviewed June 6, 2026 · Figures verified against official sources, June 6, 2026

API pricing for large language models has commoditized rapidly, but the pricing structures have become more complex. Between input, output, prompt caching, batch execution, and self-hosted instances, developers must calculate pricing carefully to estimate their monthly workloads. Below is our dynamic pricing comparison table, kept in exact sync with the main model index.

Model	Provider	Input / 1M	Output / 1M	Cached Input / 1M
Loading pricing data...

Pricing Tiers: Frontier vs. Mid-Tier vs. Small

When analyzing costs, models generally fall into three tiers:

Frontier Tier ($5.00+ Input / $15.00+ Output): Reserved for absolute top-tier intelligence like Claude Opus 4.8 and GPT-5.5. These models are ideal for complex architectural decisions and high-stakes agent loops, but are too expensive for daily high-volume tasks.
Mid-Tier ($1.25 - $3.00 Input / $2.50 - $15.00 Output): Models like Claude Sonnet 4.6, GPT-5, and Gemini 3.5 Flash represent the "sweet spot" for production applications, blending high capability with moderate pricing.
Small & Open Coder Tier (Sub-$1.00 Input): Models like DeepSeek V4-Flash, GPT-5 Mini, and self-hosted Llama 4 Scout. These provide fast responses and negligible costs, making them perfect for routing, classification, or high-volume summarization.

For more details on overall performance rankings, see our AI Model Rankings or compute your exact monthly cost with the Cost Calculator.

Frequently asked

How does Prompt Caching save money?

Prompt caching allows API providers (such as Anthropic, OpenAI, and DeepSeek) to reuse parts of the context window that remain static (like large system prompts or system contexts). This results in discounts of up to 90% off standard input token rates, which is crucial for agentic architectures.

Are self-hosted open-weight models completely free?

While models like Llama 4 Scout, Maverick, or Phi-4 cost $0 to license under open licenses, you must pay for the cloud GPU infrastructure (such as AWS, GCP, RunPod, or Lambda Labs) to host them. If your throughput is low, managed APIs are often cheaper than keeping a GPU active 24/7.

AI Model Pricing Comparison 2026: Cost per Million Tokens

Pricing Tiers: Frontier vs. Mid-Tier vs. Small

Frequently asked

Cheapest LLM API 2026.

OpenAI API Pricing Guide.