The standard price, and the cliff
GPT-5.5's base rates are simple: $5.00 per million input tokens, $30.00 per million output tokens, and $0.50 per million for cached input. For most chat, RAG, and short-agent workloads, that is the number you budget against.
The trap is the long-context threshold. OpenAI's pricing documentation lists higher rates once a prompt's input crosses 272K tokens: the request is billed at roughly 2× input and 1.5× output for the full session — about $10 per million input, $45 per million output, and $1 per million cached input. It applies across standard, batch, and flex.
| GPT-5.5 tier | Input / 1M | Output / 1M | Cached input / 1M |
|---|---|---|---|
| Standard (≤ 272K input) | $5.00 | $30.00 | $0.50 |
| Long context (> 272K input) | $10.00 | $45.00 | $1.00 |
| GPT-5.5 Pro (separate model) | $30.00 | $180.00 | — |
The word doing the work is session. The higher rate is not applied only to the tokens above 272K — once a request trips the threshold, the whole thing is billed at the elevated rate. A single oversized prompt can therefore double the cost of an otherwise normal call.
Why the cliff is easy to hit by accident
272K tokens sounds like a lot until you are doing agentic work. A long-running coding agent that keeps appending tool outputs, a retrieval step that stuffs too many documents into context, or a conversation that never trims its history can all drift past the line without anyone intending a "long-context" request. Because GPT-5.5 advertises a roughly 1M-token window, it is tempting to treat the full window as freely usable — but the pricing says otherwise above 272K.
The fix is boring and effective: cap context growth. Trim conversation history, retrieve fewer and better documents, and watch the input-token count on long agent loops. For a broader view of where token spend actually comes from, our AI model pricing comparison normalizes these rates across providers.
GPT-5.5 Pro is not "GPT-5.5, but better value"
The second surprise is naming. GPT-5.5 Pro is a distinct model priced at $30 per million input and $180 per million output — six times the standard input rate and six times the output rate. It targets the hardest reasoning tasks, not everyday throughput.
That price difference is large enough to change architecture decisions. At $180 output, a single verbose answer that would cost cents on standard GPT-5.5 costs meaningfully more, and a high-volume product built on Pro by default can run an order of magnitude over budget. The honest recommendation: most teams should not start on Pro. Start on standard GPT-5.5, measure where it actually falls short, and route only those specific calls to Pro.
Practical takeaways
- Treat 272K input as a budget boundary, not just a capability limit. Crossing it reprices the whole call.
- Instrument input-token counts on agent loops and RAG pipelines so you catch drift before the invoice does.
- Default to standard GPT-5.5; reserve Pro for measured exceptions. A 6× model should be a routing decision, not a default.
- Model your real mix. Per-million prices mislead until you weight them by your input/output ratio — run it through the calculator.
- For the full OpenAI lineup, the OpenAI API pricing guide covers GPT-5.5, GPT-5, GPT-5 Mini, batch, and caching together.