OpenAI remains the standard reference for API pricing. Developers can choose between the flagship reasoning model (GPT-5.5), the production workhorse (GPT-5), and the lightweight model (GPT-5.5 Mini). Below is the live OpenAI pricing sheet, pulled dynamically from the benchr data store.
| Model ID | Input / 1M | Output / 1M | Cached Input / 1M | Batch Input / 1M |
|---|---|---|---|---|
| Loading OpenAI models... | | | | |
Pricing Strategy: Optimizing OpenAI Costs
If your application relies heavily on OpenAI models, you can employ several tactics to significantly reduce your monthly bills:
- Use the Batch API for Async Workloads: OpenAI offers a **50% flat discount** on all models if you queue requests via the Batch API. Batches run within a 24-hour completion window rather than returning immediately, which suits latency-tolerant jobs such as model evaluation, classification, or translation.
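- Take Advantage of Prompt Caching: Caching is applied automatically to repeated prompt prefixes, so keep static content (system prompt, shared context) at the start of the prompt and put per-request content at the end. GPT-5.5 and GPT-5 Mini offer discounts of **50% and 90%**, respectively, on cached input tokens.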
- Use Structured Outputs Wisely: Enforcing a strict JSON schema adds no latency overhead once the schema has been processed, but the schema definition counts towards input tokens on the first call, before it is cached. Reuse one identical schema string across your application so that it stays cacheable.
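To make the savings from the first two tactics concrete, the arithmetic can be sketched in a few lines of Python. This is a rough illustration, not an official calculator: the `Price` class, the function names, and the rates used (2.00 USD input and 8.00 USD output per 1M tokens) are hypothetical placeholders, not actual OpenAI prices. The sketch also computes the batch and caching discounts separately, since the text above does not say whether they stack.

```python
from dataclasses import dataclass

BATCH_DISCOUNT = 0.5  # the 50% Batch API discount described above


@dataclass(frozen=True)
class Price:
    """Hypothetical per-1M-token prices in USD (placeholders, not real rates)."""
    input_per_m: float
    output_per_m: float
    cached_discount: float  # fraction taken off cached input tokens, e.g. 0.5


def standard_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload at list price, with no discounts applied."""
    total = input_tokens * price.input_per_m + output_tokens * price.output_per_m
    return total / 1_000_000


def batch_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost of the same workload when queued through the Batch API."""
    return standard_cost(price, input_tokens, output_tokens) * (1 - BATCH_DISCOUNT)


def cached_cost(price: Price, input_tokens: int, output_tokens: int,
                cached_tokens: int) -> float:
    """Cost when `cached_tokens` of the input tokens are cache hits."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * price.input_per_m
        + cached_tokens * price.input_per_m * (1 - price.cached_discount)
        + output_tokens * price.output_per_m
    )
    return total / 1_000_000


# Hypothetical model: 2.00 USD input, 8.00 USD output, 50% off cached input.
example = Price(input_per_m=2.0, output_per_m=8.0, cached_discount=0.5)

print(f"standard: {standard_cost(example, 1_000_000, 100_000):.2f}")        # 2.80
print(f"batch:    {batch_cost(example, 1_000_000, 100_000):.2f}")           # 1.40
print(f"cached:   {cached_cost(example, 1_000_000, 100_000, 800_000):.2f}")  # 2.00
```

Substituting the per-model figures from the pricing table for the placeholder rates gives a rough monthly estimate. Note that the caching discount only applies to input tokens, so output-heavy workloads benefit more from batching than from caching.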
For cross-provider comparisons, read the global AI Model Pricing Comparison, or calculate custom usage via the Cost Calculator.