benchr Issue No. 07

Recent AI model releases

Major releases over the past few months, with links to the published reference data.

The AI model landscape changes monthly. This page lists significant releases since early 2026, in reverse chronological order. For deeper coverage of each model, see the linked article or the comparison tool.

May 2026

  • Gemini 3.5 Flash — Released May 19, 2026. Beats Gemini 3.1 Pro on most coding and agentic benchmarks. $1.50 input / $9 output per million tokens. 1M context. Source: Google.

April 2026

  • Mistral Medium 3.5 — Released April 29, 2026. 128B dense, 256K context, vision built-in. Replaces Medium 3.1 + Magistral + Devstral 2 in one model. $1.50 / $7.50 per million tokens. Modified MIT license. Source: Mistral.
  • DeepSeek V4-Pro and V4-Flash — Released April 24, 2026. V4-Pro: 1.6T MoE, 49B active, 80.6% SWE-bench Verified. V4-Flash: 284B MoE, 13B active, $0.14 input. Source: DeepSeek.
  • GPT-5.5 — Released April 23, 2026. OpenAI's new flagship. $5 input / $30 output. 1M context. Strong on Terminal-Bench and FrontierMath. Source: OpenAI.
  • Claude Opus 4.7 — Released April 16, 2026. Same $5/$25 pricing as Opus 4.6, 87.6% SWE-bench Verified. New tokenizer produces up to 35% more tokens per request. Source: Anthropic.

March 2026

  • GPT-5.4 family — Released March 5, 2026 (Thinking + Pro). GPT-5.4 mini and nano followed on March 17. $2.50/$15 for the main model. Strong cost-quality balance.
  • Gemini 3 Pro deprecated — March 9, 2026. Replaced by Gemini 3.1 Pro Preview as the Pro tier. Source: Google changelog.

February 2026

  • Gemini 3.1 Pro Preview — Released February 19, 2026. $2 input / $12 output per million tokens. Still the Pro tier until Gemini 3.5 Pro lands in June 2026.
  • Qwen 3.5 — Released February 16, 2026. 397B MoE, 17B active, Apache 2.0 for smaller sizes. $0.10 input per million tokens — cheapest large-MoE API on the market. Source: Alibaba.

What to watch

Gemini 3.5 Pro is announced for June 2026. Expected pricing in line with 3.1 Pro ($2/$12) or slightly higher. The Flash and Pro tiers will likely both move to the 3.5 generation by Q3 2026.

The open-weight tier (DeepSeek V4, Qwen 3.5/3.6, Mistral Medium 3.5, Llama 5) has closed the gap with closed labs to single-digit benchmark points on most evaluations. Coding and reasoning leaders are now distributed across both camps.