Review·May 2026

Grok 4.3, reviewed

Name: Grok 4.3, reviewed
Item: Grok 4.3
Rating: 4.2
Author: benchr

The one model that reads X and the live web on its own. Where that wins outright, and where it doesn't.

Updated May 30, 2026 · View changelog · Figures verified against official sources, 30 May 2026

Type a question about a story that broke an hour ago into a stock chatbot and you get a polite shrug about a training cutoff. Ask Grok 4.3 and it goes and looks. xAI built native, server-side Web Search and X Search tools into the model itself, so it can pull live posts, breaking news, and current web pages on its own, mid-answer, without you standing up a retrieval pipeline first. That single capability is the spine of this review, because it's the thing Grok 4.3 does that the field doesn't.

Everything else about the model is solid and unremarkable in the way frontier models have become: a 1,000,000-token context window, a low price, and a respectable spot on the public leaderboards. The interesting question isn't whether Grok 4.3 is smart. It's whether live access to X and the open web is worth choosing it over a model that scores higher on the benchmarks you'd normally shop by.

Input / output / 1M $1.25 $1.25 in, $2.50 out, $0.20 cached input

Context window 1,000,000 tokens, per xAI's official model card

Intelligence Index 53 Artificial Analysis, third-party, not xAI

τ²-Bench Telecom 98% Agentic tool use, Artificial Analysis

A note on sourcing before the numbers do any work. xAI did not publish an official benchmark suite or a dedicated launch post for Grok 4.3, and there's no exact launch day stated on its own pages, which file release notes by month. The model is in xAI's official catalog with its own card under the ID grok-4.3, and the pricing and context figures here come straight from that card. Every benchmark figure, by contrast, is from the third-party Artificial Analysis leaderboard, not from xAI. Treat them as outside estimates, not vendor claims.

The live-data edge is the product

The reason to care about Grok 4.3 is that it closes the gap between a model's training cutoff and the present. Most assistants are frozen at whatever date their data stopped, and the way around that is to bolt on your own search-and-retrieve layer. Grok 4.3 skips the bolt-on. Its Web Search and X Search are first-class tools the model calls server-side, which means it can decide on its own that a question needs fresh information and go get it.

The X Search piece is the part no competitor can copy easily, because it's tied to the platform. When a question is about what people are saying, not just what's been published, Grok 4.3 can sample the firehose of posts directly. That's social signal as a native input: sentiment on a product launch, the shape of a developing story, the reaction to an announcement while it's still unfolding. A model reading the open web a few hours later gets the article. Grok gets the room.

This is the same separation benchr drew in its argument that saturated benchmarks stopped telling you anything useful: when frontier models cluster within a few points, the differences that matter move off the leaderboard and into what the model can reach. Live data is exactly that kind of difference. It doesn't show up in an Intelligence Index score, and it's the main thing you're buying here.

Where it wins, where it doesn't

The honest read is that Grok 4.3's advantage is narrow and deep rather than broad. It's built around a capability, not around topping charts. Here's the split, and it's worth being blunt about both columns before you commit.

Grok 4.3: where the live-data model is the right call, and where it isn't
Use case	Verdict
Questions about breaking news or events from today	Wins: native Web Search answers without a retrieval layer
Reading social sentiment and reaction on X	Wins: native X Search is the capability rivals can't match
Agentic tool calling and instruction following	Strong: top third-party scores on τ²-Bench and IFBench
Cost-sensitive, high-volume API work	Strong: $1.25/$2.50 with $0.20 cached input is aggressive
Hardest reasoning and frontier coding tasks	Not the leader: Index of 53 trails the chart-toppers
Workloads that need verified video input	Skip for now: official card lists text and image input only

On the wins side, the agentic numbers back up the live-data pitch. Artificial Analysis puts Grok 4.3 at 98% on τ²-Bench Telecom, a test of instruction following and tool use, and 81% on IFBench. Its GDPval-AA agentic score is reported at 1500 ELO, up 321 points from the prior Grok 4.20 release. A model whose job is to call tools and follow instructions cleanly is exactly the model you want driving live search, so those scores and the differentiator reinforce each other.

On the other side, the Artificial Analysis Intelligence Index of 53 is the number to keep you honest. It's a frontier-class score, but it's not a leading one, and Grok 4.3 isn't where you go for the hardest reasoning or the top coding result. If that's the work, Claude Opus 4.8's coding lead or whichever model currently tops your benchmark of choice is the better pick. The coding assistants shootout ranks the field on that axis, and Grok 4.3 isn't sitting at the top of it.

What it costs, and why the price is the second argument

Pricing is the other reason this model is easy to reach for. The API runs $1.25 per million input tokens and $2.50 per million output, with cached input at $0.20 per million, per xAI's official card. That's aggressive for a frontier model with a million-token window, and the pricing is flat: there's no higher tier that kicks in above a token threshold, so a long-context job costs what the rate says it costs.

$0.20 Per million cached input tokens, the rate that makes repeated long-context calls cheap to run.

The cached-input rate is the line to watch if you're feeding the same large context across many calls, which is common in agent loops and document work. At $0.20 per million it cuts the dominant cost of long-context prompting hard. For the broader question of which model is cheapest for a given job, benchr's price-per-use-case breakdown compares rates across the field; Grok 4.3 lands in the cheap-and-capable band, which is a comfortable place to be.

One caveat on access. The API price above is clean and public. Consumer access through grok.com and the Grok apps is gated behind the SuperGrok and X Premium+ subscriptions, and xAI doesn't publish the exact dollar prices of those tiers on a single official model page. If you're a chat user rather than a builder, budget for a subscription whose number you'll have to confirm at sign-up.

What the official card doesn't claim

A few popular claims about Grok 4.3 don't hold up against xAI's own documentation, and they're worth flagging so you don't build on sand. The official model card lists input modalities as text and image, producing text output. It does not list native video input. Several third-party write-ups say Grok 4.3 takes video; that's unconfirmed on xAI's own page, so don't design a video pipeline around it until the company states it directly. If multimodal breadth is the deciding factor, the multimodal capability ranking is the place to compare what's verified across models.

The knowledge cutoff is also ambiguous. xAI's own surfaces have shown one date while a third-party report claimed a later one, and the two don't agree, so treat the cutoff as unconfirmed. In practice it matters less here than for any other model, because the native Web Search and X Search are the answer to a stale cutoff. That's the neat thing about this model: the one spec that's fuzzy is the one its core feature is designed to paper over.

The verdict

Grok 4.3 is a buy for a specific shape of work and a pass for another, and the line between them is clean. Go with it when freshness or social signal is the point: monitoring a developing story, reading reaction on X, answering questions whose right answer changed today, or building an agent that needs live data without a retrieval stack underneath it. At $1.25 in and $2.50 out with a million-token window, it's cheap enough that the live-search capability comes without a price penalty.

Skip it when the job is the hardest reasoning or the top coding score, where its third-party Intelligence Index of 53 puts it in the conversation but not at the front. For that, a dedicated leader is the safer call, and a head-to-head like the field's read on GPT-5.5 or Gemini 3.1 Pro will steer you better than this page. But for anyone whose problem is "the model can't see the present," Grok 4.3 is the cleanest fix on the market, and nothing else does it natively. That's worth choosing for, even with a benchmark or two left on the table.

Frequently asked

What makes Grok 4.3 different from other frontier models?

It ships with native, server-side Web Search and X Search tools, so the model can pull live posts, news, and web pages on its own without you wiring up your own retrieval layer. That live-data access is the one thing it does that GPT-5.5, Gemini, and Claude don't do out of the box, and it's the reason to reach for Grok 4.3 over a rival.

What does Grok 4.3 cost?

API pricing is $1.25 per million input tokens, $2.50 per million output tokens, and $0.20 per million cached input tokens, per xAI's official model card. The pricing is flat, with no higher tier above a token threshold. Consumer access through grok.com and the Grok apps sits behind the SuperGrok and X Premium+ subscriptions, whose exact dollar prices xAI doesn't publish on a single official model page.

Is Grok 4.3 a good coding or reasoning model?

It's competent, not the leader. On the third-party Artificial Analysis Intelligence Index it scores 53, which puts it in the frontier conversation but behind the models that top the coding and reasoning charts. Its strongest third-party numbers are in agentic tool use and instruction following, not raw reasoning. If pure coding or hard reasoning is the job, a dedicated leader is the safer pick.

How big is the Grok 4.3 context window?

The official model card lists a 1,000,000-token context window. That's large enough for whole-repository or long-document work, though context size alone doesn't tell you how reliably a model uses the back half of that window.

Does Grok 4.3 support video input?

Not according to the official model card, which lists text and image input producing text output. Several third-party write-ups claim native video input, but that is not confirmed on xAI's own documentation, so don't build around it until xAI states it directly.

Changelog

May 30, 2026 — Originally published. Pricing, context window, modalities, and availability verified against xAI's official model card and developer docs (docs.x.ai); all benchmark figures are sourced to the third-party Artificial Analysis leaderboard, as xAI published no official benchmark suite for this release.

References

xAI, "grok-4.3 model card," docs.x.ai/developers/models/grok-4.3, accessed May 2026.
xAI, "Models catalog," docs.x.ai/developers/models, accessed May 2026.
xAI, "Release notes," docs.x.ai/developers/release-notes, accessed May 2026.
xAI, "API," x.ai/api, accessed May 2026.
Artificial Analysis, "xAI launches Grok 4.3 with improved agentic performance and lower pricing," artificialanalysis.ai, accessed May 2026.