Llama 4, reviewed

A 10-million-token context on open weights still turns heads. But Meta has moved on, and Llama 4 is the last open Llama.

Figures verified against official sources, 30 May 2026

Scout context: 10M (10,000,000 tokens, Meta-confirmed, single-H100 at Int4)
Maverick context: 1M (1,000,000 tokens, per the Hugging Face release blog)
Weights license: $0 (Llama 4 Community License, not OSI open-source)
Released: 2025 (April 5, 2025; an aging generation in 2026)

Lead with the number that still does work: 10 million tokens. That's the context window on Llama 4 Scout, the smallest member of the herd, and it's an open-weight model you can download for nothing and run on a single H100 at Int4. Meta's own framing is that Scout takes the supported context from 128K in Llama 3 to "an industry leading 10 million tokens." More than a year after launch, no other downloadable model puts that figure on the table. If a giant context on hardware you control is the thing you need, Scout is still the answer.

Here's the turn. Llama 4 shipped on April 5, 2025. In AI-model years that's old, and Meta has quietly stopped treating it as the future. On April 8, 2026, Meta Superintelligence Labs announced a new closed-source model, Muse Spark, and in that announcement it refers to "Llama 4 Maverick" as "our previous model." Read that plainly: the company that built the most-downloaded open-weight family has moved its flagship behind a closed door. Llama 4 isn't being iterated at the frontier. It's the last major open-weight Llama, and you should buy into it knowing that.

This review covers the herd as it exists today, told through the three variants Meta named, and it's candid about what each one is still good for now that the generation is aging. Every number below is from Meta's official pages or the official Hugging Face release blog, and the benchmark figures are Meta's own: vendor-reported, not independently reproduced.

The herd, variant by variant

Llama 4 was Meta's first natively multimodal, Mixture-of-Experts (MoE) generation. Meta announced three sizes. Two shipped as open weights; one didn't ship at all. Knowing which is which is the whole buying decision, so start here.

The Llama 4 herd, as documented by Meta. Released April 5, 2025.
Variant | Status | Context | Parameters | Best use
Llama 4 Scout | Released, open weights | 10M tokens | 17B active / 109B total, 16 experts | Huge-context work on a single GPU
Llama 4 Maverick | Released, open weights | 1M tokens | 17B active / 400B total, 128 experts | Top open-tier reasoning and multimodal
Llama 4 Behemoth | Announced, never released | Not specified | 288B active / ~2T total, 16 experts | N/A; previewed as "still training"

Scout is the standout, and not only for the 10M number. At 17B active parameters out of 109B total it's light enough to serve from one H100 when quantized to Int4, which is what makes the huge context usable in practice rather than a spec-sheet trophy. If your workload is feeding enormous documents, codebases, or transcripts into a model you host yourself, this is the variant that earns the review. benchr's deep dive on how the big context windows compare puts that 10M claim next to what the closed frontier offers, and the gap on raw downloadable context is real.
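The single-H100 claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes the 80 GB H100 variant, counts weights at 4 bits (0.5 bytes) per parameter, and ignores the KV cache, activations, and runtime overhead, all of which grow with context length. It uses total rather than active parameters, because every expert in an MoE model has to be resident in memory even though only 17B parameters fire per token. The function and constant names are made up for this example.

```python
# Weight-only memory estimate. Ignores KV cache, activations, and overhead,
# so treat the result as a lower bound on what serving actually needs.
H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant


def weight_memory_gb(total_params_billions: float, bits_per_param: int) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    # billions of parameters * bytes per parameter = gigabytes
    return total_params_billions * bytes_per_param


scout_int4 = weight_memory_gb(109, 4)     # total parameters from Meta's spec
maverick_int4 = weight_memory_gb(400, 4)

for name, gb in [("Scout", scout_int4), ("Maverick", maverick_int4)]:
    verdict = "fits" if gb < H100_MEMORY_GB else "does not fit"
    print(f"{name} at Int4: ~{gb:.1f} GB of weights, {verdict} on one {H100_MEMORY_GB} GB GPU")
```

Under these assumptions Scout's weights come to about 54.5 GB and Maverick's to about 200 GB, which is the gap between one card and several. How much of the 10M window is usable in the memory left over depends on the serving stack and is not something this estimate captures.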

Maverick is the heavier sibling: same 17B active parameters but 400B total across 128 experts, with a 1M-token context per the official Hugging Face release blog. It's the one Meta positioned as its strongest open chat-and-reasoning model, and it's the one Meta now calls its "previous model." Both Scout and Maverick are downloadable from llama.com and Hugging Face, and both were wired into Meta AI across WhatsApp, Messenger, Instagram Direct, and meta.ai.

Behemoth is the asterisk. At a reported 288B active parameters and nearly two trillion total, it was the teacher model the others were said to distill from, and it was only ever previewed as "still training." Meta never released it. No official cancellation has been published either; third-party outlets have reported a pause for performance reasons, but that isn't confirmed by Meta. Treat Behemoth as a name on a slide, not a model you can use.

The license clause that decides who can use it

Llama 4 is "open weight," not open-source, and the distinction is in the license. The Llama 4 Community License lets almost anyone download, run, fine-tune, and ship commercially, with attribution and a "Built with Llama" requirement plus an acceptable-use policy. For the vast majority of teams that's clean enough to deploy with confidence. There's one line that matters if you're large, though: if your products had more than 700 million monthly active users in the month before Llama 4's release, you must request a separate license from Meta, which it grants at its sole discretion.

For everyone below that line, the license is permissive enough that the comparison with truly open competitors comes down to capability and support, not legal risk. benchr's survey of the open-weight tier right now places Llama 4 against the Apache- and MIT-licensed alternatives, and it's worth noting that some of those, like the models in the Mistral review, carry their own research-only or restricted terms. "Open" is a spectrum, and Llama 4 sits in the permissive-but-conditional middle of it.

The benchmarks, read as vendor figures

Meta published a full benchmark sheet at launch, and it's the source for every score people quote. Read all of these as Meta's own reported numbers: vendor-reported, not independently reproduced. On MMLU Pro, Meta lists Maverick at 80.5 and Scout at 74.3. On the multimodal MMMU, Maverick is 73.4 and Scout 69.4. Document and chart understanding looked strong, with both variants at 94.4 on DocVQA and Maverick at 90.0 on ChartQA. Coding was the soft spot Meta's own sheet shows: LiveCodeBench came in at 43.4 for Maverick and 32.8 for Scout, well behind dedicated coding models.

One figure needs a warning label. Meta's blog cited an LMArena Elo of 1417 for Maverick, but that was an experimental, chat-optimized variant, not the released weights, and it drew criticism precisely because the version that topped the leaderboard wasn't the one you could download. Don't carry the 1417 into a comparison with shipped models. For why a single headline score is a weak basis for any decision, benchr's piece on why the benchmarks stopped telling you anything is the relevant background.

What it costs to run

There's no Meta price to quote, because Meta doesn't sell first-party API access for Llama 4. It ships the weights and you run them, or you call a third-party host that prices separately. Meta's own blog does cite an estimated inference cost for Maverick of roughly $0.19 per million tokens (blended 3:1 input-to-output, with distributed inference). Read that exactly as written: it's a cost estimate Meta published, not a price Meta charges you. Your real bill is hardware and operations, or whatever a host like IBM watsonx.ai or a cloud provider lists.
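To compare a host's price list against a blended figure like Meta's, it helps to reproduce the blend. The sketch below assumes "blended 3:1" means a weighted average of three parts input tokens to one part output tokens; the prices in it are hypothetical placeholders, not quotes from any provider, and the function names are made up for this example.

```python
def blended_price_per_mtok(input_price: float, output_price: float,
                           input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted-average USD price per million tokens for an input:output mix."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


def monthly_cost(total_tokens: int, blended_price: float) -> float:
    """USD cost for a token volume at a blended per-million-token price."""
    return total_tokens / 1_000_000 * blended_price


# Hypothetical host prices in USD per million tokens (NOT real quotes).
blended = blended_price_per_mtok(input_price=0.20, output_price=0.60)
print(f"Blended 3:1 price: ${blended:.2f} per million tokens")
print(f"2B tokens per month: ${monthly_cost(2_000_000_000, blended):,.2f}")
```

If your traffic is output-heavy, pass your real ratio through `input_parts` and `output_parts`; a 3:1 blend flatters workloads where most tokens are prompt.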

That makes the practical question the same one every open-weight model raises: self-host or rent. Scout's whole appeal is that the huge context fits on one H100, which lowers the self-host bar dramatically compared to the 400B Maverick or anything Behemoth-sized. benchr's guide to running models on your own machine covers where that line sits, and for Scout specifically the answer is friendlier than usual: a single high-end GPU gets you a 10M-token model, which is an unusual capability-per-box ratio.

The verdict

Llama 4 earns a 4.0, and the score is doing two jobs. It credits a real, still-unmatched feature, Scout's 10M-token context on open weights you can run on one GPU, plus a license permissive enough for almost any company to deploy. It also marks down for age. This is an April 2025 generation that Meta has stopped advancing, now formally referred to as the "previous model" behind a closed-source successor. You're buying a capable, frozen artifact, not a model on an upward path.

Go with Llama 4 Scout when a giant context on hardware you control is the requirement and you can absorb that the generation won't get better. Pick Maverick when you want the strongest open-tier reasoning Meta shipped and 1M tokens is enough. Skip Behemoth entirely, because it doesn't exist to download. And if your roadmap needs frontier accuracy, a vendor actively iterating, or freedom from the 700M-MAU clause, look past Llama 4: it was the high-water mark for open Llama, and the tide has gone out.

Frequently asked

Is Llama 4 free to use?

The weights are free to download. Llama 4 Scout and Llama 4 Maverick are published under the Llama 4 Community License on llama.com and Hugging Face, so you can self-host at no licensing fee. It is not OSI open-source, and there is a catch: if your products had more than 700 million monthly active users in the month before Llama 4's release, you must request a separate license from Meta, granted at its sole discretion. Meta does not sell first-party API access, so the running cost is your own hardware or a third-party host's per-token rate.

What is Llama 4 Scout's context window?

Scout supports a 10,000,000-token context window, confirmed by Meta on ai.meta.com and llama.com, where Meta describes it as raising the supported context length from 128K in Llama 3 to an industry-leading 10 million tokens. Llama 4 Maverick is listed at 1,000,000 tokens in the official Hugging Face release blog. Treat the 10M figure as the supported ceiling Meta documents, not a guarantee of perfect recall across the whole window.
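A quick way to apply those ceilings is to check a prompt's token count against each window before picking a variant. This is a minimal sketch: the window sizes are the documented figures above, while the four-characters-per-token estimate and the reserved output budget are rough assumptions, and the helper names are made up for this example. Use the model's actual tokenizer for real counts.

```python
CONTEXT_WINDOWS = {
    "Llama 4 Maverick": 1_000_000,   # per the Hugging Face release blog
    "Llama 4 Scout": 10_000_000,     # per Meta
}
CHARS_PER_TOKEN = 4  # assumption: crude heuristic for English text


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def variants_that_fit(prompt_tokens: int, reserved_output_tokens: int = 4_096) -> list[str]:
    """Return the variants whose window holds the prompt plus an output budget."""
    needed = prompt_tokens + reserved_output_tokens
    return [name for name, window in CONTEXT_WINDOWS.items() if needed <= window]


print(variants_that_fit(200_000))
print(variants_that_fit(3_000_000))
print(variants_that_fit(12_000_000))
```

Fitting inside the window only says the request is accepted; it says nothing about how well the model recalls material from deep in a very long prompt, which is worth testing on your own documents.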

What happened to Llama 4 Behemoth?

Behemoth (288B active, nearly two trillion total parameters, 16 experts) was previewed in April 2025 as still training and was never officially released. As of late May 2026, Meta has not published an official cancellation. Third-party outlets have reported it was paused or held back for performance reasons, but treat that as unverified by Meta. For planning purposes, Behemoth is not a model you can download or call.

Is Llama 4 still worth building on in 2026?

It depends on what you value. Llama 4 is now an aging generation: on April 8, 2026, Meta announced a closed-source model, Muse Spark, that names Llama 4 Maverick as its previous model, signaling Meta has moved past open-weight Llama as its flagship. If you need a permissive, self-hostable model with a very large context and a license clean enough for almost any company, Llama 4 still does that job. If you need frontier accuracy or a vendor on a clear update path, it is being left behind.

Are Llama 4's benchmark scores reliable?

They are Meta's own reported figures. Numbers like Maverick's 80.5 on MMLU Pro and Scout's 74.3 come from llama.com and should be read as vendor-reported, not independently reproduced. One number deserves extra caution: the 1417 LMArena Elo Meta cited was an experimental chat-tuned variant, not the released weights, and it drew criticism for not matching what shipped. Benchmark Llama 4 on your own tasks before trusting any single headline score.

Changelog

  • May 30, 2026 — Originally published. Variant lineup, context windows, parameter counts, license terms, and benchmark figures verified against Meta's official Llama 4 blog (ai.meta.com), llama.com model and license pages, and the meta-llama Hugging Face model cards and release blog. Benchmark scores are labeled Meta-reported; the 1417 LMArena Elo is flagged as an experimental chat variant. Successor status confirmed against Meta's April 8, 2026 Muse Spark announcement, which names Maverick as its "previous model." No official Behemoth cancellation was found.

References

  1. Meta, "The Llama 4 herd: the beginning of a new era of natively multimodal AI innovation," ai.meta.com/blog/llama-4-multimodal-intelligence, April 2025.
  2. Meta, "Llama 4 models," llama.com/models/llama-4, accessed May 2026.
  3. Meta, "Llama 4 Community License Agreement," llama.com/llama4/license, effective April 5, 2025.
  4. Meta, "Llama-4-Scout-17B-16E-Instruct model card," huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct, accessed May 2026.
  5. Meta, "Llama-4-Maverick-17B-128E-Instruct model card," huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct, accessed May 2026.
  6. Hugging Face, "Welcome Llama 4 Maverick & Scout on Hugging Face," huggingface.co/blog/llama4-release, April 2025.
  7. Meta, "Introducing Muse Spark," ai.meta.com/blog/introducing-muse-spark-msl, April 8, 2026.