Ask any of these models to "write the whole thing in one go" and you eventually hit a wall that has nothing to do with talent. It's the max output token limit: the hard cap on how much a model can emit in a single response. For a tweet it never matters. For a long report, a sample chapter, or a full draft you don't want to stitch together by hand, it's the first real constraint, and it splits these three before you've judged a single sentence.

| Model | Max output (tokens) | Approx. words in one pass | Context window (tokens) |
|---|---|---|---|
| Claude Opus 4.8 | 128K | ~90,000 | 1M |
| GPT-5.5 | 128K | ~90,000 | 1.05M |
| Claude Sonnet 4.6 | 64K | ~45,000 | 1M |

So if your work is genuinely long-form in one shot, Opus 4.8 and GPT-5.5 are the heavyweights, each able to emit something close to a short book before they run out of room. Sonnet 4.6, Claude's balanced daily model, caps at half that. It's worth correcting a common mistake here: some sources list Sonnet 4.6 at 128K output, but Anthropic's own docs put it at 64K. If you want Claude to match GPT-5.5's ceiling, you want Opus 4.8.
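
The word counts in the table are a rough conversion, and it helps to see the arithmetic. Here is a minimal sketch, assuming about 0.7 English words per token (a ratio back-derived from the table's own figures, not a number published by either provider) and reading "128K" as 128,000 tokens. The function names are hypothetical, and real ratios shift with the tokenizer and the kind of text.

```python
import math

# Assumption: ~0.7 English words per token, back-derived from the table
# (128K tokens is listed as roughly 90,000 words). Real ratios vary.
WORDS_PER_TOKEN = 0.7

# Max output limits as listed in the table, with "K" read as 1,000 tokens.
MAX_OUTPUT_TOKENS = {
    "Claude Opus 4.8": 128_000,
    "GPT-5.5": 128_000,
    "Claude Sonnet 4.6": 64_000,
}


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Rough word budget for a given token budget."""
    return round(tokens * words_per_token)


def passes_needed(target_words: int, model: str) -> int:
    """Minimum number of separate responses a draft of target_words would take."""
    return math.ceil(target_words / approx_words(MAX_OUTPUT_TOKENS[model]))


if __name__ == "__main__":
    for model, limit in MAX_OUTPUT_TOKENS.items():
        print(
            f"{model}: ~{approx_words(limit):,} words per pass, "
            f"{passes_needed(60_000, model)} pass(es) for a 60,000-word draft"
        )
```

Under these assumptions, a 60,000-word draft fits in a single pass on either 128K model and takes at least two on Sonnet 4.6. Treat the result as a planning estimate: the prompt, the formatting, and the model's own pacing all affect how much usable prose actually comes back.
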
Voice: the part no one will score for you
Here's the honest bit. Neither Anthropic nor OpenAI markets a model as the best creative writer, and there's no official benchmark that settles "better prose." So anyone who tells you one of these flatly writes better is selling you their taste as a fact. What the providers do say is narrower and more useful.
Claude's house style tends to be plainer and more human on the first try, which is why it's the writer's default in the Sonnet 4.6 review and why people reach for it on messages and emails. GPT-5.5 is tuned to be concise and is pitched at professional, document-heavy work, so it's strong on briefs, summaries, and clean structure. The practical takeaway: run the same prompt through both and keep the voice that sounds like you. That's a five-minute test that beats any third-party claim, this one included.
Instruction-following: where the brief lives or dies
This is the axis with an actual paper trail, and it favors Claude. Anthropic's stated improvement for Sonnet 4.6 is "superior instruction-following" and consistency, and it describes the model as less prone to overengineering. In writing terms, that means when your brief says "1,200 words, second person, no bullet points, skip the intro," Claude is more likely to hold all four constraints at once. GPT-5.5's instinct toward concise, reshaped output is great when you want it and a problem when you wanted exactly what you asked for.
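
If you want to compare models on this axis yourself, you can score each draft against the brief instead of eyeballing it. The sketch below checks the four example constraints with deliberately crude heuristics. The function name `check_brief`, the 10% word-count tolerance, and the regex rules are all assumptions made for illustration, not anything either provider ships.

```python
import re


def check_brief(draft: str, target_words: int = 1200, tolerance: float = 0.1) -> dict:
    """Check a draft against the four example constraints; True means it passed."""
    lines = [ln.strip() for ln in draft.splitlines() if ln.strip()]
    word_count = len(draft.split())
    low = target_words * (1 - tolerance)
    high = target_words * (1 + tolerance)
    first_line = lines[0].lower() if lines else ""

    # Heuristic: second person means "you"/"your" appears and no
    # first-person pronouns do. This will misjudge quoted dialogue.
    uses_you = bool(re.search(r"\b(you|your)\b", draft, re.IGNORECASE))
    uses_first_person = bool(re.search(r"\b(I|[Ww]e|[Mm]y|[Oo]ur)\b", draft))

    return {
        "word_count": low <= word_count <= high,
        "second_person": uses_you and not uses_first_person,
        # Heuristic: a bullet is a line starting with -, *, • or "1." / "1)".
        "no_bullets": not any(re.match(r"([-*•]|\d+[.)])\s", ln) for ln in lines),
        # Heuristic: an intro is a first line that announces itself as one.
        "no_intro": not re.match(r"(introduction|in this (article|post|guide))", first_line),
    }


if __name__ == "__main__":
    sample = "You open the file. " * 300  # 1,200 words, second person, no lists
    print(check_brief(sample))
```

Run the same brief through each model, pass both drafts to the checker, and compare which constraints each one held. A failed check is a prompt to read the draft, not a verdict, since rules this simple will produce false alarms.
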
For anything where the spec matters (a style guide, a word count, a structure you have to hit), Claude's discipline is the safer bet. The deeper version of this trade, across coding as well as prose, is in Opus 4.8 vs GPT-5.5, and the generalist case for GPT-5.5 is in the GPT-5 review.
Go with Claude when the brief is detailed and you need it followed to the letter, and use Opus 4.8 rather than Sonnet 4.6 when the piece is long in one pass. Go with ChatGPT's GPT-5.5 for concise, professional drafting and clean structure. On pure voice, skip everyone's claims, including ours: run the same prompt through both and keep the draft that reads like you.