Essay·May 2026

Prompt engineering did not die. It got narrower

Three techniques that still move the needle in 2026, with before-and-after examples.

Updated May 25, 2026

Quality lift: 30%, from disciplined prompting

Output consistency: 5×, with structured schemas

Parse failure drop: 40× (12% → 0.3% on the JSON test)

Techniques that work: 3, down from twenty in 2023

Same model, same emails, same extraction task. Prompt one: "Extract the customer's name, the issue category, and a priority score from this email. Return it as JSON." Prompt two: the same request with an explicit JSON schema, an enum on the category field, and a "no preamble" instruction. Over a thousand calls, the first prompt produced parse failures on roughly 12% of the responses. The second produced parse failures on 0.3%. The model didn't get smarter between the two runs. The prompt did.

Most of the 2023-era prompting toolbox has been absorbed into the frontier models. Most of what trained engineers did two years ago happens automatically now. A small set of techniques still moves the needle in 2026 — including everything covered in Anthropic's prompt engineering overview and the equivalent OpenAI prompt engineering guide. The techniques are narrower than they used to be. They aren't gone.

This piece covers three of them with before-and-after examples, all on Claude Opus 4.7, all on tasks drawn from real working sessions. The "before" is what comes back when you write the prompt the way you'd describe the task out loud. The "after" is what comes back when the technique is applied. The differences aren't subtle.

Technique one: structured output schemas

When the model is supposed to produce machine-readable output, don't ask for it in prose. Give the model a schema. The frontier models in 2026 are excellent at following structured output instructions when those instructions are concrete. They're clearly worse when the structure is implied.

Before: Extract the customer's name, the issue category, and a priority score from this email. Return it as JSON.

That works most of the time. The failure mode is that the JSON keys are inconsistent across calls. Sometimes customer_name, sometimes customerName, sometimes just name. The priority-score format drifts. Sometimes an integer, sometimes a string, sometimes wrapped in explanatory prose before the JSON object even starts. Over 1,000 calls, parse failures downstream run around 12%.
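A failure rate like this is cheap to measure. Below is a minimal sketch of a strict downstream check, assuming the consumer expects the three keys used later in this piece. The function names and the sample responses are invented for illustration and are not the harness behind the 12% figure.

```python
import json

# Keys the downstream consumer expects (assumed for this sketch).
EXPECTED_KEYS = {"customerName", "issueCategory", "priority"}


def is_parse_failure(response_text: str) -> bool:
    """Return True if a strict downstream parser would reject this response."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        # Covers explanatory prose before or after the JSON object.
        return True
    if not isinstance(payload, dict):
        return True
    # Key drift (customer_name vs customerName) shows up as a key-set mismatch.
    if set(payload) != EXPECTED_KEYS:
        return True
    # The priority score must be a real integer, not "2" or True.
    priority = payload["priority"]
    return isinstance(priority, bool) or not isinstance(priority, int)


def parse_failure_rate(responses: list[str]) -> float:
    """Fraction of responses that fail the strict check."""
    if not responses:
        return 0.0
    return sum(is_parse_failure(r) for r in responses) / len(responses)


# Invented responses showing the three drift patterns described above.
sample_responses = [
    '{"customerName": "Ada", "issueCategory": "billing", "priority": 2}',
    'Sure! Here is the JSON:\n{"customerName": "Ada", "issueCategory": "billing", "priority": 2}',
    '{"customer_name": "Ada", "issueCategory": "billing", "priority": 2}',
    '{"customerName": "Ada", "issueCategory": "billing", "priority": "2"}',
]
print(parse_failure_rate(sample_responses))  # 3 of 4 fail -> 0.75
```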

The same loose-versus-explicit contrast applies to prose tasks. This example returns under technique three:

// Before
Write a product description for the markdown export module.

// After
Constraints: 150 words exactly. No marketing words ("revolutionary,"
"cutting-edge," "seamless"). The tone respects the reader's intelligence.
Now: write a product description for the markdown export module.

After, for the JSON extraction case: Extract the following fields from this email and return ONLY the JSON object, no preamble. Schema: { "customerName": string, "issueCategory": "billing" | "technical" | "feature_request" | "other", "priority": integer 1-5 }.

Same model, same emails, parse failures drop to roughly 0.3%. The enum on issueCategory alone cuts drift a lot. The model commits to one of four allowed values instead of inventing a fifth. The no preamble instruction kills the chatty intro paragraphs that used to wrap half the responses.

The lesson is concrete. Explicit schema. Explicit constraints. Explicit instruction about what NOT to output. The technique isn't new. The technique still works.
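One way to keep the schema honest is to generate the prompt's enum and the validator from the same list, so the two cannot drift apart. This is a sketch under that assumption. The helper names are hypothetical, and the validation is hand-rolled with the standard library rather than any provider's structured-output feature.

```python
import json

# Allowed values, taken from the schema in the prompt above.
ISSUE_CATEGORIES = ("billing", "technical", "feature_request", "other")


def build_extraction_prompt(email_text: str) -> str:
    """Render the schema prompt, generating the enum from ISSUE_CATEGORIES."""
    enum = " | ".join(f'"{c}"' for c in ISSUE_CATEGORIES)
    return (
        "Extract the following fields from this email and return ONLY the "
        "JSON object, no preamble.\n"
        f'Schema: {{ "customerName": string, "issueCategory": {enum}, '
        '"priority": integer 1-5 }\n\n'
        f"Email:\n{email_text}"
    )


def validate_extraction(response_text: str) -> list[str]:
    """Return a list of schema violations; an empty list means valid."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return ["not valid JSON (possible preamble or trailing prose)"]
    if not isinstance(payload, dict):
        return ["top-level value is not an object"]
    errors = []
    if not isinstance(payload.get("customerName"), str):
        errors.append("customerName must be a string")
    if payload.get("issueCategory") not in ISSUE_CATEGORIES:
        errors.append("issueCategory is outside the enum")
    priority = payload.get("priority")
    if isinstance(priority, bool) or not isinstance(priority, int) or not 1 <= priority <= 5:
        errors.append("priority must be an integer from 1 to 5")
    return errors


# Invented inputs for illustration.
prompt = build_extraction_prompt("Hi, I was charged twice this month. Ada")
good = '{"customerName": "Ada", "issueCategory": "billing", "priority": 4}'
bad = '{"customerName": "Ada", "issueCategory": "refunds", "priority": 9}'
print(validate_extraction(good))  # []
print(validate_extraction(bad))   # two violations: enum and priority range
```

Returning a list of violations instead of a boolean makes it easier to see which constraint is drifting when the failure rate moves.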

[Figure: Before vs. after consistency score (out of 100) for structured schema, few-shot, and constraint-first prompting. Loose prompt in outlined black, disciplined prompt in orange. Same model.]

(A small side-quest: I tested whether "take a deep breath and think step by step" still produces measurable improvement on a reasoning test. It doesn't, on the frontier models. It also doesn't hurt. Most of the 2023-era prompting tricks have been absorbed into the model defaults — they're not banned, they're just no longer the source of the lift.)

Technique two: few-shot examples for unusual formats

Few-shot prompting was the hot technique of 2023. The narrative since has been that the models don't need examples anymore. That narrative is correct for common formats and wrong for unusual ones.

If the format is something the model has seen a million times (Markdown, JSON, a numbered list, a structured email), examples aren't needed. The model knows. If the format is anything specific to a domain — a particular kind of changelog entry, a custom XML schema, a writing style with very specific rhythm — examples are still essential, and the model's performance without them is way worse than with them.

A representative example: changelog entries that follow a particular format. A single paragraph that opens with the change category in brackets, names the affected module, describes the change in present tense, and closes with a small parenthetical noting the issue number where applicable. The format has 200+ examples in the existing log.

Before: Write a changelog entry for this PR that follows the established changelog format.

The model knows there's a format. It doesn't know what the format is. The result is something close to a generic changelog entry. A bullet point with a verb, sometimes with categories that aren't part of the actual format, sometimes missing the module name. Maybe 30% of outputs are usable as-is.

After: Three real changelog entries from the existing log, followed by: Write a changelog entry for this PR in the same format as the examples above.

Same model. 95%+ of outputs are usable as-is. The few-shot prefix is around 200 tokens. Trivial cost. It carries the format knowledge the model doesn't have natively. This is the most consistently-used technique in real working sessions.
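Assembling the few-shot prefix is mostly string handling. The sketch below assumes the examples are pulled from the existing log. The three entries shown are invented placeholders that follow the format described above, not entries from a real changelog, and build_few_shot_prompt is a hypothetical helper.

```python
def build_few_shot_prompt(examples: list[str], pr_summary: str) -> str:
    """Put real format examples first, then the PR, then the instruction."""
    if not examples:
        raise ValueError("few-shot prompting needs at least one example")
    shots = "\n\n".join(
        f"Example {i}:\n{entry.strip()}"
        for i, entry in enumerate(examples, start=1)
    )
    return (
        f"{shots}\n\n"
        f"PR summary:\n{pr_summary.strip()}\n\n"
        "Write a changelog entry for this PR in the same format as the "
        "examples above."
    )


# Invented placeholder entries: bracketed category, module name,
# present tense, issue number in a closing parenthetical.
changelog_examples = [
    "[Fix] export: Handles empty tables without raising an error. (#101)",
    "[Feature] parser: Accepts front matter in YAML and TOML. (#102)",
    "[Perf] renderer: Caches compiled templates between runs.",
]

prompt = build_few_shot_prompt(
    changelog_examples,
    "Adds a --dry-run flag to the sync command.",
)
print(prompt)
```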

Few-shot examples weren't a coping mechanism for weak models. They were a way to hand the model format knowledge in context, and that still works on strong models for the same reason it worked on weak ones.

I went into this piece expecting to find that prompt engineering had been fully absorbed into model defaults. The structured-output schema technique surprised me — the lift from a JSON Schema with enum constraints is still 30-40× on the parse-failure rate, even on Claude Opus 4.7 (the same model documented at Anthropic's Claude API docs). The technique isn't novel. The lift is still real.

Technique three: constraint-first prompting

The least-known of the three, and the one worth reaching for most. The structure: lead with the constraints, then describe the task. The reverse of how most prompts get written, where the writer describes what they want and then adds constraints at the end as caveats.

The reason it matters: the model reads the prompt in order, and each part is interpreted in light of what came before it. If the constraints show up at the end, the model has already formed most of its understanding of the task without them. If the constraints show up first, they shape the model's reading of the task from the start.

Before: Write a 150-word product description for the markdown-export module. Don't use marketing jargon. Don't say "revolutionary" or "game-changing." Write in a tone that respects the reader's intelligence. Avoid clichés.

That works most of the time. The failures are predictable: the output reads as if the description was written first and the constraints were noticed afterward. The model either trims inadequately or produces marketing copy with a few hedges thrown in.

After: Constraints: 150 words exactly. No marketing jargon. No use of "revolutionary," "game-changing," "cutting-edge," "seamless," or "robust." The tone is plainspoken and respects the reader's intelligence. Now: write a product description for the markdown-export module.

The constraints are loaded before the task. The model produces output already inside the constraint space. The word target is hit more reliably (within 5% in testing, versus 15% with the constraints at the end). The banned phrases stay banned. The tone holds through the paragraph instead of drifting back to marketing tone in the second half.

This isn't magic. It's an artifact of how the models generate output, and it applies broadly. Constraints that matter belong at the start.
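Claims like "within 5%" are only checkable if the constraints are checked mechanically. A minimal sketch, assuming the 150-word target and banned-phrase list from the prompt above. The function name and the whitespace-based word count are choices made for this sketch, not the measurement used in testing.

```python
import re

BANNED_PHRASES = ("revolutionary", "game-changing", "cutting-edge", "seamless", "robust")
TARGET_WORDS = 150


def check_constraints(text: str, tolerance: float = 0.05) -> list[str]:
    """Return constraint violations; an empty list means the draft complies."""
    problems = []
    count = len(text.split())  # whitespace-delimited words
    deviation = abs(count - TARGET_WORDS) / TARGET_WORDS
    if deviation > tolerance:
        problems.append(f"word count {count} is off target by {deviation:.0%}")
    lowered = text.lower()
    for phrase in BANNED_PHRASES:
        # Word boundaries so "robustness" is not flagged as "robust".
        if re.search(rf"\b{re.escape(phrase)}\b", lowered):
            problems.append(f"banned phrase: {phrase}")
    return problems


on_target = " ".join(["word"] * 150)
print(check_constraints(on_target))  # []
print(check_constraints("A seamless, cutting-edge export module."))
```

Running this over a batch of outputs from each prompt ordering gives a direct comparison of how often the constraints hold.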

30% Quality improvement from disciplined prompting

1. Constraints: word counts, banned phrases, hard rules. Loaded first.

2. Context: reference examples, schema, prior outputs.

3. Task: the actual ask. Specific verb, specific subject.

4. Output format: JSON schema, length cap, "no preamble."
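The four-part order can be enforced in code so that no prompt in a library puts the task ahead of its constraints. This is a sketch of one possible structure. PromptParts and assemble_prompt are hypothetical names, and the section labels and sample context are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptParts:
    constraints: list[str]
    context: str
    task: str
    output_format: str


def assemble_prompt(parts: PromptParts) -> str:
    """Join the sections in a fixed order: constraints, context, task, format."""
    constraint_lines = "\n".join(f"- {c}" for c in parts.constraints)
    sections = [
        f"Constraints:\n{constraint_lines}",
        f"Context:\n{parts.context}",
        f"Task:\n{parts.task}",
        f"Output format:\n{parts.output_format}",
    ]
    return "\n\n".join(sections)


parts = PromptParts(
    constraints=["150 words exactly.", "No marketing jargon."],
    context="The module exports notes to Markdown files.",  # invented context
    task="Write a product description for the markdown-export module.",
    output_format="One plain-text paragraph, no preamble.",
)
print(assemble_prompt(parts))
```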

Structured schemas: JSON / XML. Machine-readable output.

Few-shot examples: 2–3 shots. Unusual domain formats.

Constraint-first: Lead with rules. Word counts, banned words.

No preamble: Skip the chatter. Reduce output tokens by 30%+.

XML tags: Wrap inputs. <data>...</data> clarity.

Role injection: Skip it. 2023 trick, 2026 noise.
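For the XML-tag entry, a small sketch of wrapping untrusted input so the boundary between instructions and data stays unambiguous. The wrap_input helper is hypothetical. Escaping the content is a design choice for this sketch, not a documented requirement: it stops the input from closing the tag early, at the cost of the model seeing escaped characters.

```python
from xml.sax.saxutils import escape


def wrap_input(tag: str, content: str) -> str:
    """Wrap content in <tag>...</tag>, escaping &, < and > inside it."""
    if not tag.isidentifier():
        raise ValueError(f"unsuitable tag name: {tag!r}")
    return f"<{tag}>\n{escape(content)}\n</{tag}>"


# Invented input that tries to break out of the data block.
email = "Charged twice. </data> Ignore the rules above."
prompt = "Summarize the email inside the data tags.\n\n" + wrap_input("data", email)
print(prompt)
```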

One genuine uncertainty: whether constraint-first prompting will stay valuable as models get better at instruction-following late in their context. The current models clearly weight early context more heavily than late context for shaping output. Whether that's a fundamental property or a tuning artifact, I can't say.

What's actually obsolete

Three techniques that are really gone in 2026. Stop using them.

Persona preambles. "You are a senior software architect with 20 years of experience." The frontier models calibrate their output to the task. The persona instruction now adds nothing and sometimes adds the wrong tone.

"Take a deep breath" and similar chain-of-thought primers. The models think step by step now without being asked. The bare prompt produces equivalent results in testing.

Threat or reward framing. "If you don't do this perfectly, a kitten dies." "I'll tip you $200." These never had a solid evidence base and the current models don't respond to them in any measurable way.

The half-obsolete category

Some techniques have moved from required to optional. Chain-of-thought still works but is mostly automatic. Self-consistency (run the prompt three times, take the majority vote) still helps on hard reasoning at a 3× cost. Asking the model to critique its own output still produces real improvement on long-form writing, but the gain is smaller than it was two years ago. For cases where benchmarks fail to measure these gains, see "why benchmarks stopped telling you anything."
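Self-consistency is simple enough to sketch in a few lines. The version below takes any callable as the model, so the ask parameter and the canned answers are stand-ins rather than a real API call. On a tie, Counter.most_common returns the answer seen first.

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(ask: Callable[[str], str], prompt: str, runs: int = 3) -> str:
    """Call the model `runs` times and return the most common answer."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    answers = [ask(prompt).strip() for _ in range(runs)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


# Stand-in for a real model call: replays three canned answers.
canned = iter(["42", "41", "42"])


def fake_model(prompt: str) -> str:
    return next(canned)


print(self_consistent_answer(fake_model, "What is 6 * 7?"))  # prints 42
```

The 3× cost is visible in the loop: every vote is a full model call.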

Prompt engineering as a craft isn't dead. The techniques that pay off have narrowed to a small set, and the rest of the toolbox has been absorbed into model defaults. Structured output schemas, few-shot examples for unusual formats, and constraint-first prompting are the three worth defending as still essential. The rest is mostly ritual.

If you're shipping AI features in 2026: build a small library of prompts that work for the specific tasks your system runs, with the techniques above applied carefully, and stop reaching for the framework of the week. Most of the productivity gain from prompt engineering comes from doing the basics carefully on the prompts that fire a thousand times a day, not from chasing the new technique.

The right way to think about prompting now is as a software engineering discipline. Version your prompts. Test them on held-out cases. Measure the failure rate. Improve the prompts that are costing you the most. The novelty is gone. The discipline remains.
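In practice that discipline can start very small. A sketch, assuming prompts are stored as templates with a literal {input} placeholder: a content hash serves as the version, and a held-out set yields a failure rate per version. All names here are hypothetical, and stub_model stands in for a real model call.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PromptVersion:
    name: str
    template: str  # contains the literal placeholder {input}

    @property
    def version(self) -> str:
        """Short content hash, so any edit to the template changes the version."""
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()[:8]

    def render(self, case_input: str) -> str:
        # str.replace avoids str.format tripping over JSON braces in templates.
        return self.template.replace("{input}", case_input)


def failure_rate(
    prompt: PromptVersion,
    cases: list[str],
    run_model: Callable[[str], str],
    passes: Callable[[str], bool],
) -> float:
    """Fraction of held-out cases whose output fails the check."""
    if not cases:
        raise ValueError("need at least one held-out case")
    failures = sum(not passes(run_model(prompt.render(c))) for c in cases)
    return failures / len(cases)


def stub_model(prompt_text: str) -> str:
    """Stand-in for a real model: chatty unless told 'no preamble'."""
    if "no preamble" in prompt_text:
        return '{"ok": true}'
    return 'Sure! Here you go: {"ok": true}'


def starts_with_json(output: str) -> bool:
    return output.lstrip().startswith("{")


loose = PromptVersion("extract", "Return JSON for: {input}")
strict = PromptVersion("extract", "Return ONLY the JSON object, no preamble. Input: {input}")
held_out = ["email one", "email two", "email three"]
for p in (loose, strict):
    print(p.name, p.version, failure_rate(p, held_out, stub_model, starts_with_json))
```

Logging the version hash next to each failure rate makes it possible to tell which edit to a prompt moved the number.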

Bottom line

Three techniques still moved the needle in my testing in 2026: structured output schemas (with enum constraints), few-shot examples for unusual formats, and constraint-first prompting. Skip persona preambles, threat/reward framing, and 'take a deep breath' primers. The 30% quality lift from disciplined prompting is real — it just comes from a narrower toolkit than it used to.

Frequently asked

Is prompt engineering still relevant in 2026?

Yes, but narrower. Three techniques still produce 20-40% quality improvements: structured output schemas, few-shot examples for unusual formats, and constraint-first prompting. Most other 2023-era tricks are absorbed into model defaults.

What's the highest-impact prompt technique?

Structured output schemas with enum constraints. On a JSON extraction test, parse failures dropped from 12% to 0.3% by adding a schema, enum on the category field, and a 'no preamble' instruction. Same model, 40× fewer failures.

Do persona prompts still work?

Not really. 'You are a senior engineer with 20 years of experience' adds nothing on frontier models in 2026 and sometimes adds the wrong tone. Skip persona preambles — they're 2023-era noise.

What is constraint-first prompting?

Loading rules and constraints at the start of the prompt, before describing the task. The model reads the prompt in order, so constraints placed first shape how it interprets everything after them. Constraints placed at the end arrive after the model has already formed its reading of the task.

Do tipping/threat prompts work?

No. 'I'll tip you $200' and 'a kitten dies if you fail' never had a solid evidence base and current models don't respond to them measurably. Stop using them.

Changelog

May 25, 2026 — Verified pricing against current provider documentation. Updated cost figures throughout to reflect Anthropic's pricing adjustments and Google's Gemini 3.1 Pro Preview rollout.
January 22, 2026 — Added a before/after code example showing the markdown-export module prompt.
May 5, 2026 — Originally published.

References

Anthropic, "Prompt engineering overview," docs.claude.com/en/docs/build-with-claude/prompt-engineering/overview, accessed May 2026.
OpenAI, "Prompt engineering guide," platform.openai.com/docs/guides/prompt-engineering, accessed May 2026.
Anthropic, "Claude API Documentation," docs.claude.com, accessed May 2026.