No new model. Five orchestration features. And a roomful of engineers raising their hands when asked who had shipped a pull request that week written entirely by AI.
The headline event of the past week was Anthropic’s Code with Claude conference in London (19–21 May), the second stop after San Francisco on 6 May and the final one before Tokyo in June. No new model was announced — and that, as the engineers on stage made very clear, was the point. The story of 2026 is no longer about scaling parameters. It is about what happens when you let a capable model loop on itself, and what scaffolding you need to make that safe and productive. Here is what stood out from the Anthropic engineers themselves.
i. The headline shift: Claude prompts itself
Boris Cherny, who heads Claude Code, opened the keynote by reframing what working with Claude even means. The new default, in his framing, is not that you prompt Claude — it is that Claude prompts itself. Self-verification and self-correction are no longer optional features; they are the architecture. Instead of generating code and then having humans clean up errors, the model is expected to test and tweak in its own loop until things actually run.
For builders, this is more than a UX change. It rewires how you think about your role: less “drive the model turn by turn,” more “set a goal and review the trajectory.” Cherny has been consistent on this — on X earlier this year he noted that in the previous thirty days, every one of his contributions to the Claude Code codebase had been authored by Claude Code itself. His longer-standing advice to founders, repeated again this week on the Lightcone podcast: build for the model six months from now, not the one in front of you.
ii. “Let it cook” — Ravi Trivedi and Dreaming
Ravi Trivedi, another Anthropic engineer, distilled the principle down to two words.
Let it cook. — Ravi Trivedi, Anthropic
Get out of Claude’s way. The feature he demoed to back this up is called Dreaming — a new capability in Claude Code where agents write notes to themselves between tasks, saving observations, gotchas, and useful patterns from one run that inform the next.
For anyone running long-horizon agents — nightly exception investigations, automated CI fixes, multi-repo refactors — Dreaming changes the math. The agent isn’t starting cold every session anymore. It accumulates lived experience inside your codebase.
iii. How much code is now written by Claude?
Jeremy Hadfield, an Anthropic engineer speaking in London, opened one talk by asking how many people in the room had shipped a pull request that week written entirely by Claude. Roughly half the audience — laptops open on their knees — raised their hands. His follow-up, per MIT Technology Review’s coverage: the majority of Anthropic’s own software is now Claude-authored.
This is the kind of claim that’s easy to dismiss as marketing until you remember that Anthropic ships a non-trivial amount of production code on Claude Code, and that the people saying it are the ones writing — or no longer writing — the PRs.
iv. The end state, according to Anthropic
Asked where all this is heading, an Anthropic engineer named Jiang framed the goal plainly: the end state Anthropic is trying to reach is one where Claude can build itself. Catherine Wu, also on stage, added the necessary caveat that expert engineers are still needed to design systems and troubleshoot harder problems — but over time, Claude is expected to get better at all of it.
If that sounds aggressive, recall this is the same company whose Chief Product Officer Ami Vora opened the San Francisco event two weeks earlier, and whose engineers have been broadcasting the “six months from now” framing for years.
v. The five features that defined San Francisco — and resurfaced in London
Code with Claude SF on 6 May deliberately skipped a model launch and shipped five orchestration features instead. They came up repeatedly in the London sessions:
- Dreaming. Agents take persistent notes between runs (covered above).
- Outcomes. Developers define a rubric for a good output; a separate grader evaluates each result in its own context window and sends the agent back to revise until it meets the bar. Anthropic’s internal benchmarks show task success lifted by up to 10 points on the hardest problems.
- Multi-agent orchestration. Proper coordination between agents — the human’s job is to define the goal and review the output, not write the code.
- Webhooks. Once you’ve defined an outcome, let the agent run; get notified when it’s done.
- Claude Finance. Ten pre-built agents for financial workflows, plus Add-ins for Excel and similar surfaces.
For MCP-heavy workflows, Outcomes is arguably the most underrated of the five. A grader running in its own context window is exactly the pattern that closes the loop on agents that “almost” finish a task.
vi. Developer Platform updates worth knowing
A handful of platform changes landed alongside the conference and matter especially if you’re building MCP servers or running Claude Code at scale.
- MCP tunnels in Research Preview — useful for local MCP servers exposed to remote Claude sessions without redeploying.
- Self-hosted sandboxes for Claude Managed Agents — finally addresses the data-residency and compliance questions enterprise customers kept asking.
- Live updates to MCP server and tool settings during active sessions — no more restart-the-session friction when iterating on a tool definition.
- Large tool outputs spill to a sandbox file — sensible default for anyone who’s blown a context window on a fat SQL result.
- Cache diagnostics in public beta — pass
diagnostics.previous_message_idon a Messages request and the API tells you exactly where your prompt cache prefix diverged from the previous turn. Beta header:cache-diagnosis-2026-04-07.
vii. Other Anthropic moves this week
A quick sweep of the announcements feed for context, even where engineers were not the ones speaking.
| Date | Announcement |
|---|---|
| 19 May | Claude stays ad-free — policy piece arguing advertising incentives are incompatible with a genuinely helpful assistant. |
| 19 May | KPMG integration — Claude rolled out across KPMG’s 276,000-person workforce. |
| 18 May | Stainless acquisition — the SDK-generation company joins Anthropic. |
| 14 May | PwC deploying Claude — for client deals and enterprise function rebuilds. |
| 14 May | Gates Foundation partnership — $200M, focused on global health and development applications. |
| 13 May | Claude for Small Business — connectors and ready-to-run workflows for QuickBooks, PayPal, HubSpot, Canva, Docusign, Google Workspace, and Microsoft 365. |
viii. What this means if you build with MCP and agents
The center of gravity has shifted to orchestration. If your MCP server is still designed around “Claude calls a tool, gets a result, replies to the user,” you’re building for the 2025 pattern. The 2026 pattern is “Claude calls your tool inside a self-graded loop, possibly across multiple agents, possibly with notes from previous runs informing this one.” Tools that return structured, gradable outputs — not just text — get more leverage.
Self-hosted sandboxes make regulated environments viable. Telematics, finance, healthcare: data-residency was the blocker, and now there’s a native answer. Worth a serious look for .NET/Azure workloads where the data can’t leave the customer’s tenancy.
Outcomes is the closest off-the-shelf equivalent to what teams have been hand-rolling with BMAD, Taskmaster AI, and similar methods. If you’ve been building grader-and-executor loops by hand, the native version is going to be cheaper to maintain.
And finally: if Anthropic’s own engineers are routinely shipping PRs that Claude wrote end-to-end, the question for the rest of us isn’t whether to operate this way. It is how soon you can restructure your workflow to match — and what guardrails you need so that let it cook doesn’t become let it burn.
ix. For the .NET reader
A few of the announcements above land harder if your day job is C# and Azure.
Self-hosted sandboxes are the unlock for everything regulated-but-interesting that .NET shops typically own — pension calculations, claims processing, public-sector workflows. The pattern most teams have been hand-rolling (Claude Managed Agents calling out to a private API gateway in front of Azure resources) gets a supported equivalent. Worth checking whether your compliance team can finally sign off on Managed Agents instead of “ChatGPT Enterprise but please don’t paste customer data.”
MCP servers in C#. With live tool-setting updates and tunnels in research preview, the iteration loop on a homegrown MCP server gets dramatically tighter. If you’ve been putting off building one because the dev loop felt slow, that excuse is shrinking. The MCP C# SDK plus dotnet user-secrets and a tunnel covers the local-to-Claude story without redeploying.
Outcomes maps cleanly onto xUnit thinking. A rubric is a test, a grader is a runner, the agent revises until green. If you can write an [Theory] with [InlineData] cases, you can write an Outcomes rubric — which means the leap from “AI helps me code” to “AI ships code I review against a spec” is smaller than the marketing makes it sound.
The center of gravity for AI-driven .NET work in the back half of 2026 will be less “which prompt do I use” and more “what’s the rubric, what’s the sandbox, what’s the loop.” Worth getting comfortable with that vocabulary now.
Sources
- MIT Technology Review — “Anthropic’s Code with Claude showed off coding’s future” (21 May 2026)
- Anthropic announcements — anthropic.com/news
- Releasebot — Anthropic update feed (16–22 May 2026)
- Code with Claude SF coverage by Simon Willison and MindStudio (6 May 2026)
- X posts from Boris Cherny — @bcherny