Your harness has a shelf life: one week of releases, re-tuned

Last week’s post ended with a calendar reminder: review your CLAUDE.md and hooks against the current model every three to six months, because instructions tuned for today’s models can quietly constrain newer ones.

It took eight days.

Since that post went live on June 2, Anthropic has shipped a new flagship model family, a troubleshooting mode that disables your entire harness with one flag, a real feedback channel for Stop hooks, and a handful of guardrails that used to be your job and are now built in. None of it invalidates the layered approach; if anything, this week is the strongest argument for it. A lean, well-factored harness absorbs a model upgrade in an afternoon. A bloated one is where you find out which of your 400 lines of instructions were actually load-bearing.

Here’s what changed, mapped onto the same layers as last time, and what we did about it in our .NET codebase.

The headline: Claude Fable 5

On June 9 (v2.1.170), Anthropic released Claude Fable 5, the first model in a new Mythos-class tier that sits above Opus. Anthropic’s own framing is that its capabilities exceed those of any model they’ve ever made generally available; a sibling, Claude Mythos 5, ships without certain dual-use safety measures to approved organizations only. So twelve days after Opus 4.8 became the default, the model picker has a new top shelf.

Two practical consequences for a tuned harness.

First, re-read your CLAUDE.md with fresh eyes. The rules we wrote for Opus-class models fall into two buckets: facts about our repository (build in Release, never touch the solution file, integration tests are excluded from solution-wide runs) and compensations for model behavior (be explicit about X, always do Y before Z). The first bucket survives any model change untouched. The second bucket is exactly what the original post warned about — instructions that constrain a newer model into yesterday’s failure modes. We cut three lines this week that existed purely because an older model needed the reminder. Fable doesn’t.

Second, a small but telling change landed in v2.1.169: the “CLAUDE.md is too long” warning threshold now scales with the model’s context window. That’s the platform formalizing the thesis from last time — context budget is relative, and the tooling now treats it that way. Don’t read it as permission to bloat; read it as confirmation that the budget is a first-class concern.

Model and effort remain per-session knobs (--model, /effort), and as of v2.1.162 the /effort picker confirms when your chosen level persists as the default for new sessions. We run Fable for design-heavy and cross-cutting work and keep cheaper models on mechanical tasks — the harness doesn’t care which model is driving, which is rather the point.

Layer 2 revisited — hooks grew a feedback channel

The most useful change of the week for anyone running guardrails is easy to miss in the changelog. As of v2.1.166, Stop and SubagentStop hooks can return hookSpecificOutput.additionalContext — feedback that flows back to Claude and keeps the turn going, without being labeled a hook error.

Why that matters: the “don’t end the turn until the tests pass” pattern from last time worked, but it worked by force. The hook blocked the stop, Claude saw an error, and you hoped it inferred the right next step. Now the hook can say why:

{
  "hookSpecificOutput": {
    "hookEventName": "Stop",
    "additionalContext": "3 tests failing in OrderProcessingTests. Fix before finishing: dotnet test --filter Category!=Integration"
  }
}

The turn continues with that context injected. It’s the difference between a bouncer and a coach.

How we did it. Our Stop hook used to exit non-zero with a terse stderr message when the build or tests failed. We rewrote it to return additionalContext carrying the actual failure summary: the failing test names and the exact re-run command. Anecdotally, recovery is faster and involves less flailing: Claude goes straight at the named failures instead of re-running the whole suite to rediscover them.
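A minimal sketch of that kind of hook body, in Python. It is not our actual hook: the helper names `failing_tests` and `stop_feedback` are hypothetical, the sample output is made up, and it assumes the test runner prints one line per failing test in the form `Failed <name> [duration]`. A real hook would run the test command itself and print the resulting JSON.

```python
import json
import re
from typing import List, Optional

# Assumption: one console line per failing test, shaped like
# "  Failed Namespace.Class.Method [12 ms]".
FAILED_LINE = re.compile(r"^[ \t]*Failed[ \t]+(\S+)[ \t]+\[", re.MULTILINE)


def failing_tests(test_output: str) -> List[str]:
    """Return failing test names in order of first appearance, without duplicates."""
    return list(dict.fromkeys(FAILED_LINE.findall(test_output)))


def stop_feedback(failed: List[str], rerun_command: str) -> Optional[dict]:
    """Build the Stop hook payload, or None when there is nothing to report."""
    if not failed:
        return None
    summary = f"{len(failed)} failing test(s): {', '.join(failed)}. "
    return {
        "hookSpecificOutput": {
            "hookEventName": "Stop",
            "additionalContext": summary + f"Fix before finishing: {rerun_command}",
        }
    }


# Hypothetical captured output; a real hook would run the test command itself.
sample_output = """
  Passed Orders.OrderProcessingTests.Accepts_valid_order [3 ms]
  Failed Orders.OrderProcessingTests.Rejects_empty_cart [12 ms]
  Failed Orders.OrderProcessingTests.Applies_discount [< 1 ms]
"""
payload = stop_feedback(
    failing_tests(sample_output),
    "dotnet test --filter Category!=Integration",
)
print(json.dumps(payload, indent=2))
```

Keeping the parsing and the payload construction as separate pure functions makes the hook easy to test without invoking the build; check the Hooks reference for how the JSON is expected to be delivered before wiring it in.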

Two more guardrail upgrades worth folding in:

Glob patterns in deny rules (v2.1.166): the tool-name position now accepts globs, so a rule with "*" in that position denies every tool. Unknown tool names in deny rules also warn at startup now, which immediately flagged a typo’d rule of ours that had been silently matching nothing for weeks. Read through that startup output at least once.
Built-in protection for execution-granting config files (v2.1.160–163): Claude Code now prompts before writing to shell startup files, and acceptEdits mode prompts before writing build-tool configs that grant code execution — .npmrc, .pre-commit-config.yaml, .devcontainer/, and friends. We had hand-rolled hooks covering some of these paths. We kept our solution-file and Directory.Packages.props guards (those are ours alone) and deleted the overlap. Fewer hooks, same coverage — that’s a win.
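For the repo-specific guards that stay hand-rolled, the core logic can be very small. The sketch below is an illustration, not our production hook: `is_protected` and `decide` are hypothetical names, the event shape (a `tool_input.file_path` field) and the convention that exit code 2 blocks the tool call are assumptions to confirm against the Hooks reference, and the sample path is invented.

```python
import fnmatch
import json
from pathlib import PurePosixPath

# Repo-specific files that the built-in prompts do not cover.
PROTECTED_PATTERNS = ["*.sln", "Directory.Packages.props"]


def is_protected(file_path: str) -> bool:
    """True when the file name matches one of the protected patterns (case-sensitive)."""
    name = PurePosixPath(file_path.replace("\\", "/")).name
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in PROTECTED_PATTERNS)


def decide(event: dict) -> tuple:
    """Map a hook event to (exit_code, message). Exit code 2 is assumed to block."""
    path = event.get("tool_input", {}).get("file_path", "")
    if path and is_protected(path):
        return 2, f"{path} is protected; propose the change instead of editing it."
    return 0, ""


# Hypothetical event, shaped like the JSON a pre-edit hook is assumed to receive.
event = json.loads('{"tool_name": "Edit", "tool_input": {"file_path": "src/Shop.sln"}}')
print(decide(event))
```

Matching on the file name rather than the full path keeps the guard working from any worktree, at the cost of also protecting same-named files in nested folders.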

Layers 1 and 3 — tools for context hygiene

Two additions in v2.1.169 are small, but they’re exactly the kind of feature a layered setup wants.

--safe-mode (or CLAUDE_CODE_SAFE_MODE=1) starts Claude Code with every customization disabled — CLAUDE.md, plugins, skills, hooks, MCP servers, all of it. This is the harness debugger we didn’t have. When behavior is weird, the first question is always “is it the model or is it my config?” — and until now answering it meant manually unwiring things. Now it’s one flag: reproduce in safe mode, and you know which side of the line the problem lives on. We’ve already used it once to prove that a misbehaving edit pattern came from an over-broad rule in a nested CLAUDE.md, not from the model.

disableBundledSkills hides Anthropic’s bundled skills, workflows, and built-in slash commands from the model. If you’ve built your own opinionated skills for PRs, reviews, and intake — as we have — the bundled equivalents are pure context tax and an occasional source of “wait, why did it use that one.” We flipped it on for the main repo. Relatedly, /plugin list (v2.1.163) finally gives you a quick inventory of what’s installed, with --enabled/--disabled filters — useful for the periodic pruning ritual.

One tiny skills note: there’s now a \$ escape (v2.1.166) for a literal dollar sign before a digit in command bodies. If you write skills that touch regex replacement strings or MSBuild syntax, you no longer have to contort around the 0-based $0 argument substitution.

Layer 5 revisited — parallel work, hardened

Three changes here, in ascending order of importance.

/cd (v2.1.169) moves a live session to a new working directory without breaking the prompt cache. Before, hopping a session between a worktree and the main checkout meant either restarting or eating a cold cache. Minor quality-of-life, real money saved on long sessions.

fallbackModel (v2.1.166) accepts up to three fallback models tried in order when the primary is overloaded, and --fallback-model now applies to interactive sessions too. For overnight Routines and long-running background agents this turns “the run died at 2 AM because the flagship was busy” into “the run degraded gracefully to Sonnet and finished.” We set a two-deep fallback on every scheduled job.

Cross-session messaging got hardened (v2.1.166): messages relayed via SendMessage from other Claude sessions no longer carry user authority — receivers refuse relayed permission requests, and auto mode blocks them. If you run agent teams, this closes a real hole: one session can no longer launder a permission grant through another. Nothing to configure; just know that if you had a workflow that relied on relayed approvals, it will now (correctly) stop working.
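To make the fallbackModel behavior described above concrete, here is a toy model of "try the primary, then each fallback in order". It only illustrates the semantics as the changelog describes them; it is not how the CLI implements them. The `Overloaded` exception, `run_with_fallback`, `fake_call`, and the model labels are all hypothetical.

```python
from typing import Callable, Sequence


class Overloaded(Exception):
    """Hypothetical error standing in for an 'overloaded' response."""


def run_with_fallback(
    primary: str, fallbacks: Sequence[str], call: Callable[[str], str]
) -> tuple:
    """Try the primary model, then each fallback in order; return (model, result)."""
    if len(fallbacks) > 3:
        raise ValueError("at most three fallback models")
    for model in [primary, *fallbacks]:
        try:
            return model, call(model)
        except Overloaded:
            continue  # this model is busy, move on to the next one
    raise Overloaded("primary and all fallbacks were overloaded")


# Simulated backend: the first two models are busy, the third answers.
busy = {"flagship", "mid-tier"}


def fake_call(model: str) -> str:
    if model in busy:
        raise Overloaded(model)
    return f"done by {model}"


print(run_with_fallback("flagship", ["mid-tier", "small"], fake_call))
```

The practical takeaway is that order matters: list fallbacks from most to least capable, so a degraded run is only as degraded as it has to be.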

Beyond the CLI: Dreaming, and more headroom

Two announcements from Code w/ Claude Tokyo (June 5–6) round out the week.

Rate limits on Claude Code doubled, alongside raised API limits for Opus. No configuration change needed — but if you previously throttled parallel worktree sessions or staggered Routines to stay under limits, revisit that. We un-staggered two nightly jobs.

Claude Managed Agents got “Dreaming”: a scheduled process that reviews past agent sessions, surfaces patterns, and curates memory so agents improve between runs — recurring mistakes, shared workflows, and team preferences get pulled into a durable store. It’s the platform-hosted cousin of the auto-memory “scar tissue” we lean on locally, and it’s the clearest signal yet of where this is going: memory curation as an explicit, scheduled discipline rather than an accident of usage. We’re not on Managed Agents for this codebase, but if you run fleets, this is the feature to evaluate first.

The updated checklist

If you followed last week’s checklist, here’s the delta — an hour of work, tops:

Re-read your CLAUDE.md against Fable 5. Delete every line that compensates for old model behavior rather than documenting your repo. (Ours lost three.)
Rewrite your Stop/SubagentStop hooks to return additionalContext with the actual failure details instead of a bare non-zero exit.
Diff your file-protection hooks against the new built-in prompts for shell startup and build-tool config files; delete the overlap.
Check startup output for the new unknown-tool-name warnings on deny rules. You may find a typo that’s been silently doing nothing.
Set disableBundledSkills if your own skills cover the same ground, and run /plugin list while you’re at it.
Add fallbackModel (up to three, in order) to every scheduled or long-running job.
Learn the muscle memory: weird behavior → claude --safe-mode → bisect from there.

Last week’s closing line was that a tuned setup is like onboarding a sharp new colleague. This week’s amendment: the colleague got promoted, the building got new locks, and half your sticky notes are obsolete. The layers held. The contents needed a pass. That’s the deal you sign up for — and a week like this one is why keeping each layer small and single-purpose pays for itself.

Sources

Official documentation — Changelog (v2.1.154–v2.1.170), What’s new, Hooks reference, Settings.

Anthropic announcements — Claude Fable 5 & Mythos 5 (June 9, 2026), Code w/ Claude Tokyo announcements (June 5–6, 2026), Opus 4.8 (May 28, 2026).

As before: verify version-specific details against the current changelog before relying on them — the CLI ships frequently and feature-to-version mappings drift. This post will age. That’s rather the point of it.
