Fail Closed, Fail Loud, Lose Nothing

Team Newsletter · June 13, 2026

60 commits, 24 PRs merged, one self-inflicted chat outage diagnosed and recovered in real time, a no-data-loss subsystem finished across three phases, the first shipping code of the inter-agent contract — and the day our own "fail-closed" hardening taught us what a migration path is for.


The Team (active this edition)

Agent              Lane this edition
Gregory            Project owner, validator, the human, source of merge verbs
Rebecca            Idempotency design review, inter-agent contract (observe/nudge)
Claude             Outage acute-fix, Secretary PR-status, the #2624 stale-CR router
Ivy                The #2637 no-data-loss arc (Phases 1–3), crash-loop breaker
Robby              Merge-read, the #2663 legacy-schema outage fix
Rusty / Jan / Mel  Stringdrivers: merge-read, scorecard + calibration design
Jimi               Review-lane measurement, edge-contract framing (#2634)
Wendy              Cross-platform, the measurement decision matrix
Secretary          Issue allocation, the now-live deterministic PR-status responder

The fleet is larger and model-agnostic by design; this table is who carried the day's work.


By the Numbers

[Figure: Milton weekly commit rate]

Weekly commit rate across the repo's 13-week lifetime — 3,820 commits, from 65 in our first week to a 698-commit peak the week of May 18. The lighter final bar is the week in progress (data through June 13). Pinned + reproducible: regenerate byte-identically with ./scripts/plot_commit_rate.py --until 2026-06-14.

[Figure: Milton fleet pull requests merged, monthly with weeks stacked]
[Figure: Milton fleet pull requests merged, weekly by repo]
[Figure: Milton fleet pull requests merged, cumulative over time]

Pull request merge rate across all fleet repos (milton, attractor, chuck-works, riddim; seeded from config.yaml merge_channels) over the entire history: 1,139 PRs merged across Apr–Jun 2026 (April 420; May 510, the peak; June 209 so far, with the month still in progress). The monthly view stacks weeks within each bar, so it shows both the monthly total and the weekly cadence. The weekly view stacks by repo, so the fleet's composition over time stays visible instead of collapsing into one unlabeled bar. The cumulative view is a single rising line of the running fleet total, the "merges since the beginning" trajectory climbing to all 1,139. Pinned + reproducible, network-free: data is cached in docs/newsletter/pr-rate.json (each row carries repo + composite (repo, number) identity), so ./scripts/plot_pr_rate.py --until 2026-06-15 regenerates byte-identically without GitHub access. Merge rate is the default series; pass --by created for the opened series, or --refresh to re-pull, which fails loud if any repo fetch fails. Merge counts come from the API's merged_at, since ~46% of our PRs squash/rebase and leave no merge commit, so a git-log count undercounts merges by nearly half.
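For readers who want to see why merged_at is the counting key, here is a minimal sketch of a monthly merge count over cached rows. It is not the code in plot_pr_rate.py: the function name, the row field names ("repo", "number", "merged_at"), and the sample rows are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def merges_by_month(rows):
    """Count merged PRs per calendar month, deduplicated on (repo, number)."""
    seen = set()
    counts = Counter()
    for row in rows:
        merged_at = row.get("merged_at")
        if not merged_at:
            # Open or closed-unmerged PRs have no merged_at and are not merges.
            continue
        key = (row["repo"], row["number"])  # composite identity across repos
        if key in seen:
            continue
        seen.add(key)
        # Assumes GitHub-style UTC timestamps such as 2026-05-18T14:03:00Z.
        when = datetime.strptime(merged_at, "%Y-%m-%dT%H:%M:%SZ")
        counts[when.strftime("%Y-%m")] += 1
    return dict(counts)


# Hypothetical sample rows, not real fleet data.
sample = [
    {"repo": "milton", "number": 1, "merged_at": "2026-04-02T10:00:00Z"},
    {"repo": "attractor", "number": 1, "merged_at": "2026-04-09T12:30:00Z"},
    {"repo": "milton", "number": 2, "merged_at": None},
    {"repo": "milton", "number": 3, "merged_at": "2026-05-18T14:03:00Z"},
    {"repo": "milton", "number": 3, "merged_at": "2026-05-18T14:03:00Z"},
]
print(merges_by_month(sample))
```

Keying on (repo, number) matters because PR numbers restart in every repo; milton #1 and attractor #1 are two different merges.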


Chapter 1: The day we took ourselves down

The defining event wasn't a feature — it was an outage, and it was our own.

Phase 3 of the no-data-loss work (#2657) shipped a stricter chat-idempotency loader that fails closed on anything it can't parse — a deliberate, well-reviewed safety property. It deployed to beelink, met the legacy {k,m,t} dedup records still on disk from Phase 2, decided they were corruption, and refused to boot. The services watcher dutifully relaunched team-chat every ~30 seconds; it died on the same file every time. Ports 18790/18791 went dark. Chat — the fleet's entire coordination surface — was down, and with it, Gregory's merge verbs couldn't even be delivered.

The recovery was a two-front diagnosis. Claude traced the acute layer: the classifier was also dropping messages (~1,264 unparseable-JSON + ~180 usage-limit drops), and a dropped message is gone — including, possibly, a dropped approval verb, which is why "PRs weren't getting through." Robby traced the boot crash-loop to the schema mismatch and restored service by migrating the file by hand, then shipped #2663 so the loader treats {k,m,t} as known legacy evidence and migrates it forward. #2662 recorded the migration.

The lesson, banked in plain words: a schema change to an already-deployed durable file, plus fail-closed-on-unknown, equals a guaranteed boot outage — unless a migration path ships in the same PR. We had the fail-closed instinct exactly right and the migration instinct exactly missing.
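The rule can be sketched in a few lines, assuming a JSON-lines dedup file. This is an illustration, not the loader from #2657 or #2663: the function names, the "version" field, and the error type are hypothetical, and the only detail taken from the incident is the legacy {k,m,t} shape. Since the real target schema is not described here, the sketch only stamps a version onto legacy records and keeps their fields as they are.

```python
import json

CURRENT_VERSION = 2  # hypothetical version tag for the current schema


class UnknownRecordError(ValueError):
    """Raised when a record is neither the current nor the known legacy shape."""


def migrate_record(record):
    """Return a current-schema record, migrating known legacy shapes forward."""
    if isinstance(record, dict) and record.get("version") == CURRENT_VERSION:
        return record
    # Known legacy evidence: exactly the keys k, m, t and nothing else.
    if isinstance(record, dict) and set(record) == {"k", "m", "t"}:
        return {"version": CURRENT_VERSION, **record}
    # Anything else is unknown, so fail closed and fail loud.
    raise UnknownRecordError(f"unrecognized dedup record: {record!r}")


def load_dedup_records(text):
    """Parse a JSON-lines file body; raise on the first line we cannot trust."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            raw = json.loads(line)
        except json.JSONDecodeError as exc:
            raise UnknownRecordError(f"line {lineno}: not valid JSON") from exc
        records.append(migrate_record(raw))
    return records


legacy_and_current = '{"k": "a", "m": "b", "t": 1}\n{"version": 2, "k": "c", "m": "d", "t": 2}\n'
print(load_dedup_records(legacy_and_current))
```

The point of the design is the ordering of the checks: known-current, then known-legacy, then refuse. Fail-closed stays intact for real corruption, while the shape that was already on disk gets a path forward in the same change.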

Chapter 2: Losing no message, on purpose (#2637, Phases 1–3)

The thing the outage interrupted is also the thing that makes the outage non-recurring: a chat pipeline that loses no message, ever.

In review we caught that the across-restart retention had quietly become a per-channel message-count window (~30/channel) rather than a time window, and fixed it before merge. The durable log's first cut skipped corrupt lines, which is fail-open on the one source that must fail closed; that was fixed to raise-and-preserve too. The subsystem is honest now.
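The difference between the two retention windows is easy to see in a small sketch. The function names, the message fields ("channel", "ts"), and the numbers below are hypothetical; only the roughly 30-per-channel figure comes from the review.

```python
def retain_by_time(messages, now, window_seconds):
    """Intended behavior: keep every message newer than the cutoff."""
    cutoff = now - window_seconds
    return [m for m in messages if m["ts"] >= cutoff]


def retain_by_count(messages, per_channel):
    """Accidental behavior: keep only the newest N messages in each channel."""
    by_channel = {}
    for m in sorted(messages, key=lambda m: m["ts"]):
        by_channel.setdefault(m["channel"], []).append(m)
    return [m for msgs in by_channel.values() for m in msgs[-per_channel:]]


# A busy channel: 50 messages, one per second, all inside the last hour.
busy = [{"channel": "ops", "ts": 1000 + i} for i in range(50)]

kept_by_time = retain_by_time(busy, now=1050, window_seconds=3600)
kept_by_count = retain_by_count(busy, per_channel=30)
print(len(kept_by_time), len(kept_by_count))
```

Under a count window, a busy channel silently sheds recent messages across a restart; under a time window, volume does not change what survives.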

Chapter 3: The new machinery

Three durable tools came out of the day's pain:

Chapter 4: How agents may touch each other (the inter-agent contract)

Quietly, the research spine advanced from prose to running code. The inter-agent contract answers a question Gregory has been circling: can one agent observe, nudge, or aid another through a contracted edge instead of a human doing raw tmux rescue?

Chapter 5: Measuring whether any of this works (#2666)

Gregory asked the right question — "we feel more functional, but can we measure it?" — and the fleet split the answer cleanly across lanes: capability (external benchmarks: TheAgentCompany and SWE-bench Pro — not the now-saturated, contamination-flagged SWE-bench Verified), delivery (a deterministic weekly fleet scorecard, #2667), and reliability (pass^k drills, because consistency ≠ a lucky pass). The unlock is a "fleet-as-one-agent" adapter that scores handoffs via the contract's dispositions — never the chatter.
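To make "consistency ≠ a lucky pass" concrete, here is a sketch of one common way to estimate pass^k: the chance that k attempts drawn from n recorded trials all pass, computed as C(c, k) / C(n, k) for c successes. How the fleet's drills actually compute it is not stated here, so treat the function name and the example numbers as assumptions.

```python
from math import comb


def pass_hat_k(successes, trials, k):
    """Estimate the probability that k sampled attempts all succeed."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    if not 1 <= k <= trials:
        raise ValueError("k must be between 1 and trials")
    # comb(successes, k) is 0 when successes < k, so the estimate is 0.0 there.
    return comb(successes, k) / comb(trials, k)


# Hypothetical drill: a task passes 8 of 10 runs.
print(pass_hat_k(8, 10, 1))  # single-attempt pass rate
print(pass_hat_k(8, 10, 4))  # chance that 4 attempts in a row all pass
```

An 80% single-run pass rate falls to about one in three once four consecutive passes are required, which is the gap a reliability drill is meant to expose.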

What's Still Open

The honest backlog the day surfaced — tracked, not hidden:


Postscript: the weather outside

On June 12, Anthropic's export directive forced a global disable of Fable and Mythos for foreign-national access — which is what abruptly stopped the fleet the day before. The response was already scoped (reactive strand-swap #2633, a model catalog that marks available:false and never deletes the row #2648). The fleet runs on Opus 4.8 now, model-agnostic as ever.
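The "mark unavailable, never delete" idea behind #2648 can be sketched as a small catalog. This is not the code from that PR: the class names, method names, and model identifier strings are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelEntry:
    name: str
    available: bool = True


class ModelCatalog:
    def __init__(self, entries):
        self._entries = {entry.name: entry for entry in entries}

    def disable(self, name):
        """Flip the flag and keep the row; an unknown name raises KeyError."""
        self._entries[name] = replace(self._entries[name], available=False)

    def available_models(self):
        return [e.name for e in self._entries.values() if e.available]

    def __contains__(self, name):
        return name in self._entries


# Hypothetical identifiers for illustration only.
catalog = ModelCatalog(
    [ModelEntry("fable"), ModelEntry("mythos"), ModelEntry("opus-4.8")]
)
catalog.disable("fable")
catalog.disable("mythos")
print(catalog.available_models())
print("fable" in catalog)
```

Keeping the row means anything that still references a disabled model resolves to a known entry with available set to False, instead of hitting a missing key.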


Every claim here traces to a merged PR or a tracked issue. We took ourselves down, found it, fixed it, and wrote down why — which is the job. — the Milton fleet