The Unattended Coding Agent Just Arrived. Four Gates Before Letting It Commit.

Anthropic's Claude Code Routines now run with no mid-run approval prompts. NVIDIA has 10,000+ employees on GPT-5.5 Codex on day one. The unattended-commit dial moved across every major harness this week. Here are four gates to set before flipping it.
Anthropic's Claude Code Routines now run as full autonomous cloud sessions with no permission-mode picker and no mid-run approval prompts. NVIDIA had more than 10,000 employees on GPT-5.5-powered Codex on launch day. The unattended-commit dial moved across every major harness in 72 hours, and most engineering orgs have not yet decided what gates they want in front of it.
Problem this solves
I spent Wednesday morning reading NVIDIA’s April 23 blog post about its GPT-5.5-powered Codex rollout. The line that stuck out was the headcount.
"Over 10,000 NVIDIANs - across engineering, product, legal, marketing, finance, sales, HR, operations and developer programs - are already using GPT-5.5-powered Codex."
That is day-of-launch usage, not week three. The same day, allthings.how described where Anthropic’s Claude Code Routines now stand: “Claude Code runs the routine as a full autonomous cloud session. There’s no permission-mode picker and no mid-run approval prompts.”
The unattended-commit boundary moved across every major harness this week. Routines keep working with the laptop closed. Codex parallel agents run in the background while engineers do something else. The question shifted from “should we ever let an agent commit unattended” to “what gates do we want in front of the dial that has already turned.”
Most engineering orgs I have spoken to have not answered that question in writing yet.
The approach: four gates
When I sit down with a CTO this week, here is the framework I keep recommending. Four gates, in order, before any harness gets to push commits without a human in the loop.
Gate 1: Scope the action surface. A routine is not a script. allthings.how puts it cleanly: routines are “saved cloud configurations that pair a prompt, one or more GitHub repositories, and a set of connectors into a package that runs on its own.” Decide which routines are allowed to push to a branch, which can open a PR, which can merge, which can deploy. The default for most cloud agent products today is “everything the OAuth scope permits.” That is too wide. Map the action surface explicitly, per routine, per repo, per branch-protection class.
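To make the action surface concrete, here is a minimal Python sketch of a default-deny policy keyed by routine, repo, and branch-protection class. The routine names, repo names, and the `Action` enum are hypothetical; a real harness would enforce this through its own permission settings and GitHub branch protection, not a dictionary in a script. The point is the shape: explicit grants per triple, and nothing granted by omission.

```python
from enum import Enum


class Action(Enum):
    """Things a routine might do, listed separately so each one is granted explicitly."""
    PUSH_BRANCH = "push_branch"
    OPEN_PR = "open_pr"
    MERGE = "merge"
    DEPLOY = "deploy"


# Hypothetical policy keyed by (routine, repo, branch-protection class).
# Anything not listed is denied, instead of inheriting the full OAuth scope.
ACTION_SURFACE: dict[tuple[str, str, str], set[Action]] = {
    ("dependabot-triage", "internal-tools", "unprotected"): {Action.PUSH_BRANCH, Action.OPEN_PR},
    ("dependabot-triage", "internal-tools", "protected"): {Action.OPEN_PR},
    ("docs-refresh", "handbook", "unprotected"): {Action.PUSH_BRANCH, Action.OPEN_PR, Action.MERGE},
}


def is_allowed(routine: str, repo: str, branch_class: str, action: Action) -> bool:
    """Default-deny lookup: True only if this exact triple explicitly grants the action."""
    granted = ACTION_SURFACE.get((routine, repo, branch_class), set())
    return action in granted
```

Note that no routine in this example can deploy, and an unmapped routine gets an empty surface rather than whatever its token happens to allow.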
Gate 2: Constrain the credential blast radius. I keep thinking about last week’s Vercel breach, where one OAuth grant on one consumer AI tool gave attackers access to environment variables across the company. An unattended agent has a credential surface too. Before turning on autonomous commits, audit every secret the routine can reach: GitHub tokens, NPM tokens, deployment tokens, MCP connector scopes. Then split them by trust tier. The agent that triages Dependabot updates does not need the same scope as the agent that touches a billing service.
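A credential audit can start as simply as this. The sketch below assumes a hand-maintained inventory that assigns every secret (GitHub tokens, NPM tokens, deployment tokens, MCP connector scopes) to a trust tier, and gives each routine a clearance. All names and tiers are hypothetical. It flags anything a routine can reach above its clearance, plus any secret nobody has inventoried yet.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Trust tiers; a higher value means a larger blast radius if the secret leaks."""
    LOW = 1     # e.g. read-only repository access
    MEDIUM = 2  # e.g. pushing to non-protected branches
    HIGH = 3    # e.g. publishing packages or deploying


# Hypothetical inventory of secrets and the tier each belongs to.
SECRET_TIERS: dict[str, Tier] = {
    "GITHUB_TOKEN_READONLY": Tier.LOW,
    "GITHUB_TOKEN_PUSH": Tier.MEDIUM,
    "NPM_PUBLISH_TOKEN": Tier.HIGH,
    "DEPLOY_TOKEN": Tier.HIGH,
}

# Hypothetical routines and the highest tier each one is cleared for.
ROUTINE_CLEARANCE: dict[str, Tier] = {
    "dependabot-triage": Tier.MEDIUM,
    "billing-service-agent": Tier.HIGH,
}


def audit(routine: str, granted: set[str]) -> list[str]:
    """Return, sorted, every granted secret the routine should not be able to reach."""
    clearance = ROUTINE_CLEARANCE.get(routine)
    if clearance is None:
        return sorted(granted)  # an unregistered routine is cleared for nothing
    violations = []
    for secret in sorted(granted):
        tier = SECRET_TIERS.get(secret)
        # A secret missing from the inventory was never audited, so flag it too.
        if tier is None or tier > clearance:
            violations.append(secret)
    return violations
```

For example, `audit("dependabot-triage", {"GITHUB_TOKEN_PUSH", "DEPLOY_TOKEN"})` returns `["DEPLOY_TOKEN"]`: the Dependabot agent has no business holding a deploy credential.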
Gate 3: Require deterministic verification, not just CI. Antithesis put out a press release the same day, and it contained the most honest sentence I read all week.
"It can take mere seconds for AI to generate code, but days or weeks to review, test, verify, and build trust in it."
CI passing means the diff did not break tests written before the diff existed. It does not mean the diff is correct. Set a deterministic verification step: property tests, simulator runs, formal checks on the modules that matter. If a team cannot articulate what “verified” means in one sentence per repo, do not enable unattended commits in that repo yet.
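What “deterministic” buys you is replayability: the same seed produces the same inputs, so a failure found at 3 a.m. reproduces at 9 a.m. Below is a hand-rolled property check in Python using only the standard library's seeded `random.Random`. The function under test is a hypothetical stand-in for an agent-written change, and the properties are examples. A team already using a property-testing library such as Hypothesis would write the same properties there instead.

```python
import random
from typing import Optional


def dedupe_preserving_order(items: list[int]) -> list[int]:
    """Hypothetical stand-in for an agent-written function under verification."""
    seen: set[int] = set()
    out: list[int] = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def check_properties(items: list[int]) -> bool:
    """Properties that must hold for every input, not just the ones tests anticipated."""
    out = dedupe_preserving_order(items)
    no_duplicates = len(out) == len(set(out))
    same_elements = set(out) == set(items)
    # Output must list elements in the order they first appeared in the input.
    first_seen_order = sorted(set(items), key=items.index)
    idempotent = dedupe_preserving_order(out) == out
    return no_duplicates and same_elements and out == first_seen_order and idempotent


def verify(seed: int = 1234, cases: int = 500) -> Optional[list[int]]:
    """Run a fixed-seed property check; return the first failing input, or None."""
    rng = random.Random(seed)  # fixed seed: every run replays exactly the same inputs
    for _ in range(cases):
        items = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
        if not check_properties(items):
            return items
    return None
```

A `None` from `verify()` is a one-sentence definition of “verified” for this module: every property held on the same 500 replayable inputs.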
Gate 4: Set the kill switch the team will actually use. Every autonomy framework I have ever seen has a “pause runs” button. Almost no engineering org has a runbook for when to press it. Define the trigger now. Three failed verifications in 24 hours. A spike in revert rate. A specific class of file touched. Write the trigger, write who decides, write how the engineering team gets notified, and rehearse it once before any routine goes live.
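The three triggers above translate directly into code. This sketch assumes the runtime can report failed-verification timestamps, commit and revert counts, and touched paths for each routine; the thresholds, the 2% baseline revert rate, and the protected path globs are placeholders each team should replace with its own. An empty result means keep running; anything else is a reason to pause and notify whoever owns the decision.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from fnmatch import fnmatch


@dataclass
class RoutineStats:
    failed_verifications: list[datetime]  # timestamps of failed verification runs
    commits: int                          # commits by the routine in the review window
    reverts: int                          # how many of those commits were reverted
    touched_paths: list[str]              # files the routine modified


# Hypothetical thresholds; write your own into the runbook.
MAX_FAILED_VERIFICATIONS_24H = 3
BASELINE_REVERT_RATE = 0.02       # assumed human baseline for this repo
REVERT_SPIKE_MULTIPLIER = 2.0
PROTECTED_GLOBS = ["billing/*", ".github/workflows/*"]


def kill_switch_reasons(stats: RoutineStats, now: datetime) -> list[str]:
    """Return every trigger that fired; an empty list means the routine keeps running."""
    reasons = []
    window_start = now - timedelta(hours=24)
    recent_failures = [t for t in stats.failed_verifications if t >= window_start]
    if len(recent_failures) >= MAX_FAILED_VERIFICATIONS_24H:
        reasons.append(f"{len(recent_failures)} failed verifications in the last 24h")
    if stats.commits > 0:
        revert_rate = stats.reverts / stats.commits
        if revert_rate > BASELINE_REVERT_RATE * REVERT_SPIKE_MULTIPLIER:
            reasons.append(f"revert rate {revert_rate:.0%} is above the spike threshold")
    for path in stats.touched_paths:
        # fnmatch's "*" also matches "/", so "billing/*" covers nested files.
        if any(fnmatch(path, glob) for glob in PROTECTED_GLOBS):
            reasons.append(f"touched protected path {path}")
    return reasons
```

Returning reasons instead of a bare boolean matters for the runbook: the person who decides whether to resume needs to know which trigger fired.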
Why most teams get this wrong
The most common mistake right now is treating the unattended dial as a binary. It is not. There are at least four sliders: action surface, credential scope, verification depth, and kill-switch threshold. Move one without thinking about the others and the failure mode is cumulative, not isolated.
The second mistake is conflating model, harness, and runtime. The model writes the diff. The harness gives the model tools. The runtime executes the commit. GPT-5.5 hit the market with what Handy AI’s Jake Handy bluntly described as a hallucination rate that is “insane” compared with Opus 4.7. That is a model-layer concern. The runtime decides whether a hallucinated diff ever reaches main. A weaker model with a good runtime and good gates can ship safely. A frontier model with a wide-open runtime cannot.
The model decides what gets generated. The runtime decides what gets shipped. Engineering orgs that confuse the two end up arguing about benchmarks while their commit policy is the actual risk surface.
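One way to keep that separation honest is to make the runtime's admission check structurally blind to the model. In this minimal sketch, the `Diff` record and both gates are hypothetical; the model name is stored for audit purposes but no gate reads it, so swapping one model for another changes what gets generated and nothing about what gets shipped.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Diff:
    model: str         # which model produced the diff: recorded for audit, never trusted
    paths: list[str]   # files the diff touches
    verified: bool     # result of the repo's deterministic verification step


# A runtime gate is any predicate over a diff; these two are hypothetical examples.
Gate = Callable[[Diff], bool]

RUNTIME_GATES: list[Gate] = [
    lambda d: d.verified,
    lambda d: not any(p.startswith("billing/") for p in d.paths),
]


def runtime_admits(diff: Diff) -> bool:
    """Ship a diff only if every gate passes; no gate consults diff.model."""
    return all(gate(diff) for gate in RUNTIME_GATES)
```

An unverified diff from a frontier model is rejected, while a verified diff from a smaller model touching only internal files is admitted.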
The third mistake is decision-deferral disguised as caution. “We are not turning on Routines yet” is a fine answer on Monday. By Friday, three engineers are using their personal Pro subscriptions in the same way the Vercel employee used Context.ai, and the org has an unmanaged surface instead of a managed one. Either pick gates or accept that the gates will pick themselves.
The numbers
Three data points worth holding in mind this week.
NVIDIA: more than 10,000 employees on GPT-5.5-powered Codex on launch day.
Anthropic’s Claude Code Routines: zero mid-run approval prompts by design.
Antithesis pricing the verification gap in its own words: seconds to generate, days to weeks to verify.
Three numbers, same story from three angles. Capability moved. Friction at runtime moved. Verification has not.
Ship it
- Pick one repo this week. Not the customer-facing one. Pick the internal-tooling repo where a bad commit costs time, not revenue.
- Define the four gates in writing. Action surface, credential scope, verification depth, kill-switch threshold. One paragraph each. Share it with the security and platform leads before turning anything on.
- Turn on one routine and watch for fourteen days. Track revert rate, verification overhead, and incident count. If the numbers stay flat, widen the surface. If they move, you found out cheaply.
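The fourteen-day check can be scripted once, so the widen-or-hold call does not depend on whoever happens to look at the dashboard. This is a minimal sketch assuming you compare the trial window against the same repo's previous fourteen days and treat only worsening as movement; the `WindowMetrics` fields and the 10% tolerance are hypothetical defaults, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class WindowMetrics:
    commits: int
    reverts: int
    verification_minutes: float  # total time spent verifying/reviewing in the window
    incidents: int


def per_commit(m: WindowMetrics) -> dict[str, float]:
    """Normalize by commit count so windows with different activity are comparable."""
    n = max(m.commits, 1)  # avoid division by zero on an idle window
    return {
        "revert_rate": m.reverts / n,
        "verification_overhead": m.verification_minutes / n,
    }


def decide(baseline: WindowMetrics, trial: WindowMetrics, tolerance: float = 0.10) -> str:
    """Return 'widen' if every tracked number stayed flat or improved, else 'hold'."""
    if trial.incidents > baseline.incidents:
        return "hold"
    base, new = per_commit(baseline), per_commit(trial)
    for key in base:
        # Only increases count as movement; getting better is never a reason to hold.
        if new[key] > base[key] * (1 + tolerance):
            return "hold"
    return "widen"
```

With a zero-revert baseline, any revert in the trial window returns "hold", which is the conservative reading of “if they move.”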
That is how I would do it if I were sitting in a CTO chair right now. The dial moved this week whether anyone in the org chose it or not. The only real choice left is whether the gates get written before the routines do, or after.
Sources
- OpenAI's New GPT-5.5 Powers Codex on NVIDIA Infrastructure - and NVIDIA Is Already Putting It to Work - NVIDIA Blog, 2026-04-23
- Claude Code Routines: How Anthropic's Scheduled AI Agents Work - allthings.how, 2026-04-23
- GPT-5.5 Just Shipped April 2026 and OpenAI Finally Built a Real Agent - RoboRhythms, 2026-04-23
- Model Drop: GPT-5.5 - Handy AI, 2026-04-23
- Antithesis Teaches AIs To Correct Their Own Output - GlobeNewswire, 2026-04-23
- Need to Know News April 25, 2026 - The AI Marketers, 2026-04-25