Which harness productivity number survives the June 1 meter?

2026-05-28

A clean executive dashboard mockup showing two stacked rows. Top row labelled 'verified merged output per engineer-month' with a green bar. Bottom row labelled 'harness cost per engineer-month' with an orange bar that has a small dotted ceiling line above it. Calendar in the corner shows the date June 1.

Four days before GitHub Copilot's AI Credits meter activates, the May 26-27 wave from BigGo, GitKraken, and GitHub's own CLI rewrites what an executive can actually put on the board slide. The harness productivity number stops being a percentage and becomes a ratio.

TLDR

Four days before GitHub Copilot's AI Credits meter activates, the May 26-27 evidence rewrites what counts as a harness productivity number on a board slide. The "percentage of AI-generated code" line is done. The number that survives June 1 is a ratio: verified merged output per engineer-month over harness cost per engineer-month, and the cost row just became real and observable.

The setup

I was reading the GitHub Copilot CLI release notes on Wednesday, mostly out of habit, and noticed something quiet. Between Tuesday afternoon and Wednesday night, GitHub shipped seven pre-releases of Copilot CLI. Not headlines. Not a blog post. Just v1.0.55-1 through v1.0.55-7. One of them added a single line that I keep thinking about: “Reasoning token count now shown in session token summary for all users.”

That is the cost row of the board slide, shipping as a tool primitive, four days before the June 1 GitHub Copilot AI Credits billing activation. The same week, BigGo Finance quoted Nvidia’s VP of applied deep learning, Bryan Catanzaro, saying his team’s compute cost is now “far beyond the costs of the employees.” GitKraken Insights published the PR-cycle benchmarks elite teams run against. Three signals from three angles, all pointing at the same uncomfortable arithmetic.

The board number an executive needs by their next meeting is not the one they brought to the last one.

What they tried

For most of 2025 the productivity slide was a single line. “AI-generated code, 60 percent and rising.” Sometimes “Claude Code adoption, 84 percent of engineers” (Uber’s own figure inside Praveen Naga’s 5,000-engineer organization, reported again this week in BigGo Finance’s May 26 piece). Sometimes “PRs merged per developer, up 98 percent” (the Faros AI figure that has been recycled across every coding-tool deck since the 2025 DORA report). The slide had one number. The number went up.

84%

of Uber engineers adopted Claude Code by March, per BigGo Finance, May 26

What broke is that everybody in the room learned to read it. Boards started asking the second question. The one that comes after “great, adoption is up.” When I sit with founders and Series B CTOs, that question sounds something like this: how much are we spending to make the number go up, and what did the company actually ship because it went up? The first half of the question used to be unanswerable. Seat licenses were flat. Token spend was an Amex receipt that landed quarterly. Output was a vendor dashboard.

This week the first half stopped being unanswerable.

BigGo Finance’s May 26 story carried three executive voices in one piece. Bryan Catanzaro from Nvidia, on the record. Tom Blomfield from Y Combinator. Will Sommer from Gartner. They were not saying the same thing, exactly, but they were pointing at the same instrument panel. Catanzaro on his own team: compute now eclipses payroll. Blomfield, sharper, talking to founders: if the API bill does not make you wince, you are not burning hard enough. Sommer, the analyst, doing the hard work of cooling the room down: do not confuse cheaper tokens with smarter reasoning.

"Healthy median PR cycle time should be under 24 hours, with elite teams merging 50% of PRs same-day and 90% within 48 hours."

GitKraken Insights, May 27 2026

Two days after Catanzaro’s quote landed, GitKraken Insights published the other half of the math. Their elite-team benchmark for pull-request cycle time is a median under 24 hours, with 50 percent of PRs merging same-day, 90 percent within 48 hours, and pickup time absorbing 40 to 60 percent of total cycle time. The article does not name a single AI coding tool. That is the productivity-not-marketing tell. The threshold predates the harness conversation, which means it survives any vendor’s claim.
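The GitKraken thresholds are simple enough to check directly against your own merge log. What follows is a minimal sketch. It assumes each PR is an (opened, merged) timestamp pair exported from your CI/CD history. It also approximates "same-day" as the same calendar date in a single time zone. The function name is hypothetical and is not part of any GitKraken API.

```python
from datetime import datetime, timedelta
from statistics import median


def meets_elite_benchmark(prs: list[tuple[datetime, datetime]]) -> dict[str, bool]:
    """Check (opened, merged) PR timestamps against the May 27 GitKraken thresholds.

    Assumption: "same-day" means merged on the same calendar date it was opened.
    """
    if not prs:
        raise ValueError("need at least one merged PR")
    # Cycle time in hours for each PR.
    cycle_hours = [(merged - opened) / timedelta(hours=1) for opened, merged in prs]
    same_day = sum(1 for opened, merged in prs if opened.date() == merged.date())
    within_48h = sum(1 for h in cycle_hours if h <= 48)
    n = len(prs)
    return {
        "median_under_24h": median(cycle_hours) < 24,
        "half_same_day": same_day / n >= 0.5,
        "ninety_pct_within_48h": within_48h / n >= 0.9,
    }


t = datetime(2026, 5, 26, 9, 0)
sample = [
    (t, t + timedelta(hours=3)),   # merged same day
    (t, t + timedelta(hours=6)),   # merged same day
    (t, t + timedelta(hours=30)),  # next day, inside 48 hours
    (t, t + timedelta(hours=40)),  # two days later, inside 48 hours
]
print(meets_elite_benchmark(sample))
# {'median_under_24h': True, 'half_same_day': True, 'ninety_pct_within_48h': True}
```

Keeping the three checks separate, rather than collapsing them into one pass/fail flag, shows which threshold a team is missing.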

Where it broke, where it worked

The seven Copilot CLI pre-releases between Tuesday and Wednesday are easy to miss as small ship notes. Read them together and they are the entire cost-side instrumentation for a Q3 board slide.

v1.0.55-1 on May 26 surfaced loaded extensions and their source through /env, so the admin tenant can finally see what is actually inside a session. v1.0.55-3 on May 27 added four things:

- the reasoning token count in the session token summary;
- hook progress streaming for real-time visibility into long-running operations;
- plugin precedence ordering (project, then plugin-dir, then personal, then custom);
- the ability for an organization policy to disable remote-controlled sessions and explain why with a helpful message.

v1.0.55-6 later that night shipped the /autopilot command, a policy warning when a remote session is blocked, per-extension log capture for diagnostics, and a fix so that sessions with zero-sized CAPI billing batches resume correctly. v1.0.55-7 closed out the run.

What worked, as a pattern, is that the cost denominator now has the same observability that the seat-license line has had for years. A CFO can read per-engineer token spend by category against the verified-output rate without a custom-built dashboard. That was not true on Monday. It is true on Wednesday.

What broke is the slide an executive brought to the last board meeting. The “percentage of AI-generated code” line is a vendor-side metric. It is loud, it is rising, and nobody has shown that it tracks what the company shipped. Port.io’s analysis of 63 earnings calls earlier this month found that 21 companies disclosed AI-generated code percentages and zero connected the percentage to an engineering outcome. That was the warning shot. June 1 is the meeting after the warning shot.

The hidden executive need this week is the one nobody puts on a slide. It is the “I need to sound credible with my CTO and my board in the same day” need. A CTO can read the v1.0.55-3 release notes and immediately understand that reasoning token count per session is the missing primitive. A board member cannot, and does not need to. What they need is the same number, in a form they can act on, without either of them losing face.

The pattern

Five events in one 72-hour window, one mechanism underneath them. Every signal reframed the harness productivity question from a single percentage into a two-row ratio.

Row one is the cost denominator. It became real and observable at the admin tenant this week. The Copilot CLI v1.0.55-3 reasoning token count surface is what makes it observable. The June 1 AI Credits activation, four days out, is what makes it real. GitHub’s own April 27 announcement spelled it out. Under the new model, Copilot Business stays at 19 dollars per user per month with 19 dollars in monthly AI Credits, and Enterprise stays at 39 dollars with 39 dollars in credits. Variable spend above that is metered at published API rates. Mario Rodriguez, GitHub’s CPO, put it plainly the same day: under the old premium-request (PRU) model, “A quick chat question and a multi-hour autonomous coding session can cost the user the same amount.” That ends on Monday. The preview-billing tool that has been running through May has already shown one developer’s 39-dollar PRU bill come back at 902 dollars under the metered model.
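The arithmetic behind row one is short. What follows is a minimal sketch with several assumptions. It uses the April 27 terms as summarized above: the seat price equals the included monthly AI Credits, and usage above the credits is billed at published API rates. It also assumes you already have each engineer's monthly usage priced in dollars at those rates. The plan table and function name are hypothetical and are not a GitHub API.

```python
# Seat price and included AI Credits per user per month, in dollars,
# as summarized from GitHub's April 27 announcement (assumed current).
PLANS = {
    "business": {"seat": 19.0, "included_credits": 19.0},
    "enterprise": {"seat": 39.0, "included_credits": 39.0},
}


def monthly_harness_cost(plan: str, metered_usage_usd: float) -> float:
    """Seat fee plus any metered usage above the plan's included credits.

    Assumption: metered_usage_usd is one engineer's month of usage,
    already priced at the published API rates.
    """
    terms = PLANS[plan]
    overage = max(0.0, metered_usage_usd - terms["included_credits"])
    return terms["seat"] + overage


print(monthly_harness_cost("business", 12.0))     # light user: 19.0
print(monthly_harness_cost("enterprise", 250.0))  # heavy user: 39 + 211 = 250.0
```

Under these assumptions a light user costs exactly the seat price, and a heavy user costs exactly their metered usage. The variable line therefore tracks the heaviest users, not headcount.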

Row two is the verified output numerator. GitKraken’s May 27 benchmarks give it a defensible threshold. Median PR cycle time under 24 hours, 50 percent merging same-day, 90 percent within 48 hours. The article body does not name an AI tool. That is the feature, not a gap. The threshold predates the harness conversation, which means it is the right numerator for a productivity claim that has to survive a CFO conversation.

The board number an executive needs by their next meeting is verified merged output per engineer-month divided by harness cost per engineer-month. The cost row stops being a surprise on Monday.
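Written out as a formula, with symbols introduced here for illustration only (O for the output row, C for the cost row, N for engineer headcount):

```latex
R = \frac{O}{C}, \qquad
O = \frac{\text{verified merged output in the month}}{N}, \qquad
C = \frac{\text{harness spend in the month}}{N}
\quad\Longrightarrow\quad
R = \frac{\text{verified merged output in the month}}{\text{harness spend in the month}}
```

When both rows use the same headcount, the per-engineer normalization cancels out of the ratio. It still matters for reading each row on its own and for setting a per-engineer cost ceiling.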

Key Insight

"AI-generated code share" was a one-number slide that worked while adoption was the story. Once the meter starts, the story is the ratio. Output the company actually shipped, over what the harness cost to ship it. Both rows are now instrumented. The only missing piece is a per-engineer ceiling.

The thing I keep thinking about is the Tom Blomfield line from the BigGo piece. The Y Combinator framing is right for an AI-native seed-stage team, where burning hard is the design intent and the company has six engineers and no payroll problem. Transposed verbatim, it is wrong for an enterprise CTO who has 200 engineers, a 12-quarter budget, and a CFO who reads variable spend as a flashing red light. The Will Sommer line from the same article is the corrective. Cheaper tokens are not the same thing as more reasoning. The new failure mode the slide has to catch is a 902-dollar bill for usage that cost 39 dollars under the old model, with nothing more merged and shipped to show for it.

What I would tell you over coffee

If I were sitting with a Series B CEO this morning, four days before the meter starts, I would tell them three things.

One. Decompose the harness line on next month’s board slide into three sub-lines. Per-seat platform fee, the unchanged piece. Admin-controlled token allowance plus the overage rate, the piece that becomes variable on Monday. Governance-plane cost: the audit log, the sandbox infrastructure, and the MCP allowlist administration that the v1.0.55 wave just made operable. Pretending the harness line is one number is the slide that breaks on the CFO’s first follow-up.

Two. Pick one verified-output metric the CFO can stamp. “Merged-and-shipped PRs per engineer per week at under a 24-hour median cycle time” is the one I would use. It lives in the CI/CD merge log. It does not depend on a vendor dashboard. The GitKraken May 27 piece gives the threshold without naming a tool, which is exactly what makes it defensible.

Three. Name a per-engineer monthly cost ceiling before Monday. If a heavy user crosses it twice in a quarter, that triggers a conversation, not an invoice surprise. The v1.0.55-3 reasoning-token-count primitive is the data you will enforce against. You do not need a custom dashboard yet. You need a number.
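The three recommendations fit in a few lines of bookkeeping. What follows is a minimal sketch. The class, field and function names are hypothetical. The monthly figures are assumed to come from your own billing export and merge log. "Twice in a quarter" is read as two or more months above the ceiling within one three-month quarter, and the 24-hour median condition from recommendation two is left to a separate check.

```python
from dataclasses import dataclass


@dataclass
class HarnessLine:
    """Recommendation one: the harness line as three sub-lines, dollars per month."""
    seat_fees: float        # per-seat platform fee, the unchanged piece
    variable_tokens: float  # admin-controlled allowance plus overage, variable from June 1
    governance: float       # audit log, sandbox infrastructure, MCP allowlist admin

    @property
    def total(self) -> float:
        return self.seat_fees + self.variable_tokens + self.governance


def prs_per_engineer_week(merged_and_shipped: int, engineers: int, weeks: float) -> float:
    """Recommendation two: merged-and-shipped PRs per engineer per week."""
    return merged_and_shipped / (engineers * weeks)


def ceiling_conversations(monthly_cost: dict[str, list[float]], ceiling: float) -> list[str]:
    """Recommendation three: engineers above the ceiling in 2+ months of any quarter.

    monthly_cost maps engineer -> monthly cost figures, starting at the
    first month of a quarter.
    """
    flagged = []
    for engineer, months in monthly_cost.items():
        # Walk the months three at a time, one quarter per slice.
        for q in range(0, len(months), 3):
            breaches = sum(1 for cost in months[q:q + 3] if cost > ceiling)
            if breaches >= 2:
                flagged.append(engineer)
                break
    return flagged


costs = {
    "eng-a": [60.0, 310.0, 280.0],  # two months over a 250-dollar ceiling
    "eng-b": [300.0, 90.0, 120.0],  # one month over
}
print(ceiling_conversations(costs, ceiling=250.0))  # ['eng-a']
```

The output is a list of names rather than an alert. The point of the ceiling is a conversation, not an automated cutoff.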

The productivity question is not going away. It is getting better. The shape of the answer is a ratio, both rows are now instrumented, and the only thing missing from most companies’ slides is the discipline to write the ratio down before the meter writes it for them.

Sources

Microsoft Pulls the Plug on Claude Code as AI Token Bills Eclipse Employee Costs - BigGo Finance, 2026-05-26
Healthy PR Lifecycle Time Benchmarks & Targets for 2026 - GitKraken Insights, 2026-05-27
AI News Today - May 27, 2026: 12 Biggest Stories - BuildFastWithAI, 2026-05-27
GitHub Copilot CLI releases v1.0.55-1 through v1.0.55-7 - GitHub, 2026-05-27
GitHub Copilot is moving to usage-based billing - The GitHub Blog, 2026-04-27
