Where Coding-Agent ROI Actually Shows Up First (And Where It Quietly Doesn't)

The 3x productivity question every board is asking is real in some places and invisible in others. Coding agents earn their keep on the work that never got done before, not the PR that was already shipping. Here is how to find the number that will actually hold up in a board meeting.

TLDR

The "3x productivity" question every board is asking is real in some places and invisible in others. Coding agents do not earn their keep on the critical-path PR that was already going to ship. They earn it on the work that never got done before. Stop measuring velocity. Start counting the work that suddenly exists.

The board-prep question

Last week I was on a call with a Series B founder who had twenty minutes to respond to his chairman. The chairman had forwarded him a McKinsey note and asked one thing: “If these tools are 3x, why aren’t we shipping 3x?”

This is the question of the season. Every operator I talk to is getting a version of it. Most are answering it wrong, because they are answering the wrong question. The 3x number exists. It is not a hallucination. It is also almost never where the board thinks it is.

Before the pattern, a quick snapshot of where the market actually landed this quarter. In Gergely Orosz’s March survey of nearly 900 engineers for the Pragmatic Engineer newsletter, 56% of respondents now do 70% or more of their work using AI, and Claude Code went from zero to the most-used tool in the category in eight months. Adoption is no longer the interesting number. What the adoption produces is.

What teams actually try first

When the productivity question lands, the natural instinct is to measure the thing the tool does, which is write code. So teams reach for the metrics they already have. Lines shipped. PRs merged per engineer. Cycle time. DORA dashboards dusted off.

I watched three teams run this play in the last two months. One instrumented PR throughput before and after a Cursor rollout. One A/B tested engineers with and without Claude Code for a quarter. One did the most honest thing, which was asking engineers how much time they felt they were saving.

All three found something, and none of the somethings matched the executive expectation. The self-reported number averaged about 3.6 hours per engineer per week of time saved, which is real but is not 3x anything. The throughput number rose, but the review queue rose faster. And the A/B test was quietly contaminated by the fact that the control group kept sneaking Claude Code onto their personal laptops, which is a very 2026 problem.

~60% more PRs merged by daily AI users than light users, per April 2026 benchmarks.

None of those measurements are wrong. They are pointed at the wrong part of the organization.


Where it breaks, and where it quietly works

The productivity paradox is structural. When the code-writing step speeds up in a system where code-writing was not the bottleneck, the obvious thing happens: the work piles up at the next step. Larridin’s March 2026 benchmarks put AI-assisted PR cycle times at 12 to 18 hours for top-quartile teams versus 18 to 24 hours for human-only work, which is real acceleration on the writing side. But AI-generated PRs also tend to sit longer in review, and early data shows roughly 1.7x more issues per PR when an agent co-authored it. Code got faster. The queue after the code got slower.
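If it helps to see the bottleneck arithmetic, here is a minimal sketch. Every number in it is invented for illustration (20 PRs written per week before agents, 32 after, review capacity fixed at 25), and the function names are mine, not from any of the benchmarks above. It assumes review is the only downstream stage and that its capacity does not change.

```python
def shipped_per_week(written_per_week: int, review_capacity_per_week: int) -> int:
    """PRs that actually merge each week: the slower stage sets the pace."""
    return min(written_per_week, review_capacity_per_week)


def review_backlog(weeks: int, written_per_week: int, review_capacity_per_week: int) -> int:
    """PRs still waiting for review after `weeks`, starting from an empty queue."""
    backlog = 0
    for _ in range(weeks):
        # The queue grows by whatever is written beyond what review can absorb.
        backlog = max(0, backlog + written_per_week - review_capacity_per_week)
    return backlog


# Hypothetical inputs, not measured data.
WRITTEN_BEFORE = 20   # PRs written per week before agents
WRITTEN_AFTER = 32    # 60% more PRs written per week with agents
REVIEW_CAPACITY = 25  # PRs the team can review per week, unchanged

if __name__ == "__main__":
    before = shipped_per_week(WRITTEN_BEFORE, REVIEW_CAPACITY)
    after = shipped_per_week(WRITTEN_AFTER, REVIEW_CAPACITY)
    print(f"merged per week before agents: {before}")
    print(f"merged per week with agents:   {after}")
    print(f"review backlog after 4 weeks:  {review_backlog(4, WRITTEN_AFTER, REVIEW_CAPACITY)}")
```

With these made-up inputs, writing output rises 60% but merged PRs only go from 20 to 25 per week, and 28 PRs are sitting in review after a month. The exact figures do not matter. The shape does: the gain is capped by the stage you did not speed up.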

So where does the gain actually live? In the corners of the engineering org that used to be ignored because the opportunity cost was too high.

Every team I have watched in the last six months has the same list. They finally ship the backlog tickets labeled “nice to have” since Q2. They fix the flaky tests that have been yellow for a year. They write the onboarding runbook that every new hire asked about and nobody ever wrote. They pay down the migration that was blocking three other migrations. They update the README that was wrong in two places.

"Daily AI users merge ~60% more PRs than light users."

GetPanto AI Coding Statistics, April 2026

That number looks like pure velocity. Look closer. It is not the same PRs shipped 60% faster. It is 60% more PRs appearing, and most of them are the work that was never getting done at all.


The pattern

Here is the pattern I would put in front of the board, and it is the one the Series B founder ended up using.

Coding agents do not make the critical path faster. The critical path was already the fastest thing in the org, because that is where the best engineers, the tightest process, and the most attention were already concentrated. The ceiling on the critical path was never the code. It was review capacity, scope clarity, and a willingness to cut the thing.

What agents do is raise the floor. They take the work that used to sit beneath the line of “worth an engineer’s time” and drag it above the line. The second PR a single engineer ships in a day. The ticket that would have stayed in the backlog forever. The internal tool that paid for itself in week two. The small rewrite nobody could justify asking a staff engineer to own.

Key Insight

ROI from coding agents shows up first in work that was not getting done before, not in work that was already getting done. The right metric is not velocity. It is the new category of shipped work that did not exist on the roadmap last quarter.

This is a boring insight that makes most boards quietly recalculate their whole thesis, and it is far closer to the truth than 3x.

Agents do not make the critical path faster. They raise the floor on what is worth doing at all.

What I’d tell you over coffee

If the board wants a number, stop trying to prove 3x on the critical path. The proof is not there, and six weeks of dashboard instrumentation will only confirm it is not there.

Try this instead. Pull the last two quarters of shipped work, and sort it by whether the category of work existed on the roadmap before coding agents entered the workflow. Count the new category. That is the number. It will not be 3x. It will be something less sexy and more durable, like “we shipped 40% more internal tooling than any quarter on record, and our incident rate on new features is flat.” That graph holds up in a board meeting better than any velocity chart, because it describes something the company could not do before.
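The count described above can be sketched in a few lines. This is a rough illustration, not a tool: the `ShippedItem` fields, the category labels, and the sample data are all hypothetical, and it assumes you have already tagged each shipped item with a category and know which categories were on the roadmap before agents arrived.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ShippedItem:
    title: str
    category: str
    quarter: str


def new_category_counts(items, roadmap_categories):
    """Count shipped items per category that was NOT on the pre-agent roadmap."""
    return Counter(
        item.category for item in items if item.category not in roadmap_categories
    )


def new_work_share(items, roadmap_categories):
    """Fraction of all shipped items that fall in a new category."""
    if not items:
        return 0.0
    new_total = sum(new_category_counts(items, roadmap_categories).values())
    return new_total / len(items)


# Made-up example data, for illustration only.
ROADMAP_BEFORE_AGENTS = {"feature", "bugfix"}
SHIPPED = [
    ShippedItem("Checkout redesign", "feature", "Q1"),
    ShippedItem("Fix login timeout", "bugfix", "Q1"),
    ShippedItem("Flaky test cleanup", "test-health", "Q1"),
    ShippedItem("Onboarding runbook", "docs", "Q2"),
    ShippedItem("Deploy dashboard", "internal-tooling", "Q2"),
    ShippedItem("Billing export", "feature", "Q2"),
    ShippedItem("Log search CLI", "internal-tooling", "Q2"),
    ShippedItem("Fix retry bug", "bugfix", "Q2"),
]

if __name__ == "__main__":
    counts = new_category_counts(SHIPPED, ROADMAP_BEFORE_AGENTS)
    for category, count in counts.most_common():
        print(f"{category}: {count}")
    share = new_work_share(SHIPPED, ROADMAP_BEFORE_AGENTS)
    print(f"share of shipped work in new categories: {share:.0%}")
```

The hard part is not the code. It is agreeing on the category labels and on what counted as "on the roadmap" before the rollout, so settle those definitions before anyone looks at the totals.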

The productivity question is not wrong. It is aimed at the part of the org that was already running at speed. Aim it at the long tail. That is where the money is.

Sources

  1. AI Tooling for Software Engineers in 2026 - The Pragmatic Engineer, 2026-03-03
  2. Developer Productivity Benchmarks 2026 - Larridin, 2026-03-19
  3. AI Coding Statistics: Adoption, Productivity & Market Metrics - GetPanto, 2026-04-08
