Before the Q3 board: is your AI agent actually paying for itself?


Uber burned its 2026 AI budget in four months, capped spending at $1,500 per employee, per tool, per month, and admitted it could not link the bill to features shipped. Here is how to tell whether a Series B AI agent is net-positive before the Q3 board: count outcomes and the cost to supervise, not consumption.

TLDR

Uber burned its entire 2026 AI budget in four months, then capped spending at $1,500 a month per employee, per tool, and admitted it could not yet connect the bill to features shipped. That is the real question heading into Q3 budgets: not how much an agent gets used, but whether it is net-positive once the cost of running and supervising it is on the page. Count outcomes, name an owner, and the board conversation gets a lot calmer.

Last week Quartz pulled together a story a lot of operators have been living quietly. Uber, a company that told its engineers to use AI as freely as they wanted, blew through its entire 2026 AI budget in four months. The response was not a ban. It was a $1,500 monthly cap per employee, per tool, tracked on an internal dashboard, with an approval step for anyone who needs more. Use both Cursor and Claude Code? That is $1,500 each. Individual engineers were running monthly bills between $500 and $2,000 in token consumption.

Microsoft made a quieter version of the same call. Its Experiences and Devices division is phasing out Claude Code licenses by the end of June and pointing engineers toward GitHub Copilot CLI instead.

$1,500: Uber's new monthly cap per employee, per AI coding tool, after a four-month budget burn

Here is the part that should land in a Series B founder’s chest. The interesting thing is not the cap. It is what Uber’s president and COO Andrew Macdonald said when asked whether all that spend was worth it.


What “measuring AI” has quietly meant: counting consumption

For two years, the way most teams measured AI was by counting consumption. Tokens. Prompts. Seats. Monthly active users. The dashboard that goes up and to the right. And honestly, that made sense when the only question on the table was adoption. If nobody was using the thing, nothing else mattered, so usage was a fair proxy for progress.

The problem is that usage stopped being the question somewhere around the start of this year, and a lot of measurement never caught up.

A Futurum survey from the spring, across 830 enterprise software decision-makers, caught the shift in motion. Productivity gains, the default justification for AI spend through 2024 and 2025, fell from 23.8% to 18.0% as the metric leaders trust most. Direct financial impact, meaning actual revenue and profit, nearly doubled to 21.7%. The people writing the checks are quietly moving from “are people using it” to “did it show up in the P&L.”

That gap is where ROI goes to die. As IT Pro put it this month, looking at the Uber bill, enterprises are “still measuring AI success through consumption rather than outcomes,” and it is warping how the whole industry reads its own returns. A usage chart that climbs every week is the most comforting lie in enterprise software. It feels like momentum. It is just a number that grows when people type.

Key Insight

Consumption is easy to count and always trends up. Outcomes are messy, lagging, and hard to attribute. So teams measure the easy thing and call it ROI, which is how spend can triple while the value question stays unanswered.


The line nobody puts on the board slide: the cost to supervise

When a founder tells me an agent is paying for itself, I ask one question. What is the all-in number, including the human time to watch it?

The API bill is the visible cost. It shows up, it has a dollar sign, it is impossible to ignore once it hits $2,000 a head. The invisible cost is supervision: the senior engineer reviewing output, correcting it, re-running the failed steps, and catching the silent failures that look fine until they do not. An agent that saves a team ten hours but needs six hours of senior review and an eighteen-hundred-dollar token bill is a real number. It is just a very different number than the one in the demo.
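
To make that arithmetic concrete, here is a minimal sketch of the all-in number. The ten hours saved, six hours of review, and $1,800 token bill come from the example above. The hourly rates are placeholders I made up for illustration, and I am assuming all three figures cover the same month. Swap in real loaded costs before trusting the result.

```python
from dataclasses import dataclass


@dataclass
class AgentMonth:
    hours_saved: float           # team hours the agent saved this month
    review_hours: float          # senior hours spent reviewing and correcting output
    token_bill_usd: float        # API / token spend for the month
    saved_hour_rate_usd: float   # ASSUMPTION: loaded cost of one hour saved
    review_hour_rate_usd: float  # ASSUMPTION: loaded cost of one senior review hour


def net_value_usd(m: AgentMonth) -> float:
    """Gross value of hours saved, minus supervision cost, minus the token bill."""
    gross = m.hours_saved * m.saved_hour_rate_usd
    supervision = m.review_hours * m.review_hour_rate_usd
    return gross - supervision - m.token_bill_usd


# Hours and token bill are from the example in the text; both rates are hypothetical.
example = AgentMonth(
    hours_saved=10.0,
    review_hours=6.0,
    token_bill_usd=1800.0,
    saved_hour_rate_usd=100.0,
    review_hour_rate_usd=150.0,
)

print(f"Net monthly value: ${net_value_usd(example):,.0f}")
```

With these placeholder rates the agent lands well below zero. The point is not the specific figure. It is that the sign of the result depends on two lines that never appear on the API invoice.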

This is the math Bain put numbers behind at the start of this month, across 951 firms. Budgets are up. Returns are mostly absent. Their line for it was blunt: the technology worked, the value did not arrive. Nearly 40% of firms measured savings below 10% of what they targeted. Only 4% cleared 30%. And 44% are funding their next wave of AI spend on savings the previous wave never actually delivered, which Bain rightly called a circular bet with a structural leak.

Where measured AI savings actually landed (Bain, 951 firms, June 2026)

Result vs. target              Share of firms
Measured savings below 10%     ~40%
Measured savings above 30%     4%

None of this means the agents are bad. It means the board slide is incomplete. The board does not want the usage chart. It wants the net number, with the supervision cost subtracted, and it wants to know that net number is honest.


Why consumption looks like progress and outcomes look like nothing

Macdonald, asked to connect Uber’s rising token bill to what it produced, was refreshingly honest. He said the link “is not there yet.”

"It's very hard to draw a line between one of those stats and, okay, now we're actually producing like 25% more useful consumer features."

Andrew Macdonald, Uber president and COO, as reported by IT Pro, June 2026

Read that twice, because it is one of the most useful things an operator at that scale has said out loud all year. A company spending enough to exhaust an annual budget in four months still cannot draw a clean line from consumption to shipped value. That is not an Uber failure. That is the default state of almost everyone right now, just usually unsaid.

The fix is not more tooling. It is sequencing. This week CVS Health, running one of the largest pharmacy networks in the country, partnered with a platform called Fluency to move agents from pilot into production with outcome measurement built into the lifecycle rather than bolted on at the end. Their framing stuck with me. Most enterprises, Fluency’s CEO said, do not have an AI access problem. They have a prioritization problem.

A company spending enough to burn an annual budget in four months still could not connect the bill to the value. That is not the exception. That is the default, just usually unspoken.


What I’d put on one line before the Q3 board

Here is what I would do before the next board meeting, and it takes an afternoon, not a consulting engagement.

For each agent in production, write one line: what it does, the honest all-in monthly cost including supervision, the single outcome it moved that you can defend, and the name of the person who owns it. If a line has a cost but no defensible outcome, that is not a failing agent. That is a measurement gap, and naming it is the most credible thing a founder can do in that room. Boards have stopped being impressed by adoption. They are reassured by someone who can say, calmly, which agents are net-positive, which are still pilots, and how they can tell the difference.
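
If it helps to keep those lines honest, the same exercise can be sketched as a small record per agent. This is one possible shape, not a standard. The field names, the status labels other than "measurement gap", and both example agents are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentLine:
    what_it_does: str
    all_in_monthly_cost_usd: float      # token bill plus supervision time
    defensible_outcome: Optional[str]   # None if no outcome can be defended yet
    owner: Optional[str]                # None if nobody is named


def status(line: AgentLine) -> str:
    """Classify a line: a cost without a defensible outcome is a measurement gap."""
    if not line.owner:
        return "needs an owner"
    if not line.defensible_outcome:
        return "measurement gap"
    return "defensible"


def board_line(line: AgentLine) -> str:
    """Render the one-line summary: what, all-in cost, outcome, owner, status."""
    outcome = line.defensible_outcome or "no defensible outcome yet"
    owner = line.owner or "unowned"
    return (
        f"{line.what_it_does} | ${line.all_in_monthly_cost_usd:,.0f}/mo all-in | "
        f"{outcome} | {owner} | {status(line)}"
    )


# Hypothetical agents and figures, for illustration only.
agents = [
    AgentLine("Support ticket triage", 4200.0, "First-response time down", "A. Owner"),
    AgentLine("PR review assistant", 6100.0, None, "B. Owner"),
]

for agent in agents:
    print(board_line(agent))
```

The second line prints as a measurement gap rather than a failure, which is the distinction worth carrying into the room.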

The companies pulling ahead are not the ones spending the most. They are the ones who stopped counting consumption and started counting outcomes, one named line at a time. That is not a harder job than the one already being done. It is the same work, pointed at the number that actually matters.

Sources

  1. After the AI spending binge: Caps, dashboards, and the search for ROI - Quartz, 2026-06-11
  2. Uber's eye-watering AI bill shows enterprises are 'still measuring AI success through consumption rather than outcomes' - IT Pro, 2026-06-05
  3. CVS is Using Fluency to Close the Enterprise AI Deployment Gap - Yahoo Finance, 2026-06-11
  4. Your AI Budget Is Growing. Your Returns Aren't. Here's Why - Bain & Company, 2026-06-01
  5. Enterprise AI ROI Shifts as Agentic Priorities Surge - Futurum Group, 2026-04-26
