The Number Nobody Puts in the Series B Deck: What AI Actually Costs When It Starts Working

Most Series B companies built their AI cost model during the pilot. This is what the bill actually looks like when AI starts working at production scale.

TLDR

Most Series B companies built their AI cost model during the pilot, when usage was low and the infrastructure team was two engineers. The production bill is structurally different. Signals this week from Oracle, IBM, and new enterprise spending research all point to the same thing: the gap between pilot economics and production economics is wider than most teams expect, and the time to model it is before committing to an architecture.

The Setup

Oracle cut approximately 30,000 jobs in March 2026, about 18% of its global workforce. The company described this as a restructuring to fund a $156 billion AI infrastructure investment. In the same week, IBM announced a partnership with Arm to build new hardware platforms specifically for enterprise AI workloads, targeting regulated industries like banking, insurance, and telecom.

I find these two announcements more useful than any market forecast I have read this quarter. Not because they are surprising, but because of what they reveal. Neither company is adding AI onto existing infrastructure. Both are restructuring the infrastructure layer itself to handle AI natively, at production scale, under real operational load.

For a Series B founder who has shipped something that works, this is the chapter most pitch decks skip entirely.


What They Tried

Here is the story I hear from most teams at Series B.

The pilot runs on a hosted model API. The engineering team is small, usage is low, and the monthly inference bill lands somewhere between $300 and $800. The demo goes well. The board approves the next phase. The founders do the multiplication: at 1,000 users, this costs roughly ten times what it costs now. The math checks out. The runway covers it. The decision gets made.

That math is not wrong. It is just modeling the wrong costs.

During a pilot, the infrastructure is borrowed. The data pipeline runs manually because it is too early to automate. The compliance review takes one afternoon because nobody has asked hard questions yet. Engineering time is absorbed in base salaries already on the payroll. None of this appears in the inference bill. All of it appears at production scale.

I keep seeing teams get surprised by three specific categories. First: data engineering and retraining cycles, which scale with request volume and data freshness requirements, not just with user count. Second: governance and compliance overhead, which surfaces suddenly when the first enterprise customer sends a vendor questionnaire with 40 AI-specific questions. Third: the agentic cost multiplier. Most teams building in 2026 are building agentic workflows, where the AI takes multiple sequential steps, calls tools, and reads from and writes to systems of record. Those workflows are not proportionally more expensive than single query-and-response calls. They are categorically more expensive. Each step in an agentic chain adds inference calls, latency constraints, and error surface. A task requiring fifteen model calls costs at least fifteen times what a single query costs, and often more, because each step typically re-sends the context built up by the steps before it. Pilots almost never involve workflows that complex.
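To see why fifteen calls is a floor rather than an estimate, here is a minimal sketch of per-task cost in an agentic chain. It rests on an assumption that is mine, not any vendor's pricing: each step re-sends the original prompt plus everything earlier steps produced. The function name, token counts, and per-1,000-token prices are all hypothetical placeholders.

```python
def chain_cost(steps, base_input_tokens, output_tokens_per_step,
               price_in_per_1k, price_out_per_1k):
    """Estimate the cost of one agentic task in USD.

    Assumption: every step re-sends the original prompt plus all output
    produced by earlier steps, so input tokens grow with each step.
    """
    total = 0.0
    context_tokens = base_input_tokens
    for _ in range(steps):
        # Pay for the context sent in, then for the tokens generated.
        total += context_tokens / 1000 * price_in_per_1k
        total += output_tokens_per_step / 1000 * price_out_per_1k
        # The next step carries this step's output as extra context.
        context_tokens += output_tokens_per_step
    return total


# Hypothetical numbers, for illustration only.
single = chain_cost(1, 1000, 500, 0.01, 0.03)
fifteen = chain_cost(15, 1000, 500, 0.01, 0.03)
print(f"single query: ${single:.3f}, 15-step task: ${fifteen:.3f}, "
      f"ratio: {fifteen / single:.0f}x")
```

With these placeholder inputs the fifteen-step task costs well over fifteen times the single query, because the input side of the bill grows at every step. Real systems that trim or cache context will land somewhere between the two.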


Where It Broke (and Where It Worked)

A report published this week by Simply Wall St analyzed enterprise AI operational spending globally. The finding: organizations worldwide are underestimating their AI operational costs by 30% or more. The cause is consistent across industries. Companies base their financial models on what AI costs during pilots, not what it costs to run AI as a production system with real volume, real data freshness requirements, and real enterprise accountability standards.

30%+: how much enterprises globally are underestimating their AI operational costs, per research published April 3, 2026

That 30% figure likely understates the gap for teams running agentic workflows, where the architecture itself generates cost categories that have no pilot equivalent.

SiliconAngle published a sharp analysis this week on why AI cost management tools are failing to contain the spending problem. The argument: better dashboards and tighter governance do not fix a cost structure that was designed for the wrong workload. The piece cited PricewaterhouseCoopers data worth sitting with.

"Some 55% of respondents to a recent PricewaterhouseCoopers International Ltd. survey say they have yet to see any benefit from artificial intelligence tools."

SiliconAngle, April 3, 2026

That 55% is not evidence that AI does not work. It is a reliable signal that most organizations are paying for production-scale infrastructure while still operating at pilot-scale readiness. The infrastructure scaled. The ability to capture value from it did not.

The teams that get this right share one pattern. They model the production cost structure before committing to the architecture. Not after the pilot succeeds. Before. They ask: what does this system cost at 2,000 users running agentic workflows? What does the data pipeline look like at production volume? What happens to compliance overhead when the first enterprise contract goes live? Teams that ask those questions before building tend to make different architecture decisions, and consistently land closer to budget.
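Those questions can be turned into a rough model before any architecture is chosen. The sketch below compares the pilot-style extrapolation with a category-by-category estimate. Every figure in it is a placeholder I made up for illustration (including the 100-user pilot), and the category names are my own grouping of the costs described above, not a standard taxonomy.

```python
from dataclasses import dataclass


@dataclass
class ProductionAssumptions:
    users: int
    actions_per_user_per_month: int
    model_calls_per_action: int
    cost_per_model_call: float    # USD, hypothetical
    data_pipeline_monthly: float  # USD, hypothetical
    compliance_monthly: float     # USD, hypothetical
    reliability_monthly: float    # USD: uptime, audit trails, on-call


def naive_estimate(pilot_bill, pilot_users, target_users):
    """Pilot-style extrapolation: scale the inference bill by user count."""
    return pilot_bill * target_users / pilot_users


def production_estimate(a):
    """Monthly cost in USD, broken out by category."""
    inference = (a.users * a.actions_per_user_per_month
                 * a.model_calls_per_action * a.cost_per_model_call)
    breakdown = {
        "inference": inference,
        "data_pipeline": a.data_pipeline_monthly,
        "compliance": a.compliance_monthly,
        "reliability": a.reliability_monthly,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Placeholder inputs; replace every value with your own.
target = ProductionAssumptions(
    users=2000, actions_per_user_per_month=50, model_calls_per_action=15,
    cost_per_model_call=0.01, data_pipeline_monthly=4000.0,
    compliance_monthly=3000.0, reliability_monthly=5000.0)

naive = naive_estimate(pilot_bill=500.0, pilot_users=100, target_users=2000)
modeled = production_estimate(target)
for category, usd in modeled.items():
    print(f"{category:>14}: ${usd:,.0f}")
print(f"{'naive':>14}: ${naive:,.0f}")
```

The point is not the specific totals, which depend entirely on the inputs. It is that the naive estimate has one line and the production estimate has several, and the extra lines are the ones a pilot never surfaces.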

IBM’s Arm partnership is worth reading as signal, not just vendor news. It targets regulated industries specifically because those are the environments where AI workloads have already moved into production in volume. The challenge IBM and Arm are solving is not model capability. It is reliability, security, and flexibility under sustained operational load. That is a different engineering problem from pilot performance, and the companies already in production have learned to treat it as such.


The Pattern

Here is what I see consistently across companies that have gone through this transition.

The pilot creates a mental model. That model prices AI as an API with variable costs proportional to usage. It is accurate for query-and-response AI. It significantly underprices agentic AI, which creates cost categories that have no pilot analogue.

The second thing that consistently catches teams is the gap between what AI costs to run and what it costs to run well. Running AI in production at the standards enterprise customers require, with the uptime guarantees, audit trails, and governance documentation those contracts specify, costs more than running AI in production at pilot standards. Both get called “production.” They are not the same economic reality.

The infrastructure bill at Series B is not an arithmetic problem. It is a design problem. The time to solve it is before committing to the design.

Oracle cut 18% of its workforce to fund infrastructure. IBM is rebuilding its hardware platforms. These are not companies running pilots. They are companies that have already absorbed the lesson about what AI costs at scale, and are restructuring around it. For founders earlier in the journey, the signal is that the infrastructure layer is still being rebuilt. A cost model built on what exists today, rather than on what it will take to run at target scale, carries a risk worth pricing in.


What I’d Tell You Over Coffee

Most Series B founders I talk to are in one of two places.

The first group has not modeled production infrastructure costs yet, and is running on the assumption that inference prices will keep falling. They will. But they will not fall fast enough to compensate for the volume and architectural complexity of an agentic system serving enterprise customers with real SLAs.

The second group recently received the first real bill and is treating it as a crisis. Usually it is not. It is a design question wearing a budget problem’s clothes. The right question is whether the cost is structural or operational. Structural: the architecture makes 20 model calls per user action by design. That is not a budgeting failure; it is a design choice that needs revisiting. Operational: the system is paying for compute capacity it is not using, or moving data across regions when it does not need to. Those are different problems with different fixes.
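One way to make the structural-versus-operational split concrete is a small triage function run against the first real bill. This is only a sketch: the function name, the inputs, and both thresholds (five calls per action, 60% utilization) are hypothetical defaults I picked for illustration, not benchmarks.

```python
def classify_cost_drivers(calls_per_action, provisioned_hours, used_hours,
                          cross_region_gb, max_calls_per_action=5,
                          min_utilization=0.6):
    """Label each cost driver as structural or operational.

    Thresholds are hypothetical defaults, not industry benchmarks.
    """
    findings = []
    # Structural: the design itself multiplies inference calls.
    if calls_per_action > max_calls_per_action:
        findings.append(
            ("structural", f"{calls_per_action} model calls per user action"))
    # Operational: paying for capacity that sits idle.
    if provisioned_hours:
        utilization = used_hours / provisioned_hours
        if utilization < min_utilization:
            findings.append(
                ("operational", f"compute utilization at {utilization:.0%}"))
    # Operational: data moving across regions.
    if cross_region_gb > 0:
        findings.append(
            ("operational", f"{cross_region_gb} GB moved across regions"))
    return findings


# Hypothetical bill inputs, for illustration only.
findings = classify_cost_drivers(calls_per_action=20, provisioned_hours=720,
                                 used_hours=250, cross_region_gb=1200)
for kind, detail in findings:
    print(f"{kind}: {detail}")
```

Structural findings point back at the architecture. Operational findings can usually be fixed without touching it, which is why separating the two before reacting to the bill is worth the few minutes it takes.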

The number nobody puts in the Series B deck is the production infrastructure cost at target scale. Not the pilot cost multiplied by user growth. The actual cost of running the system you plan to build, the way enterprise customers will require it to run, at the scale the round is funding toward.

That number can be modeled. Building before modeling it is the Series B trap worth avoiding.

Sources

  1. Cloud AI Update - Hidden Costs Surge In Global Enterprise AI Operations - Simply Wall St, 2026-04-03
  2. You can't FinOps your way out of AI cloud costs - SiliconAngle, 2026-04-03
  3. AI News Digest, April 2: The Reskilling Crisis Nobody's Fixing - Asanify, 2026-04-02
  4. IBM Partners with Arm to Build Dual-Architecture Platforms for Enterprise AI and Data Processing - CXO VOICE, 2026-04-02
