Adoption hit 84 percent. Trust in the output sits at 29 percent.

One stat, one survey, two numbers 55 points apart. The same trust-vs-adoption pattern shows up at deployment scale, where 85 percent of enterprises run agent pilots while only 5 percent have agents in production, and that is why the Q3 harness renewal will not look like the Q1 one.

55 points - The renewal gap between AI tool usage (84%) and trust in the output (29%). Evidence: harness-trust-gap-adoption.

Figure: abstract 3D render of an interlocking lattice threaded with gold filaments, representing the coding-agent harness as a connected workflow architecture.

A meta-view of Cerevisor coding-agent and harness posts (29 posts, 2026-04-14 to 2026-05-13). Last updated 2026-05-14. Next refresh 2026-06-12.

Fact-check: approved with patches. Dual-verified by source hunter and math auditor. 3 consensus flags, 9 patches applied.

The shift: from seat-priced IDE plugin to platform, verifier, and governance line items

In 30 days the harness stopped being a tool and became a platform line item, splitting the renewal contract into platform, verifier, and governance.

Through April and May 2026 the coding-agent market quietly exited its pick-a-favorite phase. Anthropic, Cursor, GitHub, and Microsoft converged on the same product shape: a platform plane holding the seat-equivalent feature set, a metered verifier line that flexes with merged PRs, and a governance line paying for a kill switch. Microsoft Agent 365 going GA at $15 per user, Sierra's $15B mark at a 100x ARR multiple, SpaceX's $60B option on Cursor, and Anthropic and OpenAI both importing Palantir's forward-deployed-engineer (FDE) model all priced the same shift inside the same month. Value is migrating up a tier. Rules, skills, MCP servers, and subagents are commoditizing onto the agentskills.io open standard, which 30-plus tools have adopted, while lock-in now lives in orchestration glue and the governance plane. The model is no longer the unit of risk: Endor Labs measured a 26-point functional-correctness swing between the same GPT-5.5 wrapped in Cursor and wrapped in Codex. The seat line on next quarter's contract is being back-formed from a product it no longer matches.
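
To make the three-line-item shape concrete, here is a minimal sketch of how such a contract might be costed. Everything in it is an assumption for illustration: the class name, the field names, and every rate in the usage example are hypothetical placeholders, not vendor pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarnessContract:
    """Hypothetical renewal split into platform, verifier, and governance lines."""
    seats: int
    platform_per_seat: float       # assumed monthly platform fee per seat
    verifier_per_merged_pr: float  # assumed metered fee per merged PR
    governance_per_seat: float     # assumed monthly governance add-on per seat

    def monthly_cost(self, merged_prs: int) -> dict[str, float]:
        # Platform and governance scale with seats; only the verifier
        # line flexes with delivery volume (merged PRs).
        platform = self.seats * self.platform_per_seat
        verifier = merged_prs * self.verifier_per_merged_pr
        governance = self.seats * self.governance_per_seat
        return {
            "platform": platform,
            "verifier": verifier,
            "governance": governance,
            "total": platform + verifier + governance,
        }


# Placeholder rates, chosen only to show the shape of the bill.
contract = HarnessContract(
    seats=100,
    platform_per_seat=40.0,
    verifier_per_merged_pr=0.5,
    governance_per_seat=10.0,
)
print(contract.monthly_cost(merged_prs=1000)["total"])  # 5500.0
print(contract.monthly_cost(merged_prs=4000)["total"])  # 7000.0
```

With the seat count held constant, the total still moves with merged PRs, which is why a seat-only line item no longer describes the spend.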

Data signals

  • 26 points - Same model, different harness, 26-point correctness swing. Endor Labs Agent Security League measured a 26-point functional-correctness gap between GPT-5.5 in Cursor and GPT-5.5 in Codex. Wrapper-induced variance now exceeds typical model-to-model variance, so risk registers that name only the model have stopped describing the system they claim to govern.
  • 85% / 5% - 85% of enterprises pilot agents. 5% run them in production. The harness market is verification-constrained, not adoption-constrained. The 80-point gap (RSAC 2026 stage data) echoes the 55-point usage-vs-trust gap at deployment scale, though the two figures come from different surveys.
  • 48% - Nearly half the corpus (14 of 29 posts) references Anthropic or Claude Code. Vendor concentration is itself a risk signal. A Claude Code degradation that took 38 days to surface (AMD's 6,852-session audit) moves the dashboard disproportionately.
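
The headline figures above are simple arithmetic, and a short script makes them easy to re-audit at each refresh. This is a sketch only: the helper names `point_gap` and `share_pct` are hypothetical, and the inputs are the percentages and post counts quoted in this piece.

```python
def point_gap(high_pct: float, low_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return high_pct - low_pct


def share_pct(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


# Usage (84%) versus trust (29%).
print(point_gap(84, 29))         # 55
# Enterprises piloting agents (85%) versus running them in production (5%).
print(point_gap(85, 5))          # 80
# Posts referencing Anthropic or Claude Code: 14 of 29.
print(round(share_pct(14, 29)))  # 48
```

The two gaps are percentage-point differences, not ratios, and they come from different surveys, so they should be read side by side rather than combined.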

Unresolved tensions

  • Unattended autonomy versus auditable verification: Vendors sell autonomy: Claude Code Routines, Cursor Cloud Agents, and Devin ship background commits with no mid-run approval prompts. Against that, buyers staff verification: AMD reads 6,852 session logs by hand, IBM staffs morning queues, and the four-gate playbook refuses unattended commits. The runtime decides what ships, not the model. The harness market is selling autonomy faster than engineering orgs can build the verification surface that makes autonomy safe. Evidence: harness-four-gates-unattended-commits, harness-whos-checking-productivity-number, morning-review-queue-org-chart.
  • Replacement narrative versus augmentation deployment: Cognition at $25B and Sierra at $15B are priced on AI-replaces-engineers; Coinbase, Freshworks, and Armstrong's letter cite AI productivity for cuts. Against that, the rollouts that actually work (IBM Bob at 80,000, NVIDIA Codex at 10,000, Cloudflare at 60 percent) are framed as augmentation, and Anthropic and OpenAI imported Palantir's FDE model because a model does not deploy itself. Boards are being asked to price two incompatible futures inside one Q3 plan. Pricing autonomy while staffing for augmentation breaks the headcount math and the harness contract simultaneously. Evidence: ai-replacing-engineers-myth-three-numbers, harness-second-pilot-wall-anthropic-openai-bet, harness-question-became-headcount-question, harness-productivity-number-board-coinbase-letter.
  • Open standards versus orchestration lock-in: Open primitives travel: rules, skills, MCP servers, and subagents are portable across 30-plus tools on agentskills.io, with GitHub's cross-vendor skills spec landing alongside Opus 4.7. Against that, orchestration locks in: Microsoft Agent 365, Cursor SDK, and Zed for Business hold vendor-specific orchestration, verifier surface, and identity inventory, which is where the renewal cost now sits. Procurement teams negotiating the seat line are arguing about the layer that just got commoditized while signing away the layer that did not. The lock-in migrated up a tier and the contract template did not. Evidence: harness-portability-audit-before-june-renewal, three-harness-moves-q2-renewal-conversation, four-questions-before-lock-in.
  • Adoption metrics versus productivity reality: Dashboards look healthy: 84 percent of developers use AI tools, 64 percent report a 25 percent velocity boost, and 140,000 organizations are on GitHub Copilot. Against that, delivery slides underneath: senior engineers finish tasks 19 percent slower (METR), peer collaboration drops 80 percent (MIT Sloan), 38 percent of the week is debugging AI output (Lightrun), only 5 percent of enterprises run agents in production (RSAC 2026), and trust sits at 29 percent (Stack Overflow). Adoption is the easiest number to move and the worst one to trust. Every CTO slide right now reports the license-utilization metric while the delivery metric slips, and the gap is where Q3 budgets get cut on bad evidence. This is the 55-point gap. Evidence: harness-trust-gap-adoption, senior-engineer-adoption-myth, ai-coding-adoption-percentage-cto-slide-not-productivity-number, harness-security-register-board-memo-missing-this-week.

Figure: three stacked glass planes with the middle plane highlighted, representing the load-bearing orchestration layer in the modern coding-agent stack.

Sprint moves

  • Name an owner for the morning agent PR queue: Background agents from Claude Managed Agents, Cursor Cloud, and Devin are shipping PRs overnight, but the 8am triage queue has no owner on most org charts. Without a named role tying context curation, verification design, and audit ownership together, the IBM-style 45 percent lift flattens around month four. This Monday: Open a single org-chart box between senior IC and EM, write a one-page role charter naming context curation, verification design, and blast-radius ownership, and assign it by name in next Monday's staff meeting. Evidence: morning-review-queue-org-chart.
  • Audit harness permissions and ship an MCP allowlist this sprint: Vercel was breached on April 19 through one over-permissioned OAuth grant on an AI coding app, four CLIs were hit by the same one-keypress RCE through a shared folder-trust convention in May, and the Claude Code hooks and MCP consent bypass (CVE-2025-59536) scored 8.7 CVSS. Every harness installed this year sits on the same door. This Monday: Ship a CI gate that fails any merge where a harness config grants a new OAuth scope or non-allowlisted MCP server, plus a written permission inventory signed by the CISO before the next sprint review. Evidence: vercel-breach-coding-agents-oauth-door.
  • Replace adoption percent with monthly active engineers on harness: The Jellyfish survey shows 64 percent of engineers believe AI is delivering a 25 percent boost and the same 64 percent say they cannot diagnose productivity with current data. Microsoft just made monthly active engineer rate a board-credible, vendor-auditable number, and Coinbase cited AI productivity in a 14 percent layoff. A sharp board director will ask for a CFO-verifiable source. This Monday: Replace the adoption tile on the engineering KPI dashboard with monthly active engineers on harness plus its month-over-month delta, sourced from SSO and Git telemetry, and require CFO sign-off on the data source before the June board pack. Evidence: ai-coding-adoption-percentage-cto-slide-not-productivity-number.
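
The CI gate in the second sprint move can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor's format: the config shape (a dict with `oauth_scopes` and `mcp_servers` lists), the function name `gate_violations`, and the scope and server names in the usage example are all hypothetical.

```python
def gate_violations(baseline: dict, proposed: dict, mcp_allowlist: set[str]) -> list[str]:
    """Return reasons to fail the merge; an empty list means the gate passes.

    `baseline` is the harness config on the target branch and `proposed`
    is the config in the merge candidate (both hypothetical shapes).
    """
    violations = []

    # A scope is "new" if the proposed config grants it and the baseline does not.
    new_scopes = set(proposed.get("oauth_scopes", [])) - set(baseline.get("oauth_scopes", []))
    for scope in sorted(new_scopes):
        violations.append(f"new OAuth scope: {scope}")

    # Every MCP server in the proposed config must be on the allowlist,
    # whether or not it was already present in the baseline.
    unlisted = set(proposed.get("mcp_servers", [])) - mcp_allowlist
    for server in sorted(unlisted):
        violations.append(f"non-allowlisted MCP server: {server}")

    return violations


# Usage with made-up scope and server names.
baseline = {"oauth_scopes": ["repo:read"], "mcp_servers": ["internal-docs"]}
proposed = {
    "oauth_scopes": ["repo:read", "repo:write"],
    "mcp_servers": ["internal-docs", "unknown-server"],
}
for reason in gate_violations(baseline, proposed, mcp_allowlist={"internal-docs"}):
    print(reason)
```

In a real pipeline the two configs would be loaded from the base and head commits, and the job would exit non-zero whenever the returned list is non-empty. Keeping the check as a pure function makes it easy to unit-test separately from the CI wiring.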

What to watch

  • Microsoft Agent 365 becomes the default CISO procurement requirement for governing rival harnesses. (within 90 days): Agent 365 at $15 per user already syncs with AWS Bedrock and Google Gemini registries. If Defender ships a one-click block for Claude Code and Cursor sessions, the governance plane locks in before harness vendors finish building their own. Trigger: Microsoft announces named Defender policy templates for Anthropic Claude Code and Cursor Cloud Agents in the Agent 365 admin center, or a Fortune 500 CISO references Agent 365 as the kill switch on stage.
  • Q3 harness renewals split the contract into three line items: platform, metered verifier, and governance. (Q3 2026): Anthropic moved Claude Code toward the $100 Max tier, retired the 1M context beta into standard pricing on 4.6, and three top-tier harness vendors shipped verifier-metered and read-only auto-approve changes in 36 hours on May 12 and 13. The seat-only contract is already a back-formation. Trigger: Anthropic, Cursor, or GitHub publishes a public enterprise SKU that explicitly separates platform seats from a per-merged-PR or per-verified-action verifier meter, with a governance add-on priced independently.
  • A named coding-agent vendor gets absorbed by a hyperscaler at a platform multiple. (H2 2026): SpaceX's $60B option on Cursor, Sierra's 100x ARR mark at $15B, and Cognition at $25B priced tier-one harness outcomes at platform multiples. The vendor stack will not stay independent at those numbers, and Microsoft, AWS, Google, Salesforce, and Atlassian need the orchestration plane. Trigger: An announced acquisition or controlling investment into Cursor, Cognition, Zed, or Sourcegraph by Microsoft, AWS, Google, Salesforce, or Atlassian, or SpaceX exercising the Cursor option.
  • A convention-level harness CVE takes down a Fortune 500 production deployment. (within 90 days): TrustFall hit four CLIs through one shared convention, ClaudeBleed exposed trust-boundary issues, the Vercel and Context.ai pivot showed OAuth scope as the breach vector, and Microsoft disclosed two RCE CVEs in its own framework, all inside 30 days. These flaws are being patched per vendor, not per convention, so the shared weakness stays open. Trigger: A public disclosure naming a Fortune 500 production breach via a coding-agent harness (Claude Code, Cursor, Codex CLI, Copilot CLI) or an emergency coordinated patch across three or more harness vendors for a shared convention.