How to evaluate a coding-agent harness when parallel agents are the new bar
Parallel agents went from differentiator to substrate in four days. Here is the six-step evaluation a CTO can run this week before the next renewal conversation.
TLDR

In the four days from April 29 to May 2, parallel agents stopped being a feature and started being the substrate. Zed shipped 1.0 with parallel agents as the headline, Cursor turned its harness into a programmable SDK, GitHub put cloud agents inside Visual Studio, and Anthropic shipped a security agent in public beta. If a harness evaluation rubric still leads with model quality, it is already a quarter behind. Here is the six-step rubric I would run this week.

The problem this solves

I was on a call with a VP Eng yesterday who said the quiet thing out loud. “We finished the Cursor versus Claude Code bakeoff in March. We just realized none of our criteria still apply.” That was an honest sentence and a slightly painful one.

In a single window, between April 29 and May 2, Zed shipped 1.0 with parallel agents as a headline capability, Anysphere shipped the Cursor SDK so internal platform teams can fan out coding agents from any TypeScript program, GitHub published an April update that pulls cloud agents into Visual Studio, and Anthropic put Claude Security into public beta. None of these are toys. All of them change what “evaluating a harness” means. The old rubric was about generation quality. The new rubric is about how the harness behaves when six agents are touching the same codebase at the same time, who verifies their work, and who can shut them off without losing the editor.

The approach

The fastest way I have found to update an evaluation rubric is to make the new criteria explicit and rate every harness on the team’s shortlist against the same six items. Use the steps below as a working draft. Aim to finish the rubric by Friday and run it across two harnesses next week.

Score concurrency and isolation, not raw speed

Ask each harness one question: when two agents touch the same repo at the same time, what isolates them? Acceptable answers in May 2026 are git worktrees (Cursor, Claude Code subagents) or sandboxed cloud VMs (Cursor SDK, Codex cloud). An
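The git-worktree answer is easy to verify by hand. The sketch below shows the isolation pattern itself, independent of any particular harness: one repository, one working tree per agent, so two concurrent agents never write into the same checkout. Branch names, paths, and the identity config are illustrative, not taken from any vendor's docs.

```shell
set -e

# One shared repository with an initial commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Give each "agent" its own branch in its own working tree.
git worktree add -q -b agent-a "$repo-a"
git worktree add -q -b agent-b "$repo-b"

# The same file path can now diverge safely in each tree.
echo "change from A" > "$repo-a/notes.txt"
echo "change from B" > "$repo-b/notes.txt"

git worktree list   # main tree plus the two agent trees
```

A harness that answers "they share the working directory" cannot pass this test: the two writes above would race on a single `notes.txt`.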