---
title: "AI-powered code review tools got faster this week. The bottleneck didn't move."
slug: harness-ai-code-review-tools-bottleneck
date: 2026-06-11
excerpt: This week Cursor, GitHub, and CodeRabbit all upgraded their AI code review tools for speed, cost, and polish. None of those were the part of the review bottleneck that protected production, and here is what an engineering leader should actually do about it.
featured_image: "https://bbtxujdxvidaghmhxkqs.supabase.co/storage/v1/object/public/generated-images/blog-1781168347140-harness-ai-code-review-tools-bottleneck.webp"
featured_image_alt: A single lit desk in a quiet engineering office at night, a laptop open to a pull request with a green review checkmark and one empty chair pulled up to it, suggesting the human who still makes the final merge decision.
canonical_url: https://cerevisor.com/blog/harness-ai-code-review-tools-bottleneck
updated_at: 2026-06-11T08:59:08.268097+00:00
---

# AI-powered code review tools got faster this week. The bottleneck didn't move.

TLDR

This week Cursor, GitHub, and CodeRabbit all shipped upgrades to their AI code review tools, and every one of them targeted speed, cost, or polish. Those were never the part of the review bottleneck that protected production. The part that does, a human deciding the change is correct, is exactly what none of these tools remove. Buy the speed, keep the judgment.

I counted three different [AI code review tools](/blog/harness-ai-code-review-who-owns-the-merge) shipping upgrades inside about 48 hours this week. [Cursor](/blog/harness-supervisory-engineer-org-chart-box) made Bugbot faster and cheaper. GitHub slipped a [security](/blog/permissions-security-lock-down) review command into its Copilot CLI. CodeRabbit published a long piece on why its reviews are explainable. None of it was a new model. All of it was a new reviewer.

That timing tells me something. The whole category has quietly agreed on where the pain is. When an agent writes the code, the slow step moves to the part where a human has to decide the code is actually fine. So the tools are racing to own the reviewing seat. The question I keep coming back to, the one a VP of Engineering has to answer before signing a purchase order, is simpler than the marketing: when the agent already wrote the code, does adding a second agent to review it clear the queue, or just stack another opinion on the pile?

---

## What AI-powered code review actually shipped this week

Start with the receipts, because they are genuinely good.

Cursor updated Bugbot on June 10 and did not bury the numbers.

> "Bugbot is now over 3x faster to run, 22% cheaper, and finds 10% more bugs per review. In practice, 90% of Bugbot runs now finish in under three minutes."

Cursor, June 2026

Faster, cheaper, catches a little more. The same day, GitHub shipped a dedicated security review command into its Copilot CLI. Run it on local changes before a commit and it returns high-confidence findings scored by severity and confidence, with one-keystroke fixes in the terminal. GitHub was careful to note it “doesn’t rely on GitHub code scanning, Dependabot, or GitHub secret scanning.” It is a separate, on-demand pass that lives where the developer already works, a quiet expansion of what GitHub Copilot code review covers.

A day earlier, CodeRabbit went a different direction and argued the differentiator is explainability, a review that reads a pull request the way the author would explain it. Their framing leaned on a number from Salesforce Engineering: “code volume up roughly 30%, with pull requests regularly exceeding 1,000 lines.” When a PR is over a thousand lines, a review that explains itself is worth more than one that just flags.

Notice what every one of these improvements is about. Speed. Cost. Coverage. Polish. These are throughput upgrades to the reviewing step. They are real, and if a team already runs one of these [AI code review tools](/blog/harness-ai-code-review-who-owns-the-merge), the upgrades are free wins. But throughput was never the expensive problem.

---

## Every AI code review tool wins its own benchmark

Here is the part the launch posts skip.

Every vendor benchmark in this category has the same tell: the vendor wins. Earlier this year DeepSource pulled this apart. Greptile reported 82% recall, a 24-point lead over the field. Then Augment Code ran an evaluation on the exact same five repositories and Greptile scored 45%.

Greptile's bug-catch rate, depending on who ran the test

| Who ran the benchmark | Greptile's score |
| --- | --- |
| Greptile (its own eval, 5 repos) | **82% recall** |
| Augment Code (same 5 repos) | 45% |

Same code. Same repos. Nearly double the score depending on who held the clipboard. This is not a knock on any one tool; it is the physics of marking your own homework. A Factory.ai benchmark in April found GPT-5.5, used as a reviewer, commented at about the right rate but landed under 48% precision, meaning more than half of its comments were false positives. False positives are the quiet killer here. One independent comparison put Greptile at 11 false positives per run against CodeRabbit’s 2. The first time an engineer gets three useless comments in a row, the bot loses them. After that the reviews are just noise to scroll past.

So when a vendor says its tool catches the most bugs, the only honest follow-up is: caught on whose test?
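Since recall and precision carry most of the argument here, a minimal Python sketch of how the two differ may help. The `ReviewRun` type, its field names, and every count below are made up for illustration; none of them come from the benchmarks cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewRun:
    """Counts from one benchmark run (field names are illustrative)."""
    true_positives: int   # comments that flagged a real bug
    false_positives: int  # comments that flagged nothing real
    false_negatives: int  # known bugs the reviewer never mentioned


def precision(run: ReviewRun) -> float:
    """Share of the reviewer's comments that pointed at a real bug."""
    flagged = run.true_positives + run.false_positives
    return run.true_positives / flagged if flagged else 0.0


def recall(run: ReviewRun) -> float:
    """Share of the known bugs that the reviewer actually caught."""
    known = run.true_positives + run.false_negatives
    return run.true_positives / known if known else 0.0


# Made-up counts: two reviewers scored against the same 100 known bugs.
noisy = ReviewRun(true_positives=45, false_positives=55, false_negatives=55)
quiet = ReviewRun(true_positives=40, false_positives=10, false_negatives=60)

print(f"noisy: precision={precision(noisy):.2f} recall={recall(noisy):.2f}")
print(f"quiet: precision={precision(quiet):.2f} recall={recall(quiet):.2f}")
```

In this invented example the two reviewers catch a similar share of the bugs, but more than half of the noisy reviewer's comments are wrong. A headline recall number on its own would hide that.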

There is a subtler trap in how these tools get judged. A LogRocket power ranking published June 8 noted that in blind reviews, one assistant’s output was “preferred 67% of the time vs Codex’s 25%.” Preference is the metric everyone reaches for because it is easy to collect. It is also the wrong one for a reviewer.

Key Insight

A blind-preference score measures whether a human likes the reviewer's comments. It does not measure whether the code is correct. Those are different numbers, and only one of them keeps a bug out of production.

---

## What the best AI code review tool still can’t decide

Now the uncomfortable data.

The reviewer tools are getting faster, but the review stage is getting slower, because the volume went vertical. Faros AI’s analysis this spring found that on high-AI-adoption teams, pull requests spend about 91% longer in review, and the share of PRs merged with no review at all climbed 31%. CircleCI’s 2026 delivery report showed the same shape from a different angle: feature-branch throughput up 59% year over year, while the code actually reaching production fell 7%.

**+91%** longer time in review for high-AI-adoption teams, even as code volume climbs (Faros AI, spring 2026)

More code, more reviews, slower delivery. That is a bottleneck that moved, not one that closed. And it points at the thing none of this week’s launches touch. Cursor’s Bugbot finds issues. GitHub’s command returns findings. CodeRabbit explains a diff. Not one of them merges the code. The merge decision, the moment a person says this change is correct and I will put my name on it, is still a human’s. The New Stack asked the honest version of this on June 9, with a headline I have not stopped thinking about: “How long before we stop reading the code?”

> Speed and cost were never the part of the review bottleneck that protected production. The judgment was.

The reviewers make the queue move faster. Whether it moves more safely is a different question, and no AI-powered code review platform can answer it for a specific team. A tool can raise the floor on the boring checks. It cannot raise the ceiling on whether the change was the right change to ship.

---

## What I’d tell you over coffee

If a VP of Engineering asked me this week whether to buy one of these, I would say yes, probably, and then I would say the buying is the easy 10%. Three things matter more.

First, decide before rollout what the reviewer is allowed to resolve on its own and what it must hand to a person. An AI reviewer with no boundary becomes either a rubber stamp or a noise machine. Draw the line on purpose.

Second, measure its false-positive rate against your own historical pull requests before [trusting](/blog/ai-adoption-trust-not-training) it in the flow. Run it on a few hundred already-merged PRs and see how often it cries wolf. If engineers would have muted it, it is not ready. A good rule of thumb is to tune until the noise is low enough that nobody reaches for the mute button, and only then put it in the path of real work.

Third, keep one named human owning the merge, and keep a real metric on verified merged output rather than review speed. Speed is the number the tools will improve for free. Judgment is the number that protected production the whole time, and it is still on the team’s side of the table.
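The second step above, replaying the reviewer against already-merged pull requests, can be roughed out in a few lines of Python. This is a sketch under assumptions, not a recipe: the `ReplayedComment` type, the `useful` label (an engineer's after-the-fact judgment of each comment), and both thresholds in `ready_for_flow` are hypothetical placeholders to tune per team. "False-positive rate" here means the share of comments that were noise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplayedComment:
    """One reviewer comment from a replay over an already-merged PR."""
    pr_id: str
    useful: bool  # hypothetical label: an engineer judged it worth acting on


def false_positive_rate(comments: list[ReplayedComment]) -> float:
    """Share of comments labeled as noise (0.0 when there are no comments)."""
    if not comments:
        return 0.0
    noise = sum(1 for c in comments if not c.useful)
    return noise / len(comments)


def longest_noise_streak(comments: list[ReplayedComment]) -> int:
    """Longest run of consecutive noise comments, in the order given."""
    longest = current = 0
    for comment in comments:
        current = 0 if comment.useful else current + 1
        longest = max(longest, current)
    return longest


def ready_for_flow(
    comments: list[ReplayedComment],
    max_fp_rate: float = 0.25,  # assumed threshold, not a recommendation
    max_streak: int = 2,        # assumed: three useless comments in a row is too many
) -> bool:
    """Assumed gate: pass only if both noise measures stay under their limits."""
    return (
        false_positive_rate(comments) <= max_fp_rate
        and longest_noise_streak(comments) <= max_streak
    )


# Made-up replay labels, not results from any real tool.
labels = [True, False, True, True, False, False, False, True]
replay = [ReplayedComment(pr_id=f"pr-{i}", useful=u) for i, u in enumerate(labels)]

print(false_positive_rate(replay))   # 0.5
print(longest_noise_streak(replay))  # 3
print(ready_for_flow(replay))        # False
```

The streak check is there because of the failure mode described earlier: a reviewer can have a tolerable average and still lose engineers with a run of useless comments.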

The calm version of all this: the AI code review tools got better this week, genuinely. They take the boring weight off the reviewing step, and on a thousand-line PR that is a gift. They just cannot take the responsibility off the desk. That part was always the job, and it is the part worth keeping.

#### Sources

- [Bugbot is now over 3x faster, 22% cheaper, and finds 10% more bugs](https://cursor.com/blog/bugbot-updates-june-2026) - Cursor, 2026-06-10

- [Dedicated security review command now available in Copilot CLI](https://github.blog/changelog/2026-06-10-dedicated-security-review-command-now-available-in-copilot-cli/) - GitHub Changelog, 2026-06-10

- [Explainable AI Code Review: How CodeRabbit Review Works](https://www.coderabbit.ai/blog/coderabbit-review-reads-a-pr-how-author-would-explain-it) - CodeRabbit, 2026-06-09

- [How long before we stop reading the code?](https://thenewstack.io/future-of-code-reviews/) - The New Stack, 2026-06-09

- [AI dev tool power rankings & comparison](https://blog.logrocket.com/ai-dev-tool-power-rankings/) - LogRocket, 2026-06-08

- [Every AI code review vendor benchmarks itself, and wins](https://deepsource.com/blog/ai-code-review-benchmarks) - DeepSource, 2026-02-26

- [The AI Engineering Report 2026: The AI Acceleration Whiplash](https://www.faros.ai/blog/ai-acceleration-whiplash-takeaways) - Faros AI, 2026-04-22

- [Which Model Reviews Code Best?](https://factory.ai/news/code-review-benchmark) - Factory.ai, 2026-04-29

- [2026 State of Software Delivery](https://circleci.com/resources/2026-state-of-software-delivery/) - CircleCI, 2026-04-02
