What is your June review actually grading now that the typing isn't the job?

Q1 2026 closed with 81,747 tech layoffs against $725B in announced AI capex from Google, Amazon, Microsoft, and Meta. Mid-year reviews start in about six weeks. The 2023-vintage senior engineer rubric is grading work the harness now does, and the engineers actually moving the ladder forward are doing different work that the rubric does not see.
The setup
Last week, somebody finally tabulated Q1. Eighty-one thousand seven hundred forty-seven tech workers lost their jobs in the first quarter of 2026, and Google, Amazon, Microsoft, and Meta confirmed they are spending seven hundred twenty-five billion dollars on AI capex this year, up seventy-seven percent from last year’s record.
Read that again. Layoffs at a two-year high. Capex up by roughly three hundred fifteen billion dollars in a single year, about a third of Switzerland's annual GDP. Both numbers came out within forty-eight hours of each other, and the people writing the performance review template have not finished their coffee yet.
Mid-year reviews start in about six weeks at most companies. So this is the question I have been getting from CTOs all week: what am I actually grading my senior engineers on now? The rubric I wrote in 2023 is grading work that, in many parts of the org, the harness now does.
What teams tried first
The early instinct was simple. Track AI usage. JPMorgan Chase rolled this out across roughly sixty-five thousand engineers in late March, with a dashboard that classifies each engineer as a light, heavy, or non-user of GitHub Copilot, and ties that classification into the performance review. Several other large enterprises copied the playbook within weeks.
Here is the problem. Counting how often somebody opens the harness is exactly as useful as counting how often somebody opens their IDE. It is hygiene, not output. And measuring hygiene is what every “AI-native” rubric I have seen this spring has actually done. The dashboard turns green, the box gets checked, and the engineer who spent the quarter babysitting an agent that quietly wrote code with a higher defect rate gets the same score as the one who built the harness that made the agent reliable for an entire team.
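To make the hygiene-versus-output point concrete, here is a minimal sketch of a usage-tier classifier like the one described above. The thresholds, field names, and `Engineer` record are hypothetical, and nothing here reflects JPMorgan's actual dashboard logic. The structural point is that the tier reads only usage, so the babysitter and the harness builder land in the same bucket.

```python
from dataclasses import dataclass


@dataclass
class Engineer:
    name: str
    weekly_harness_sessions: int  # how often the tool was opened (hypothetical field)
    defects_introduced: int       # outcome signal the tier never looks at
    teammates_supported: int      # leverage signal the tier never looks at


def usage_tier(e: Engineer) -> str:
    """Classify by usage alone; these thresholds are hypothetical."""
    if e.weekly_harness_sessions == 0:
        return "non-user"
    if e.weekly_harness_sessions < 10:
        return "light"
    return "heavy"


babysitter = Engineer("babysitter", weekly_harness_sessions=40,
                      defects_introduced=12, teammates_supported=0)
harness_builder = Engineer("harness_builder", weekly_harness_sessions=40,
                           defects_introduced=1, teammates_supported=5)

# Same tier, very different quarters.
print(usage_tier(babysitter), usage_tier(harness_builder))  # heavy heavy
```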
The Pragmatic Engineer’s survey of nearly a thousand engineers in early 2026 caught the bigger signal underneath the dashboards. Staff and principal engineers are now the heaviest agent users in their orgs, at sixty-three and a half percent regular adoption, more than line engineers, more than managers, more than directors. The people closest to the senior IC ladder are the ones using these tools the most, which is the opposite of what almost every adoption deck predicted in 2024. The ladder is being climbed by people doing different work than what the ladder was designed to evaluate.
Where the rubric breaks
The breakdown is mechanical. Three things have happened.
One. The work that used to anchor the senior IC ladder, which is design plus implementation plus mentorship of more junior implementation, has been compressed. Cloudflare’s engineering blog from late April, the post on their internal AI engineering stack, gives the cleanest snapshot. Ninety-three percent of their R&D organization is now using AI coding tools. Their rolling four-week merge-request average climbed from about fifty-six hundred per week to over eighty-seven hundred, with a single week in March hitting nearly eleven thousand. The same headcount is producing roughly fifty-five percent more merge requests than it was a quarter earlier.
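The fifty-five percent figure falls straight out of the two averages in the Cloudflare post. A minimal sketch of both the percentage change and a rolling four-week average follows. The weekly counts passed to `rolling_average` are illustrative placeholders, not Cloudflare's data, and the function is a generic trailing-window mean rather than whatever Cloudflare's own dashboard computes.

```python
from collections import deque


def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


def rolling_average(weekly_counts: list[int], window: int = 4) -> list[float]:
    """Trailing mean over the last `window` weeks, emitted once the window is full."""
    buf = deque(maxlen=window)
    out = []
    for count in weekly_counts:
        buf.append(count)
        if len(buf) == window:
            out.append(sum(buf) / window)
    return out


# Figures from the Cloudflare post: about 5,600/week rising to about 8,700/week.
print(round(pct_change(5600, 8700)))  # 55

# Illustrative weekly merge-request counts (made up for the example).
print(rolling_average([5000, 6000, 7000, 8000, 9000]))  # [6500.0, 7500.0]
```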
Two. The bottleneck moved. The Cloudflare team, in a separate post on their AI code review system, worked out what had to scale to absorb that volume. They built an orchestrated set of up to seven specialized review agents per merge request. In the first thirty days the system handled one hundred thirty-one thousand review runs across forty-eight thousand merge requests, with a median review time of three minutes thirty-nine seconds. Verification, not implementation, is now the throughput governor. Maxime Najim, a staff engineer at Target, put it cleanly in a piece for LeadDev in March: execution is no longer the scarce resource; verification becomes the primary human bottleneck.
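The orchestration pattern can be pictured as a fan-out: pick the specialized reviewers relevant to a merge request, cap them at seven, run them concurrently, and merge their findings. This is a minimal sketch under stated assumptions. The reviewer names, the file-suffix routing rule, and the `review` stub are hypothetical stand-ins; Cloudflare's post is not the source of this routing logic, and a real reviewer would call a model rather than return canned findings.

```python
import asyncio
from dataclasses import dataclass

MAX_REVIEWERS = 7  # "up to seven specialized review agents per merge request"


@dataclass
class Finding:
    reviewer: str
    message: str


# Hypothetical routing table: reviewer name -> file suffixes it cares about.
REVIEWERS = {
    "security": (".py", ".go", ".ts"),
    "sql": (".sql",),
    "config": (".yaml", ".toml"),
    "docs": (".md",),
    "tests": ("_test.go", "_test.py", ".spec.ts"),
}


def select_reviewers(changed_files: list[str]) -> list[str]:
    """Choose reviewers whose suffixes match any changed file, capped at MAX_REVIEWERS."""
    chosen = [name for name, suffixes in REVIEWERS.items()
              if any(f.endswith(suffixes) for f in changed_files)]
    return chosen[:MAX_REVIEWERS]


async def review(reviewer: str, changed_files: list[str]) -> list[Finding]:
    """Hypothetical stub for a model-backed reviewer; it just flags matching files."""
    await asyncio.sleep(0)  # stands in for the model call
    suffixes = REVIEWERS[reviewer]
    return [Finding(reviewer, f"checked {f}") for f in changed_files if f.endswith(suffixes)]


async def review_merge_request(changed_files: list[str]) -> list[Finding]:
    """Run the selected reviewers concurrently and flatten their findings."""
    names = select_reviewers(changed_files)
    results = await asyncio.gather(*(review(n, changed_files) for n in names))
    return [finding for batch in results for finding in batch]


findings = asyncio.run(review_merge_request(["api/handler.py", "migrations/001.sql"]))
print(sorted({f.reviewer for f in findings}))  # ['security', 'sql']
```

Routing before fan-out is the design choice that keeps review cheap: a docs-only change never wakes the SQL reviewer.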
Three. The thing that separates good from great changed. Najim’s framing again, because it is the cleanest version I have seen. When agents can grind through any implementation over a weekend, what separates a good engineer from a great one is, increasingly, one thing: how good the spec is. The senior IC who can write a precise specification, with the right constraints and the right interfaces and the right escape hatches, gets a working system on Monday morning. The senior IC who writes a vague spec gets back a confidently wrong one.
When I look at a 2023-vintage rubric for senior engineer that scores “scope of code shipped” and “mentorship of junior engineers in implementation” and “individual technical depth,” I see three line items that have all been quietly hollowed out by the harness. Not eliminated. Hollowed.
The pattern: what the new ladder actually rewards
Federal News Network ran a sponsored piece on May 5 that finally put the term I have been waiting for into the mainstream technical press: harness engineering. It names the discipline of orchestrating how agents interact with models, allocate tools, run feedback loops, and obey hard-coded guardrails. It sits exactly where the old senior IC work used to: between architecture and implementation, with strong opinions about both.
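One way to picture the discipline is as a loop the harness owns: the agent proposes an action, a hard-coded guardrail decides whether that tool is allowed, the tool runs, and its result is fed back for the next step. The sketch below is hypothetical end to end. `propose_next_action` is a scripted stand-in for a model call, the tool names and allowlist are invented for illustration, and no real agent framework's API is implied.

```python
from typing import Callable

# Hard-coded guardrail: the only tools this hypothetical harness will execute.
ALLOWED_TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda arg: f"<contents of {arg}>",
    "run_tests": lambda arg: "2 passed, 0 failed",
}
MAX_STEPS = 5  # feedback-loop budget so a confused agent cannot spin forever


def propose_next_action(history: list[str]) -> tuple[str, str]:
    """Scripted stand-in for the model: read, try something forbidden, test, stop."""
    script = [("read_file", "service.py"), ("delete_repo", "/"),
              ("run_tests", "all"), ("done", "")]
    return script[min(len(history), len(script) - 1)]


def run_harness() -> list[str]:
    history: list[str] = []
    for _ in range(MAX_STEPS):
        tool, arg = propose_next_action(history)
        if tool == "done":
            break
        if tool not in ALLOWED_TOOLS:
            # Guardrail: refuse, and feed the refusal back instead of executing.
            history.append(f"BLOCKED {tool}")
            continue
        history.append(f"{tool}: {ALLOWED_TOOLS[tool](arg)}")
    return history


for line in run_harness():
    print(line)
```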
Across the senior IC band, drawing on the recent evidence and on what teams are actually shipping, the new ladder rewards three specific things: the quality of the spec, the design of verification, and the durability of the harness that makes agents reliable for a team rather than a single user.
Spec quality. Can this engineer write a specification clear enough that an agent produces a correct implementation on the first or second pass, with the right test contract, the right interfaces, and a kill switch for when the assumptions break? This is the skill that used to be called “a good architecture document,” and it is now the difference between a productive harness and an expensive one.
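A spec that meets that bar can be treated as data rather than prose. Below is one hypothetical shape for it: a goal, the interfaces the agent must honor, the test contract that defines done, and kill-switch conditions that stop the run when an assumption breaks. The `Spec` class, its fields, and the `ready_for_harness` check are invented for illustration; no particular harness is assumed to consume this format.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    goal: str
    interfaces: list[str] = field(default_factory=list)     # signatures the agent must not change
    test_contract: list[str] = field(default_factory=list)  # tests that must pass for "done"
    kill_switch: list[str] = field(default_factory=list)    # conditions that halt the run


def missing_sections(spec: Spec) -> list[str]:
    """Name the sections a vague spec leaves empty."""
    return [name for name in ("interfaces", "test_contract", "kill_switch")
            if not getattr(spec, name)]


def ready_for_harness(spec: Spec) -> bool:
    return bool(spec.goal.strip()) and not missing_sections(spec)


vague = Spec(goal="Make checkout faster")
precise = Spec(
    goal="Cache tax lookups in checkout",
    interfaces=["def tax_for(region: str, cents: int) -> int"],
    test_contract=["test_tax_matches_uncached", "test_cache_expires_after_ttl"],
    kill_switch=["stop if any existing checkout test fails"],
)

print(missing_sections(vague))     # ['interfaces', 'test_contract', 'kill_switch']
print(ready_for_harness(precise))  # True
```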
Verification design. Can this engineer build an eval, a review harness, a guardrail policy, or an automated check that lets the team trust output it did not personally write? Cloudflare’s seven-reviewer pattern is the loud version. The quieter version is the staff engineer who writes the AGENTS.md file that turns a chatty agent into a focused one for the next twelve weeks.
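The smallest useful version of that infrastructure is a gate: a fixed set of cases the team agrees on, run against whatever the agent produced, with a pass threshold that blocks the merge. This is a minimal sketch; the `agent_slugify` candidate, the cases, and the threshold of 1.0 are hypothetical, and a real gate would run in CI against the actual diff rather than an in-process function.

```python
from typing import Callable

# Team-owned eval cases for a hypothetical task: input -> expected output.
CASES = {
    "Hello World": "hello-world",
    "  trim me  ": "trim-me",
    "A--B": "a-b",
}


def agent_slugify(text: str) -> str:
    """Pretend this came from an agent; the gate does not care who wrote it."""
    words = text.lower().replace("-", " ").split()
    return "-".join(words)


def gate(candidate: Callable[[str], str], threshold: float = 1.0) -> tuple[bool, list[str]]:
    """Return (passed, failures). Merge only if the pass rate meets the threshold."""
    failures = [inp for inp, want in CASES.items() if candidate(inp) != want]
    pass_rate = 1 - len(failures) / len(CASES)
    return pass_rate >= threshold, failures


ok, failed = gate(agent_slugify)
print(ok, failed)  # True []
```

The cases belong to the team, not to whoever wrote the code, which is what lets the team trust output nobody on it typed.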
Harness durability. The piece in Security Boulevard on May 5, a practitioner playbook for Claude Code, made an obvious but overlooked point: most engineers using a harness are using maybe ten percent of what it can do. They open the terminal, type the request, accept the result, and call it productive. The teams getting real returns are the ones where one or two senior ICs invested in turning the harness into a system: CLAUDE.md files that anchor every session, skills that encode conventions, subagents that handle review. That investment compounds for everybody else on the pod.
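The compounding comes from writing the investment down once and having every session pick it up. The sketch below imitates that idea in plain Python: a team-owned conventions file is read and prepended to each request, so every engineer's session starts from the same anchor. It is not how Claude Code loads CLAUDE.md internally; the file name `TEAM_CONVENTIONS.md`, the `build_session_prompt` helper, and the skill registry are all hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical skill registry: short, reusable instructions keyed by name.
SKILLS = {
    "migration": "Write reversible migrations and include a down step.",
    "review": "List risky changes first, then style nits.",
}


def load_conventions(repo_root: Path) -> str:
    """Read the team's conventions file if it exists; empty string otherwise."""
    path = repo_root / "TEAM_CONVENTIONS.md"
    return path.read_text(encoding="utf-8") if path.exists() else ""


def build_session_prompt(repo_root: Path, request: str, skills: list[str]) -> str:
    """Anchor every session on the same conventions plus any requested skills."""
    parts = [load_conventions(repo_root)]
    parts += [SKILLS[name] for name in skills]
    parts.append(f"Task: {request}")
    return "\n\n".join(p for p in parts if p)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "TEAM_CONVENTIONS.md").write_text(
        "Use type hints. Never edit vendored code.", encoding="utf-8")
    prompt = build_session_prompt(root, "add a users table", ["migration"])
    print(prompt.splitlines()[0])  # Use type hints. Never edit vendored code.
```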
What I’d tell you over coffee
Three concrete moves before this June’s reviews.
First, ask each senior IC for their best spec from the last quarter. Not their best PR. Not their best Jira ticket. Their best spec, in the form they actually shipped it to a harness or to a less senior engineer who used a harness. You will see, immediately, who is operating at the new ladder and who is still typing.
Second, ask each senior IC to walk through one piece of verification infrastructure they designed in the last ninety days. An eval, a review subagent, a CI gate, a guardrail. If the answer is “I just review the diffs in my pod manually,” that is information. Sometimes good information, depending on the pod. Often not.
Third, drop “lines of code shipped” from the rubric entirely, and add “merge throughput of the team this engineer’s harness work supports.” One staff IC who builds the harness for a five-person pod is now producing more leverage than three staff ICs each shipping their own PRs in parallel. If the comp bands do not reflect that, the best people are going to figure it out by August, and one of them is already talking to a recruiter.
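Scoring the new line item means attributing team throughput to harness work instead of counting an individual's own output. A minimal sketch, assuming a hypothetical record of which engineers' harness work each pod runs on; the names, pods, and merge counts are invented, and splitting credit evenly among harness owners is one simple choice among many.

```python
from collections import defaultdict

# Hypothetical quarter data: pod -> merged PRs, pod -> engineers whose harness it runs on.
POD_MERGES = {"payments": 420, "search": 300, "infra": 150}
HARNESS_OWNERS = {"payments": ["ana"], "search": ["ana", "raj"], "infra": []}
OWN_PRS = {"ana": 12, "raj": 40, "lee": 95}  # what an individual-output rubric would score


def supported_throughput(pod_merges: dict[str, int],
                         owners: dict[str, list[str]]) -> dict[str, float]:
    """Credit each pod's merges to its harness owners, split evenly among them."""
    credit: dict[str, float] = defaultdict(float)
    for pod, merges in pod_merges.items():
        for engineer in owners.get(pod, []):
            credit[engineer] += merges / len(owners[pod])
    return dict(credit)


print(supported_throughput(POD_MERGES, HARNESS_OWNERS))  # {'ana': 570.0, 'raj': 150.0}
```

Ranked by own PRs, `lee` comes first; ranked by supported throughput, `ana` does. Neither number is a whole review on its own.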
The reason I think this is more urgent than the usual ladder-update conversation is the macro pressure. Marc Benioff said the quiet part out loud last week. Sam Altman pushed back, fairly, on how much of the layoff wave is genuinely AI versus AI-flavored cost-cutting that was coming anyway. The honest answer is some of both. But the boards funding seven hundred twenty-five billion dollars of AI capex are going to look at engineering performance differently this June than they did last June. The engineers who notice this first, and who get scored on the work that actually matters now, will be the ones still here at year end.
The ladder is rewriting itself with or without the rubric. June is when the team finds out which version the reviewers are using.
Sources
- Big Tech layoffs 2026: Amazon, Meta, Microsoft and the AI trade-off - Capitaxer / Invezz, 2026-05-04
- Harness engineering emerges as agentic AI matures - Federal News Network, 2026-05-05
- Claude Code for Engineers: A Practitioner's Playbook for Software, QA, and Security Teams - Security Boulevard, 2026-05-05
- How AI is changing my work as a staff+ engineer - LeadDev, 2026-03-16
- Orchestrating AI Code Review at scale - Cloudflare blog, 2026-04-20
- The AI engineering stack we built internally - Cloudflare blog, 2026-04-20
- AI Tooling for Software Engineers in 2026 - The Pragmatic Engineer, 2026-03-04