The Scariest Chart in AI. METR AI Agents Capability Accelerating
Something happened last week that I think most people missed.
METR (a nonprofit that tracks how capable AI agents are getting) updated their benchmark chart.
And when Claude Opus 4.6 landed on it... the curve broke.
Let me explain what this chart actually shows, because almost everyone gets it wrong.
METR assembled hundreds of real-world tasks: coding, cybersecurity, machine learning engineering.
Then they had human experts complete them.
Not interns. Not generalists.
Domain experts sitting at a computer, doing the work they do every day.
Then they measured: how long did it take the human?
That's the Y-axis.
Not how long the AI took. How much human labor the AI replaced.
When Claude Opus 4.5 hit the chart, people started panicking. It could reliably complete tasks that took human experts over 5 hours. A full half-day of expert work, done by an AI agent.
Then Opus 4.6 dropped.
14.5 hours.
That's not a typo. That's nearly two full work days of expert-level labor, replaced in a single AI session.
And the confidence interval stretches up to 98 hours.
That's more than two full work weeks.
But here's what makes this truly alarming:
The doubling time (how long it takes for AI agent capability to double) is currently around 4 months.
Some estimates put it at 123 days.
That means by mid-2026, we could be looking at AI agents that can handle tasks taking humans weeks to complete.
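Here's a rough sketch of that arithmetic, if you want to check it yourself. It assumes the trend stays a clean exponential, starts from the 14.5-hour figure, uses the 123-day doubling estimate, and treats a work week as 40 hours. The function names are mine, and this is a back-of-the-envelope extrapolation, not METR's forecasting model.

```python
import math

# Assumptions taken from the figures quoted above, not from METR's own model.
START_HORIZON_HOURS = 14.5   # time horizon quoted for Opus 4.6
DOUBLING_DAYS = 123          # one of the quoted doubling-time estimates
WORK_WEEK_HOURS = 40         # assumed length of a work week


def projected_horizon_hours(days_ahead, start=START_HORIZON_HOURS, doubling=DOUBLING_DAYS):
    """Time horizon after `days_ahead` days if the doubling trend simply continues."""
    return start * 2 ** (days_ahead / doubling)


def days_until(target_hours, start=START_HORIZON_HOURS, doubling=DOUBLING_DAYS):
    """Days needed for the horizon to grow from `start` to `target_hours`."""
    return doubling * math.log2(target_hours / start)


if __name__ == "__main__":
    for weeks in (1, 2, 4):
        target = weeks * WORK_WEEK_HOURS
        print(f"{weeks} work week(s) ({target} h): about {days_until(target):.0f} days out")
```

Under those assumptions, a one-week task horizon is roughly half a year away, and every further doubling (two weeks, four weeks) adds another 123 days.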
METR's own forecasting model projects that at current trajectory, AI could automate 99% of AI R&D tasks by 2032.
That's not science fiction.
That's their median estimate based on the data.
Not everyone’s on board, however.
MIT Technology Review published a piece calling the chart "misleading." And they raise some fair points:
• The sample sizes are small (only a few hundred tasks)
• The human baselines were set with older benchmarks
• Expert familiarity with their own tasks creates an asymmetry
These are real limitations. I break all of them down in the video.
But here's what the critics miss: even if you cut the numbers in half, even if the real time horizon is closer to 7 hours than 14.5, the slope hasn't changed. The capability is still doubling every few months. The curve is still going up and to the right.
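To make that concrete, here's a minimal sketch of what a discount does to an exponential trend. It assumes the same 123-day doubling time holds for both the reported and the discounted curve. The function names and the 80-hour target are illustrative choices of mine.

```python
import math

DOUBLING_DAYS = 123  # assumed doubling time, one of the estimates quoted above


def days_until(target_hours, start_hours, doubling=DOUBLING_DAYS):
    """Days for the horizon to grow from `start_hours` to `target_hours`."""
    return doubling * math.log2(target_hours / start_hours)


def discount_delay_days(target_hours, reported_hours=14.5, discount_factor=2.0):
    """Extra days needed to hit the target if the reported figure is overstated
    by `discount_factor` (2.0 means the real number is half the reported one)."""
    discounted_hours = reported_hours / discount_factor
    return days_until(target_hours, discounted_hours) - days_until(target_hours, reported_hours)


if __name__ == "__main__":
    # 80 hours = two assumed 40-hour work weeks
    print(f"Delay from halving the estimate: {discount_delay_days(80):.0f} days")
```

Halving the starting number pushes every milestone back by exactly one doubling period, whatever the target is. The discount moves the curve sideways; it doesn't flatten it.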
And the practical evidence matches. Companies are already restructuring around AI agents. Software stocks have lost over a trillion dollars in weeks. IBM crashed 13% today because Anthropic announced Claude can modernize COBOL.
This isn't theoretical anymore. The chart is just catching up to what's already happening.
I made a full breakdown: what the chart actually measures, why the critics are both right and wrong, and what the trajectory means for the next 12-24 months.
📺 Watch here: https://youtu.be/yuW0939jtco
This is the most important chart in AI right now. Understand it, and you understand everything that's coming.
Wes “It’s Getting Scary Now” Roth