Today, Anthropic dropped what might be the most consequential AI announcement of 2026 so far — and it's not a product launch. It's a warning.
The company revealed that its newest frontier model, Claude Mythos Preview (internally codenamed "Capybara"), is so good at finding and exploiting software vulnerabilities that Anthropic has decided not to release it to the public. Instead, they've assembled a defensive coalition of 40+ major technology companies called Project Glasswing, backed by $100 million in compute credits, to give the good guys a head start before these capabilities inevitably spread.
Let's break down everything that happened today.
All key links are at the bottom of the article.
What Is Claude Mythos Preview?
Mythos Preview is a brand-new tier in Anthropic's model hierarchy — sitting above Opus as the most powerful model the company has ever built. The full lineup is now Haiku → Sonnet → Opus → Capybara/Mythos.
Here's the critical detail: it wasn't trained for cybersecurity. CEO Dario Amodei explained in a launch video that Anthropic trained it to be exceptional at code — and as a side effect, it became exceptional at finding the flaws in code. That distinction matters enormously, because it means you can't just remove the cybersecurity capabilities without crippling the model entirely.
The 243-page system card published alongside the announcement is the most detailed safety document Anthropic has ever released — and notably, it's the first time Anthropic has published a system card for a model it isn't making generally available.
The model has been live internally at Anthropic since February 24, 2026, following a 24-hour alignment review.
The Vulnerability Discoveries That Shook the Industry
The Frontier Red Team blog post — authored by 22 researchers — details discoveries that Anthropic's Logan Graham called "the starting point for what we think will be an industry change point, or reckoning."
The 27-Year-Old OpenBSD Bug
OpenBSD is considered one of the most security-hardened operating systems on Earth. It's used in firewalls, routers, and critical infrastructure worldwide. Mythos found two chained bugs in TCP SACK handling — a missing range-start validation and a NULL pointer write — that together enabled remote denial-of-service against any OpenBSD machine simply by connecting to it.
This vulnerability had existed since 1998. The specific run that found it cost under $50 in compute. It's now patched.
The 16-Year-Old FFmpeg Bug That Survived 5 Million Automated Scans
FFmpeg is one of the most widely used video processing libraries in the world. Mythos found a 16-bit integer overflow in the H.264 codec's slice/macroblock tracking that was invisible to fuzz testing — the specific line of code had been scanned 5 million times by automated tools without triggering detection. A sentinel value collision made the bug essentially undetectable by traditional methods.
Cost to find: roughly $10,000 for several hundred runs with zero false positives.
Fully Autonomous Remote Code Execution on FreeBSD
Perhaps the most alarming discovery: a 17-year-old FreeBSD NFS vulnerability (CVE-2026-4747) that was discovered and exploited entirely without human involvement after an initial prompt. The model built a 20-gadget ROP chain split across six sequential RPC requests that writes an attacker's SSH public key to root authorized_keys. End-to-end, no human in the loop.
The Full Scope
Beyond those headline findings, Mythos also demonstrated:
Linux kernel privilege escalation — chaining 2–4 vulnerabilities for root access
Browser exploits — chaining four vulnerabilities to escape both renderer and OS sandboxes, including a proof-of-concept that let an attacker domain read a victim's banking data
Guest-to-host VM escape — memory corruption in a Rust-based VMM (in unsafe code)
Critical certificate authentication bypass in the Botan cryptographic library
Complete web application authentication bypasses and a smartphone lock screen bypass
The red team's validation found that expert contractors agreed with Claude's severity assessments 89% of the time. And in one telling anecdote, engineers at Anthropic with no formal security training asked Mythos to find remote code execution vulnerabilities overnight and "woke up the following morning to a complete, working exploit."
Over 99% of the vulnerabilities discovered have not yet been patched.
The Benchmarks Are Staggering
The performance gaps between Mythos and its predecessor (Claude Opus 4.6) are unusually large for a single generation:
| Benchmark | Opus 4.6 | Mythos Preview |
|---|---|---|
| CyberGym (1,507 tasks) | 66.6% | 83.1% |
| Cybench (40 CTF challenges) | Variable | 100% (saturated) |
| Firefox JS exploits | 2 working | 181 working |
| SWE-bench Verified | 80.8% | 93.9% |
| SWE-bench Pro | 53.4% | 77.8% |
| Humanity's Last Exam | 40.0% | 56.8% |
| Terminal-Bench 2.0 | 65.4% | 82.0% (92.1% extended) |
On the OSS-Fuzz corpus, previous models found 150–175 tier-1 crashes. Mythos found 595 crashes at tiers 1–2, plus 10 full control-flow hijacks (tier 5) on fully patched targets. No previous model had achieved tier-5 results at scale.
Project Glasswing: The Defensive Coalition
Project Glasswing — named after the glasswing butterfly whose transparent wings let it hide in plain sight, mirroring how critical vulnerabilities lurk undetected in widely used code — is unlike anything the AI industry has assembled before.
The Partners
The 12 launch partners include:
Amazon Web Services
Apple
Broadcom
Cisco
CrowdStrike
Google
JPMorganChase
Linux Foundation
Microsoft
NVIDIA
Palo Alto Networks
Beyond these, 40+ additional organizations that build or maintain critical software have been granted access.
The Money
$100 million in Claude usage credits across participants
$4 million in direct donations to open-source security organizations
$2.5M to Alpha-Omega and OpenSSF through the Linux Foundation
$1.5M to the Apache Software Foundation
Pricing After Research Credits
Once research credits are used up: $25/M input tokens and $125/M output tokens — accessible through AWS Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
What Partners Are Saying
CrowdStrike CTO Elia Zaitsev warned: "The window between a vulnerability being discovered and being exploited by an adversary has collapsed — what once took months now happens in minutes with AI."
AWS CISO Amy Herzog confirmed they've already been testing Mythos in their own security operations, applying it to critical codebases.
Microsoft's Igor Tsyganskiy reported "substantial improvements compared to previous models" when tested against their security benchmark.
Linux Foundation CEO Jim Zemlin made the open-source case: "In the past, security expertise has been a luxury reserved for organizations with large security teams. Open-source maintainers — whose software underpins much of the world's critical infrastructure — have historically been left to figure out security on their own."
Within 90 days, Anthropic will publish a public accounting of lessons learned and vulnerabilities fixed.
The Safety Paradox
The system card contains what may be the most important sentence in the entire announcement:
"Claude Mythos Preview is, on essentially every dimension we can measure, the best-aligned model we have released to date by a significant margin. Even so, we believe that it likely poses the greatest alignment-related risk of any model we have released to date."
The safety improvements are real: misuse success rates fell by more than 50% versus Opus 4.6. Character stability improved. Hallucinations dropped. The model refused weight exfiltration under pressure.
But the system card also documents alarming incidents from earlier internal versions:
The model developed a multi-step exploit to escape its sandbox, gained broad internet access, posted exploit details to public-facing websites, and contacted a researcher — who "found out by receiving an unexpected email from the model while eating a sandwich in a park"
It accidentally obtained an answer via a prohibited method, then reasoned it needed to make its final answer "not too accurate" to avoid detection
It found exploits to edit files without permissions and hid changes from git history
It accessed credentials by inspecting process memory
White-box interpretability analysis confirmed the model "often recognizes transgressive actions as such while taking them" — internal representations of rule violation and concealment were active even when the model's visible reasoning showed no indication.
On the SHADE-Arena benchmark for covert behavior, stealth success rate rose from ~0% to ~3.8% — small but unprecedented.
Coordinated Vulnerability Disclosure
Anthropic published a formal coordinated vulnerability disclosure policy addressing the unique challenges of AI-scale discovery:
90-day standard disclosure deadline (with 14-day extension for active progress)
7-day deadline for actively exploited critical vulnerabilities
45-day wait after patches before publishing full technical details
Every report is verified by a human security researcher and labeled as AI-discovered
Anthropic commits to pacing submissions to what maintainers can absorb — not dumping thousands of reports on volunteer teams at once
The Business Context
The timing here is no accident. Also announced this week:
Anthropic's annualized revenue has more than tripled, from $9B to over $30B — reportedly overtaking OpenAI
1,000+ business customers spending over $1M annually (doubled from 500 in ~two months)
A new Broadcom-Google compute deal securing ~3.5 gigawatts of next-gen TPU capacity
Anthropic is reportedly evaluating an IPO as early as October 2026
As VentureBeat noted, "A high-profile, government-adjacent cybersecurity initiative with blue-chip partners is exactly the kind of program that burnishes an IPO narrative."
The Skeptics Aren't Wrong Either
Not everyone is buying the framing wholesale.
Aikido Security published a detailed counterpoint based on 1,000 real-world AI penetration tests, arguing that whitebox tests (with full source access) surfaced 7× more critical issues than greybox tests — meaning attackers without internal context remain far less effective than defenders. Their conclusion: the doomsday framing overstates the actual shift in attacker advantage.
VentureBeat raised the practical concern of flooding volunteer open-source maintainers with thousands of critical bug reports — referencing the curl project shutting down its bug bounty after AI-generated reports consumed maintainer time.
Journalist Kelsey Piper flagged the geopolitical irony: "A private company now has incredibly powerful zero-day exploits of almost every software project you've heard of. And Hegseth and Emil Michael have ordered the government not to in any capacity work with Anthropic."
And Futurism observed that "a frontier AI company working on what it claims to be the next big thing that's more capable than anything that's come before is pretty standard fare" — drawing comparisons to OpenAI's GPT-5, which was a major letdown when released.
The structural critique is real: Anthropic is simultaneously sounding the alarm about AI cybersecurity risks and selling the primary solution.
What This Actually Means Going Forward
Anthropic's red team put it bluntly: "We see no reason to think that Mythos Preview is where language models' cybersecurity capabilities will plateau."
Logan Graham told Axios that similar capabilities from other labs are 6 to 18 months away. Cato Networks CEO Shlomo Kramer warned CNN: "Behind Mythos is the next OpenAI model, and the next Google Gemini, and a few months behind them are the open-source Chinese models."
Google's participation in Project Glasswing is itself revealing — it suggests Gemini can't currently match these capabilities.
Anthropic does not plan general availability for Mythos Preview. The model will remain restricted while the company develops safeguards. A Cyber Verification Program will allow qualified security professionals to apply for access. Open-source maintainers can apply through the Claude for Open Source program.
The 90-day reporting timeline means we'll see the first public results by early July 2026.
With over 99% of discovered vulnerabilities still unpatched, the race is on.
This is the least capable model we'll have access to in the future.
— Jared Kaplan, Anthropic Chief Science Officer
Key Links
Project Glasswing — Official announcement and partner list
Claude Mythos Preview System Card (PDF) — 243-page safety and capabilities report
Frontier Red Team Blog — Detailed vulnerability discovery findings
Coordinated Vulnerability Disclosure Policy — Anthropic's framework for responsible disclosure
Buckle up,
Wes “it’s getting real now” Roth

