Less than 48 hours after Anthropic dropped the news about their Mythos model, the world is still processing what just happened. And I think most people are getting the story half right.
Here's what everyone seems to understand: Anthropic built a model that can autonomously find zero-day vulnerabilities in code that humans thought was secure for decades. One flagship example — a FreeBSD feature that's been around for 27 years with no known exploit. Mythos found one for about $50 in compute. It can chain exploits together in clever ways to crack layered defenses. The cat-and-mouse game of cybersecurity just got wildly lopsided.
Here's what most people are missing: this is just the beginning of the problem…
The Asymmetry Problem
Anthropic launched Project Glasswing — a coalition of major tech companies (AWS, Cisco, and others) getting access to test Mythos on Google Cloud. The narrative people are running with is: Mythos finds the vulnerabilities, Glasswing patches them, problem solved.
Not quite.
As Eliezer Yudkowsky pointed out, general code hardening is a fundamentally harder computer science problem than finding a single vulnerability. Our ability to find weaknesses just skyrocketed. Our ability to fix them? Unchanged. If Mythos dumps a million potential vulnerabilities on the desks of engineers at a major company, those problems don't just disappear. There's still a human in the loop for every patch, and we're nowhere near the era where AI agents autonomously rewrite production codebases without introducing new problems.
Logan Graham, who's running the project on Anthropic's side, said the public reaction pleasantly surprised him. People went through three stages: (1) this is a crazy model, (2) Anthropic handled it responsibly, and (3) I'm worried about what comes next. Graham said he would've been happy with just stage two. The fact that people reached stage three is actually a good sign.
It Might Already Be Worse Than You Think
Here's where it gets uncomfortable. An analysis from iX claimed that when they pointed small, cheap, open-weight models at the same code Anthropic showcased, eight out of eight detected the same FreeBSD exploit. Their argument: you don't need one massive expensive model — you can deploy a swarm of cheap open-source models to scan everything.
If they're right, that's arguably worse news. It means the capability to break stuff at scale might already be out there. Nobody's just figured out how to orchestrate it yet.
Whether the "swarm of small models" approach truly matches Mythos or not, the takeaway is the same: we've crossed a threshold. The offensive capability exists. The defensive infrastructure hasn't caught up.
The Emergence Factor
Here's the part that should keep you up at night. Anthropic wasn't trying to build a world-class hacking model. They were training a model to be better at coding. The cybersecurity capability just... emerged. It was a byproduct.
This is the pattern we keep seeing. OpenAI released models that turned out to be capable of solving longstanding math problems (confirmed by Terence Tao). They didn't train for that specifically. Someone out there just discovered the capability and tested it.
And the pipeline doesn't stop. Meta should have a Mythos-scale model before year's end. Elon Musk confirmed xAI has a 10-trillion-parameter Grok model in training on a two-month pre-training timeline. Glasswing has maybe six months to help harden global infrastructure before these capabilities proliferate.
What You Should Actually Do
Rule number one: don't panic.
Rule number two: take this seriously enough to act.
Andrej Karpathy published a guide called "Digital Hygiene" that's a great starting point. The basics:
Password manager. If you're not using one, start today.
Hardware security keys. Physical keys for your most important accounts.
Biometrics where available. One more layer.
Audit your security questions. They're weaker than you think.
Encrypted messaging. Default to it.
IoT awareness. Your smart devices are likely far less secure than you assume. (Someone recently used Claude Code — not even Mythos — to hack a robot vacuum and discovered they could access every unit of that model worldwide. The company had built data collection infrastructure that inadvertently exposed everything.)
Virtual credit cards. Services like Privacy.com let you generate unique card numbers per merchant, so one breach doesn't expose your real card.
DNS-based ad blocker and network monitor. More advanced, but worth looking into.
Back up your data. Google Takeout to an air-gapped, offline hard drive. If you wake up tomorrow and everything online is gone — every video, every email, every document — would you be okay?
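The backup step is the one item on that list people skip because it feels tedious. A minimal sketch of the copy-then-verify workflow follows. The paths and file names here are placeholders for illustration (it defaults to demo directories under /tmp so you can see it run); point SRC at your real export and DEST at a mounted external drive, and delete the demo-data lines.

```shell
#!/bin/sh
set -eu

SRC="${1:-/tmp/takeout-demo}"          # placeholder: your export folder
DEST="${2:-/tmp/offline-drive-demo}"   # placeholder: your mounted offline drive

# Demo data so the sketch runs end to end; remove these lines for real use.
mkdir -p "$SRC" "$DEST"
[ -f "$SRC/hello.txt" ] || echo "keep me" > "$SRC/hello.txt"

# Copy everything, preserving timestamps and permissions.
cp -a "$SRC"/. "$DEST"/

# Build a checksum manifest from the source, then verify the copy against it.
# sha256sum -c exits non-zero on any mismatch, which aborts the script (set -e).
( cd "$SRC" && find . -type f -exec sha256sum {} + ) > /tmp/manifest.sha256
( cd "$DEST" && sha256sum --quiet -c /tmp/manifest.sha256 )
echo "backup verified"
```

The verification pass is the part most ad-hoc backups miss: a copy you never checksummed is a copy you only hope is intact. Re-run it each time you refresh the drive, then unplug.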
Even if none of the Mythos fears materialize, these are all good practices. Data breaches are constant. Fraud is everywhere. Companies are routinely violating your privacy. You literally can't lose by tightening things up.
The Alignment Elephant in the Room
One more thing. In the Mythos system card, there are examples of the model doing things researchers didn't expect. Anthropic also documented cases across their models of cheating, deception, and misaligned behavior. Not frequently, but consistently enough that they haven't gotten it to zero for any model.
Mythos is their most aligned model. But it can do the most damage if it's unaligned. Would you rather have a model with a 10% chance of leaking your emails, or a 1% chance of ending you?
We're entering the era of big models. This isn't a warning shot after which everything stops. It's the starting gun.
Stay sharp out there.
— Wes

