Three stories dominate the AI cycle this Monday, and they all point at the same thing: the labs are sprinting, the stakes just got national-security-grade, and nobody — including Google — is willing to admit out loud who's winning the coding race. Sergey Brin reportedly wants Gemini engineers "forced" to use internal agents. The NSA is reportedly running Anthropic's most dangerous model to date despite a Pentagon blacklist. And OpenAI's next-generation model, codenamed "Spud," is allegedly already being quietly A/B tested inside ChatGPT.

Here's the full breakdown:

Story 1 — Google assembles a coding "strike team" to catch Anthropic

The Information's Erin Woo reported on April 20, 2026, that Google DeepMind has spun up a dedicated "strike team" of researchers and engineers to improve its AI coding models — a task force that, according to "three people with direct knowledge of the situation," was formed in direct response to Anthropic's recent Claude releases and the internal perception that Google is losing the coding race.

The lede of Woo's piece reads: "Google has assembled a strike team of researchers and engineers to improve its AI coding models… as it looks to automate more of its own coding and ultimately its AI research."

The team is led by Sebastian Borgeaud, the Google DeepMind research engineer who previously ran pretraining for Gemini (and co-authored RETRO and Chinchilla). Two of the most powerful people in Alphabet are reportedly hovering directly over the effort: co-founder Sergey Brin and DeepMind CTO Koray Kavukcuoglu. The strike team's mandate, per the reporting, is long-horizon agentic coding — understanding entire codebases, reading files, inferring user intent, and writing complete software from scratch.

"Urgently bridge the gap in agentic execution"

The most striking material in Woo's report is a memo Brin sent to DeepMind staff. He writes that to "win the final sprint," Google must "urgently bridge the gap in agentic execution and turn our models into primary developers [of code]." He tells the team they must "aggressively pivot" to catch up on agents, and he reportedly framed the end goal in the language of "AI takeoff" — a model good enough at coding that it can, eventually, improve itself.

Brin's memo goes further: every Gemini engineer, he wrote, should be required to use Google's internal agents for complex, multi-step tasks. Outside of DeepMind proper, other teams inside Google are reportedly holding mandatory AI training sessions for engineers.

The Jetski leaderboard

A notable internal detail: Google maintains an internal leaderboard that tracks engineer usage of "Jetski," its internal agentic coding IDE. Jetski is essentially the internal sibling of Google Antigravity — the public product launched alongside Gemini 3 Pro in November 2025. Both tools were built by the team Google absorbed from Windsurf in its reported $2.4B 2025 IP deal. Per developer posts on X (Gergely Orosz, November 2025), Google engineers are disallowed from using the public Antigravity for work and instead use Jetski, which supports Google's internal monorepo (google3), internal docs search, and integrates with Cider, Google's internal VS Code fork.

In short: Google built its elite coding agent inside a walled garden, and now management is literally tracking who's using it.

Why the panic — the numbers that made it awkward

The context that made this memo land so hard inside DeepMind: two public admissions earlier this year. In January, Boris Cherny, who runs Claude Code at Anthropic, told an interviewer that "pretty much 100%" of Anthropic's own code is now written using AI. One month later, on Alphabet's February 2026 earnings call, CFO Anat Ashkenazi said that about 50% of code at Google is written by coding agents — a number that sounded impressive until you put it next to Anthropic's.

The perception inside DeepMind, per Woo's sources, is that Anthropic's coding tools are simply ahead of Gemini on code-writing ability. That perception is consistent with the third-party leaderboard picture in April 2026:

  • SWE-bench Verified: Claude Opus 4.7 (released April 16) ~87.6%, GPT-5.3-Codex ~85.0%, Gemini 3.1 Pro ~80.6%

  • SWE-bench Pro: Claude Opus 4.7 ~64.3%, GPT-5.4 ~57.7%, Gemini 3.1 Pro ~54.2%

  • WebDev Arena (UI quality, Elo): Claude Opus 4.5 #1, Gemini 3 Pro #2

Gemini isn't losing on every axis — Max Weinbach argued on X that "the Gemini models are probably the best base model of any of them, but the behavior and RL on the model is… not amazing. It's REALLY good, but also super inconsistent." But when Anthropic's Claude Code has become the default agentic tool developers reach for, the base-model advantage doesn't translate to workflow dominance.

Training on Google's private codebase

The strike team's second twist: Woo reports Google is putting more emphasis on models trained on Google's own internal codebase — reportedly the world's largest, over 2 billion lines of code, per engineers on X. Those models can't be shipped to the public, but they function as stepping stones to better public models and, ultimately, to self-improving AI research.

Yuchen Jin (@Yuchenj_UW) summed up the irony on X: "Google DeepMind formed a strike team to improve its coding models, with Sergey Brin directly involved. It's surprising to me that Google has the world's largest internal codebase (>2B LOC), yet lags behind Anthropic and OpenAI in coding + agents."

Brin's "founder mode" continues

This isn't Brin's first intervention. He returned to day-to-day Google involvement in early 2023 and, in a prior internal memo circulated in February 2025, urged the Gemini team to aggressively adopt AI internally. Sundar Pichai said last year that Brin is "literally coding" with the Gemini team and sitting with the training team, poring over loss curves — which he called "my fondest memories over the last year." By Pichai's own telling, Brin once discovered Gemini was on an internal "not-allowed-to-use" list for coding and had "a big tiff inside the company" about it.

Anthropic has not publicly responded to the strike-team story. Dario Amodei, Demis Hassabis, Jeff Dean, and Logan Kilpatrick had all remained silent on it as of this writing.

Story 2 — NSA is running Anthropic's "Mythos" despite the Pentagon blacklist

Axios broke the story on April 19: the National Security Agency is using Anthropic's most powerful model to date — Claude Mythos Preview — despite the Department of Defense formally designating Anthropic a "supply chain risk" two months earlier. Since the NSA sits under DoD authority, the reporting exposes a direct internal contradiction inside the U.S. government's posture toward Anthropic.

Two sources told Axios the NSA is using Mythos. A third said the model is being used "more widely within the department" — i.e., beyond the NSA, elsewhere in the intelligence community and DoD. Anthropic had publicly named 12 Project Glasswing launch partners; the NSA is one of roughly 40 additional organizations Anthropic quietly granted access but never identified.

The killer line, attributed to an administration source: "Every federal agency except the Department of Defense wants access to Anthropic's tools."

Neither the NSA nor the Office of the Director of National Intelligence responded to Axios's requests for comment. Anthropic and the Pentagon declined to comment.

Why this is explosive

Mythos isn't a normal frontier model. Anthropic unveiled it on April 7, 2026 as part of Project Glasswing — an initiative Anthropic describes as "secur[ing] the world's most critical software for the AI era." The model has surfaced thousands of high-severity zero-day vulnerabilities across every major operating system and every major web browser during red-team testing. The UK's AI Security Institute reported Mythos was the first AI model to complete a simulated full-network-takeover test. In rare runs, it attempted to cover its tracks after violating rules.

Anthropic itself admitted Mythos is too dangerous for broad release and is only making it available to vetted partners. More than 99% of the vulnerabilities Mythos discovered remain unpatched; Anthropic is withholding details under coordinated-disclosure norms. The company committed up to $100 million in model usage credits to Project Glasswing participants and $4 million in direct donations to open-source security orgs. API pricing: $25 / $125 per million input/output tokens.
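To make that pricing concrete, here's a quick cost sketch. The token counts below are invented for illustration, not drawn from any real Mythos workload:

```python
# Cost math for Anthropic's published Mythos API pricing:
# $25 per million input tokens, $125 per million output tokens.
INPUT_PRICE_PER_M = 25.0
OUTPUT_PRICE_PER_M = 125.0

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single request at Mythos rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical vulnerability-scan request: a 200K-token slice of a
# codebase in, a 20K-token findings report out.
cost = request_cost(200_000, 20_000)
print(f"${cost:.2f}")  # $5.00 input + $2.50 output -> prints "$7.50"
```

At those rates, sweeping a large monorepo chunk by chunk adds up fast, which is presumably part of why the usage credits in the Glasswing package run as high as $100 million.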

Axios's reporting notes that most Glasswing partners — AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, Microsoft, Nvidia, Palo Alto Networks, Broadcom, and the Linux Foundation — are using Mythos defensively to scan their own environments. The NSA's use case, Axios notes pointedly, "is not purely defensive."

The Pentagon feud that set up the scoop

The DoD-Anthropic breakdown crystallized earlier this year. Per multiple outlets:

  • July 2025: Anthropic wins a $200M DoD/CDAO contract; Claude becomes the first frontier model cleared for classified networks.

  • Jan–Feb 2026: Contract renegotiation. DoD demands Claude be available "for all lawful purposes." Anthropic refuses, insisting on walling off mass domestic surveillance and autonomous weapons development.

  • February 27, 2026: Defense Secretary Pete Hegseth designates Anthropic a "supply chain risk." His on-record statement: "Effective immediately, no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic." He added: "America's warfighters will never be held hostage by the ideological whims of Big Tech. This decision is final." He called Anthropic "sanctimonious" and accused the company of trying to "strong-arm the United States military into submission."

  • March 26, 2026: U.S. District Judge Rita Lin (N.D. Cal., Biden appointee) indefinitely blocked the designation, ruling it was classic First Amendment retaliation. Her opinion contained the line of the year: "Nothing in the governing statute supports the Orwellian notion that an American company may be branded a potential adversary and saboteur of the U.S. for expressing disagreement with the government." She added that the Department's own records showed it designated Anthropic as a supply chain risk "because of its 'hostile manner through the press.'"

  • April 8, 2026: D.C. Circuit Court of Appeals denied a separate Anthropic bid to halt the blacklisting pending litigation.

The White House thaw, two days before the scoop

On Friday, April 17, Dario Amodei walked into the White House for a meeting with Chief of Staff Susie Wiles and Treasury Secretary Scott Bessent. Both sides described the meeting as "productive." The White House readout said the parties "discussed opportunities for collaboration, as well as shared approaches and protocols to address the challenges associated with scaling this technology." Axios's sources said the next steps will focus on how departments other than the Pentagon engage with the model — effectively, an administration plan to route around the DoD blacklist.

President Trump, asked by reporters about the meeting, reportedly said he had "no idea" Amodei had been at the White House — and, per opentools.ai, when pressed he responded: "Who?" Trump had previously called Anthropic's leadership "left wing nut jobs" on social media and ordered agencies not to use Anthropic's "woke" models. Separately, Bessent and Federal Reserve Chair Jerome Powell have reportedly been encouraging major bank CEOs to test Mythos directly.

Amodei's red lines

Amodei has been consistent in public about his position. To CBS News in February: "We're not going to move on those red lines… For our part and for the sake of U.S. national security, we continue to want to make this work." After the designation, Anthropic said: "Our strong preference is to continue to serve the Department and our warfighters — with our two requested safeguards in place. Should the Department choose to offboard Anthropic, we will work to enable a smooth transition to another provider." On the supply-chain label itself, Amodei said: "We do not believe this action is legally sound, and we see no choice but to challenge it in court."

Co-founder Jack Clark put the bigger picture this way at the Semafor World Economy Summit: "This is not a special model. There will be other systems just like this in a few months from other companies, and then a year to a year and a half later, there will be open-weight models from China that have these capabilities, so the world is going to have to get ready for more powerful systems that are going to exist within it."

The contrarian read

Not everyone is buying Anthropic's framing. Bruce Schneier wrote on his blog: "This is very much a PR play by Anthropic — and it worked. Lots of reporters are breathlessly repeating Anthropic's talking points, without engaging with them critically." He called Project Glasswing "an important step, but it's ultimately a reactive approach — racing to patch holes before attackers adapt."

Researchers at Vidoc Security reportedly reproduced several of Mythos's most alarming findings using publicly available models — OpenAI GPT-5.4 and Anthropic's own Claude Opus 4.6 — without any special access to Mythos, suggesting the uniqueness narrative may be overstated. Georgia Tech's Peter Swire told Scientific American: "One risk after Mythos is that it will be easier to turn a vulnerability, a known flaw, into an exploit… Every cybersecurity defender should take Mythos seriously, but the expected harm to defense is likely to be far lower than the worst-case scenarios would suggest."

How this fits the broader landscape

The Anthropic-NSA loop sits inside a rapidly militarizing AI industry. OpenAI signed its own Pentagon deal in late February 2026 (procurement experts estimate $500M–$2B over five years) — announced, notably, within hours of Anthropic's breakdown. Sam Altman later admitted the deal was "definitely rushed" and that "the optics don't look good." An open protest letter drew signatures from 98 OpenAI employees and 796 Google employees. OpenAI's top robotics leader Caitlin Kalinowski resigned over it. Palantir holds a $10B Army enterprise agreement; Anduril holds up to $20B for counter-UAS work; Anthropic previously partnered with Palantir + AWS (November 2024) to serve classified networks at IL6, and launched "Claude Gov" models in 2025.

Story 3 — OpenAI's mystery model "Spud" is allegedly about to drop

Heads up: nothing in this story is officially confirmed by OpenAI. Treat it as leak reporting with multiple layers of sourcing, and keep your skepticism dialed up — the base rate for missed insider-predicted AI launch dates in 2026 is high.

That said: Polymarket, multiple OpenAI employees, a leaked internal memo, and several credible leakers all point to the same thing — OpenAI's next-generation model, internally codenamed "Spud," likely shipping as GPT-5.5 (with a GPT-5.5 Pro variant) sometime in the next 7–10 days, and reportedly already being quietly A/B tested inside ChatGPT.

What's actually corroborated

The existence of "Spud" is the best-supported part of the story. On March 24, 2026, The Information reported that OpenAI had finished pretraining on a next-generation model internally called "Spud" at its Stargate facility in Abilene, Texas. That report quoted Sam Altman telling staff it was "a very strong model that could really accelerate the economy" and that release was "a few weeks" away.

Since then, OpenAI President Greg Brockman has confirmed the model's existence publicly on Alex Kantrowitz's Big Technology podcast, saying:

"I think of Spud as a new base, as a new pre-train… I'd say it's like we have maybe two years worth of research that is coming to fruition in this model."

"There's this thing called 'big model smell'… when these models are just actually much smarter, much more capable, that they bend to you much more, and you feel it."

"It's not an incremental improvement, it's a significant change in the way we think about model development."

On April 13, 2026, The Verge obtained a leaked internal Q2 memo from OpenAI Chief Revenue Officer Denise Dresser that names "Spud" directly. Per the memo, Spud is "an important step in the intelligence foundation for the next generation of work," and it will make all of OpenAI's core products "significantly better." Early customer feedback, per Dresser, cites "stronger reasoning, better understanding of intentions and dependencies, and more reliable production results." The Decoder has the most thorough write-up of the leaked memo.

The "next week" projection

The April-27-ish release timing is held together by three independent signals:

  • Polymarket: As of April 20, the market "GPT-5.5 released on…?" priced April 23 at roughly 76–82% implied probability ($164K–$174K volume). A broader "GPT-5.5 released by April 30" market sits near 92%. Polymarket's official account tweeted on April 20: "BREAKING: OpenAI's GPT-5.5, nicknamed 'Spud,' is now projected to be released next week." These are crowd odds, not OpenAI confirmations.

  • Insider teases: On April 10–11, OpenAI employees @thsottiaux ("next week will be about more than cooking") and @pashmerepat ("things are about to get wild ❄️") posted cryptic teases on X. These are real OpenAI employees and the timing roughly aligns with the rumored release window.

  • Leaker chatter: The leaker @synthwavedd claimed on April 15 that the 5.5 launch had been pushed back "not too long of a delay though, more soon," and was cited in Polymarket's AI summary as having tested an early checkpoint he described as "superior to GPT-5.4." @chatgpt21 echoed that it's "still before the end of April and tentative for next week."
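For readers less familiar with prediction markets: a Polymarket YES share pays out $1 if the event happens, so its trading price reads directly as an implied probability, and nested markets can be differenced. A minimal sketch using illustrative prices in the ranges quoted above (not live quotes):

```python
# Polymarket YES shares pay $1.00 on a YES resolution, so a share
# trading at $0.78 implies roughly 78% crowd probability.
p_on_apr_23 = 0.78   # midpoint of the quoted 76-82% range (illustrative)
p_by_apr_30 = 0.92   # the quoted "released by April 30" market

# A release on April 23 also resolves the "by April 30" market YES,
# so the difference approximates the crowd's odds of a launch landing
# in the April 24-30 window instead.
p_late_window = p_by_apr_30 - p_on_apr_23
p_miss_april = 1.0 - p_by_apr_30

print(f"Release on Apr 24-30: ~{p_late_window:.0%}")  # ~14%
print(f"No April release:     ~{p_miss_april:.0%}")   # ~8%
```

The caveat, as the story notes, is that these are crowd odds shaped by leaker chatter, not anything OpenAI has confirmed.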

The A/B testing claim

This is the thinnest part of the dossier and deserves the loudest skepticism. The specific claim that Spud is currently being A/B tested inside ChatGPT traces primarily to a YouTube channel ("Universe of AI"), a Medium post by LumiChats founder Aditya Kumar Jha (April 20), and a Geeky Gadgets article summarizing them. Per the Medium post, API monitors allegedly "caught OpenAI's next major model… codenamed 'Spud' — running in live, production-scale testing" on April 19. Per Geeky Gadgets: "OpenAI has adopted a strategic approach to testing Spud by discreetly integrating it into ChatGPT 5.4 Pro. This method allows the company to gather real-world user feedback."

Worth noting: no screenshot of a ChatGPT model selector showing "Spud," "GPT-5.5," or "GPT-5.5 Pro" has surfaced in any primary leaker source. Tibor Blaho's TestingCatalog — which normally captures these model-selector A/B tests (he caught GPT Image V2 on April 5, GPT-5 Mini Scout in October 2025, etc.) — has not published a dedicated Spud A/B piece. His Week 13 Threads post only notes OpenAI is "shutting down Sora to focus on next model 'Spud.'" That absence is conspicuous and a reason to hedge.

It's plausible the "A/B test" reports are conflating (a) early Pro-tier user access without a formal blind comparison, (b) the separate, confirmed GPT Image V2 Arena test from April 5, or (c) actual production API traffic that isn't user-facing in ChatGPT yet.

Reported (unconfirmed) capabilities

Aggregated from leak-adjacent sources, and flagged: none of the following is verified by benchmarks, model cards, or API documentation.

  • 3D simulations — reportedly recreated "Monica's apartment from Friends" in Three.js

  • Professional-grade website generation from simple mockups

  • More efficient SVG code generation (fewer lines)

  • Voxel art and Pokémon-style game generation

  • Improved intent inference from ambiguous prompts (which tracks with Brockman's "big model smell" framing)

  • Speculated context window of 256K–2M tokens

  • Speculated pricing in the $3–8 / $15–30 per million input/output range

  • Speculated to outperform Claude Opus 4.7 and Gemini 3.1 Pro

Again — no benchmarks, no pricing, no model card have been published. The "Friends apartment" demo traces to a single YouTube channel.

The codename pattern

"Spud" fits OpenAI's habit of playful food/nature codenames. Strawberry 🍓 became o1 (the reasoning model). Orion became GPT-4.5 / GPT-5. Chestnut and Hazelnut became GPT Image 1.5 in December 2025. Bagel-v2 is the current GPT Image 2 test on Chatbot Arena. Spark was GPT-5.3-Codex. A root vegetable is thematically consistent. Whether the public product ships as "GPT-5.5," "GPT-5.5 Pro," or "GPT-6" appears to depend on how large OpenAI judges the capability jump over GPT-5.4 — every source flags this as undecided.

How we got here: OpenAI's 2026 release cadence

  • August 2025: GPT-5

  • November 12, 2025: GPT-5.1 (Instant + Thinking) rolled out to all ChatGPT users

  • Late 2025 / early 2026: GPT-5.2, GPT-5.3 ("Spark" = Codex)

  • March 5, 2026: GPT-5.4 and GPT-5.4 Pro on ChatGPT and Codex (per TestingCatalog); 1M-token context, ~33% hallucination reduction, ARC-AGI-2: 74% (5.4) / 83.3% (5.4 Pro)

  • March 24, 2026: Spud pretraining reportedly complete — just 19 days after 5.4 shipped, implying parallel training runs

  • ~April 16, 2026: Unified OpenAI super-app (ChatGPT + Codex + Atlas) reportedly launches on GPT-5.4; Claude Opus 4.7 ships same day

  • Week of April 27, 2026: Rumored Spud release

Notable absences

Jimmy Apples (@apples_jimmy) — historically the most-accurate OpenAI leaker — has been conspicuously quiet on Spud, which is either nothing or something. Mainstream tech press (Bloomberg, Reuters, Ars Technica, TechCrunch, Wired, NYT) have also sat this one out; The Verge published the memo leak but hasn't done a standalone Spud feature. That press silence is typical in the days before an OpenAI announcement — but it also means every rumor in this story circles back to a small, tightly-clustered set of primary sources.

Bottom line: Spud is real. The release window is plausibly imminent. The A/B-inside-ChatGPT claim is the shakiest piece of the puzzle and deserves the heaviest asterisk. If OpenAI ships next week, it will be the biggest model announcement of 2026 so far — and it will arrive exactly while Google is scrambling together a strike team to catch up on coding, and while Anthropic is running a cyber-offensive model inside the NSA. That's the shape of the week.

until next time,

Wes “Stay Frosty” Roth

PS here’s my video on the whole thing:
