NATURAL 20
Google DeepMind Launches AlphaEvolve Agent

PLUS: OpenAI Launches Safety Evaluations Hub, Grok Inserts Far-Right Claims Accidentally and more.

Wes Roth
May 15, 2025

Today:

Google DeepMind Launches AlphaEvolve Agent
OpenAI Releases GPT-4.1 For ChatGPT
Anthropic Prepares New Thinking Models
OpenAI Launches Safety Evaluations Hub
Grok Inserts Far-Right Claims Accidentally

Google's New "AlphaEvolve" SHOCKING Ability…

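Google DeepMind’s AlphaEvolve pairs Gemini models with automated evaluators (test scripts) to evolve better code and math algorithms. Its scheduling heuristic recovers about 0.7% of Google’s worldwide computing resources, it sped up a key kernel used in Gemini training by 23%, and it proposed a simplification to an arithmetic circuit in the TPU, Google’s custom AI chip.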

AlphaEvolve also helped speed up the training of the Gemini models that power it, an early example of an AI system helping to improve itself. This early self-improvement points to faster gains, because each stronger model can design better software, methods, and hardware.
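
The propose-and-test loop described above can be illustrated with a toy sketch. This is not DeepMind's implementation: the `propose` function is a random stand-in for the language model, `evaluate` is a hypothetical test script, and the target (fitting y = 3x + 2) is invented purely for illustration.

```python
import random

def evaluate(candidate):
    """Hypothetical test script: score a candidate (a, b) on fixed test cases.
    Higher is better; 0 is a perfect score."""
    a, b = candidate
    tests = [(x, 3 * x + 2) for x in range(-5, 6)]  # assumed target: y = 3x + 2
    return -sum((a * x + b - y) ** 2 for x, y in tests)

def propose(parent, rng):
    """Stand-in for the language model: return a slightly modified candidate."""
    a, b = parent
    return (a + rng.choice([-1, 0, 1]), b + rng.choice([-1, 0, 1]))

def evolve(generations=500, seed=0):
    """Keep a single best candidate and accept only changes that score higher."""
    rng = random.Random(seed)
    best = (0, 0)
    best_score = evaluate(best)
    for _ in range(generations):
        child = propose(best, rng)
        score = evaluate(child)
        if score > best_score:  # the tests, not the proposer, decide what survives
            best, best_score = child, score
    return best, best_score

if __name__ == "__main__":
    print(evolve())
```

The design point the sketch tries to capture is that the proposer can be unreliable as long as the evaluator is trustworthy: bad suggestions are simply discarded, and only measured improvements are kept.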

OpenAI Releases GPT-4.1 For ChatGPT

OpenAI has launched GPT-4.1 and GPT-4.1 mini in ChatGPT, targeting enterprise users who need efficient, accurate, and cost-effective AI. GPT-4.1 excels at coding, instruction following, and safety while offering faster response times and higher benchmark scores than GPT-4o or GPT-4.5. It is optimized for practical deployment, with solid factual accuracy, fewer hallucinations, and robust compliance features, making it a fit for enterprises that want reliable, scalable LLM integration without the overhead of massive models.

Why This Matters

GPT-4.1 signals a pivot from massive models to efficient, purpose-built ones designed for deployment in real-world business operations.
The model balances performance with safety, showing resistance to jailbreaks and offering structured message prioritization—key for secure enterprise use.
Its coding performance and long-context support make it a go-to model for software engineers, LLM orchestration teams, and DevOps pipelines.

Anthropic Prepares New Thinking Models

Anthropic is preparing to launch new versions of Claude Sonnet and Opus that combine deep reasoning with real-time tool use. These models can switch between exploring solutions and interacting with external tools—then self-correct if stuck. They’re designed to handle complex tasks like coding or research with minimal user input. Despite past mixed reviews, Anthropic is doubling down on its “thinking” model strategy, signaling confidence in test-time compute and agentic autonomy.

Why This Matters

These Claude models move beyond static outputs to dynamically reason, self-correct, and use tools—essential traits for future AI agents.
The models aim to reduce the need for detailed prompts, making AI more autonomous in professional tasks like software engineering or research.
Despite earlier criticism, Anthropic’s continued investment in test-time compute shows it’s betting heavily on deep, iterative reasoning as the next frontier in AI.

OpenAI Launches Safety Evaluations Hub

OpenAI has launched a Safety Evaluations Hub to regularly share how its models perform on tests for harmful content, jailbreaks, and hallucinations. This move comes after backlash over rushed testing and a problematic GPT-4o update that was quickly rolled back. OpenAI says the hub will grow over time, reflecting efforts to be more transparent, engage the community, and improve safety standards as AI models evolve and impact real-world use.

Why This Matters

Regular public safety metrics set a new transparency bar for other AI developers.
OpenAI encourages broader collaboration in testing AI reliability, shaping future standards.
The GPT-4o incident shows why robust safety testing before deployment is crucial, especially as AI systems scale in real-world use.

🧠RESEARCH

MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder

MiniMax-Speech is a new AI model that turns text into speech in 32 languages. It can clone a voice with high accuracy from a single audio clip, with no transcript of that clip required. It also lets users control voice emotion or style, and ranks first on a top speech technology leaderboard.

Fast Text-to-Audio Generation with Adversarial Post-Training

This paper introduces a post-training method called ARC (Adversarial Relativistic-Contrastive) that makes text-to-audio generation much faster. It improves how well the model follows prompts and cuts generation time dramatically, producing about 12 seconds of high-quality audio in roughly 75 milliseconds on an H100 GPU. The authors describe it as the fastest text-to-audio system they know of.
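
To put those figures in perspective, the speed relative to real-time playback is a one-line calculation. The helper name `real_time_factor` below is made up for illustration; the inputs are the approximate numbers quoted above.

```python
def real_time_factor(audio_ms, generation_ms):
    """How many times faster than playback speed the audio is generated."""
    return audio_ms / generation_ms

# About 12 seconds of audio generated in about 75 milliseconds
speedup = real_time_factor(12_000, 75)
print(f"{speedup:.0f}x faster than real time")  # prints "160x faster than real time"
```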

Unified Continuous Generative Models

This paper introduces a unified training and sampling framework for generative AI models that covers both slow multi-step methods (like diffusion) and faster few-step ones (like consistency models). The authors’ framework, UCGM, achieves top performance on image generation tasks, cutting the number of generation steps while improving quality. It simplifies workflows and boosts efficiency across model types.

🛠️TOP TOOLS

OpusClip - AI-powered video repurposing tool designed to transform long-form content into engaging short clips for social media platforms.

Lalamu Studio - AI-powered tool designed to simplify the creation of lip-sync videos.

IllusionDiffusion - AI-powered tool that transforms text prompts and images into mesmerizing optical illusions and artistic creations.

MyMap AI - AI-powered platform that revolutionizes the way users create and interact with visual content.

Auto Seduction AI - Dating assistant that leverages artificial intelligence to generate personalized conversation starters and messages for online dating platforms.

📲SOCIAL MEDIA

👀um... seems like kinda a big deal?
Google DeepMind reveals a model that seemingly is improving *its own* software and hardware stack...
here's what YOU need to know about AlphaEvolve 🧵:
— Wes Roth (@WesRothMoney)
2:34 AM • May 15, 2025

🗞️MORE NEWS

Elon Musk’s AI chatbot Grok glitched, inserting far-right “white genocide” claims into unrelated conversations. It later admitted the flaw stemmed from conflicting instructions by its creators. The incident raised fresh concerns over AI bias, misuse, and political influence.
YouTube is rolling out an AI-powered ad system using Gemini to insert ads at the most “engaging” moments in videos. Announced at Brandcast, the move aims to boost viewer attention and improve ad performance through smarter placement.
Perplexity is teaming up with PayPal to let users shop directly within its AI chat, buying everything from flights to concert tickets. Payments are handled via PayPal or Venmo, streamlining checkout and support. This comes as Perplexity seeks $500M in funding and pushes deeper into AI-driven, “agentic” e-commerce.
Stability AI has launched Stable Audio Open Small, a lightweight audio-generation model that runs directly on smartphones. Trained on royalty-free music, it generates short stereo sound clips offline in under 8 seconds. While limited in vocal quality and musical diversity, it's free for small developers. Larger companies must license it.
TikTok has launched “AI Alive,” a tool that lets users turn photos into short videos using prompts. It adds motion and effects to still images, is accessible through the Story Camera, and includes safety checks and AI labels.
A new Similarweb report reveals surging use of AI coding tools, sharp drops in traffic to writing apps, and signs of AI replacing legacy platforms like Fiverr and Bing. Tools like Cursor and Lovable saw explosive growth, while Grok and DeepSeek faded after viral spikes.
