OpenAI Wins Gold in Math
PLUS: Post-Labor Economics: How 40% Unemployment Could Rewrite the Future

Today:
OpenAI Wins Gold in Math
DeepMind AI Cracks Hard Geometry
Qwen2.5 Fails Fresh Math Tests
Elon Musk Will Launch Baby Grok
Meta Poaches Apple AI Talent
OpenAI just solved math
OpenAI’s reasoning model has achieved gold-medal-level performance on the 2025 International Mathematical Olympiad (IMO), solving five of six problems for 35 of 42 points (each problem is worth 7). Unlike Google DeepMind’s math-specific systems, this is a general-purpose AI performing at the level of the world’s best students.
The achievement is considered a major step toward general intelligence and shows how quickly AI reasoning is advancing. GPT‑5, however, has yet to be released.
AlphaGeometry, a system developed by Google DeepMind and NYU, solves complex geometry problems without relying on human-written proofs. It learns from 100 million computer-generated theorems and guides a logic engine to find solutions. On olympiad-level tests, it nearly matches an average gold-medalist’s performance. The model writes readable solutions, makes new discoveries, and handles geometry better than GPT-4 or earlier methods—marking a big leap in automated reasoning.

Why This Matters
Breakthrough in Reasoning: AlphaGeometry shows AI can solve abstract, logic-heavy tasks previously thought too symbolic or open-ended.
Human-Free Learning: It trains without any human-written examples, proving synthetic data can power deep reasoning models.
New AI Capabilities: The model creates human-readable mathematical proofs—suggesting future AIs could autonomously explore science and mathematics.
A study finds that the math performance of Alibaba’s Qwen2.5 comes mostly from memorizing training data, not true reasoning. The model scores well on contaminated datasets like MATH-500, but its accuracy collapsed when it was tested on fresh benchmarks. On clean synthetic tests, only correct feedback improved its skills. The findings raise concerns about misleading benchmarks and emphasize the need for clean evaluation to assess AI’s real reasoning ability.
Why This Matters
Exposes Benchmark Flaws – Shows how contaminated datasets can inflate AI performance claims.
Challenges True Reasoning Claims – Highlights that high scores may stem from memorization, not actual understanding.
Improves AI Evaluation – Stresses the need for cleaner tests to measure genuine problem-solving in future models.
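For readers wondering what a contamination check can look like in practice, here is a minimal sketch of one simple heuristic: measuring how many word-level n-grams from a benchmark question also appear in a training text. This is not the method used in the study; the function names, the sample strings, and the choice of n=5 are all assumptions made for illustration.

```python
def ngrams(text, n):
    """Return the set of word-level n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(benchmark_item, training_text, n=5):
    """Fraction of the benchmark item's n-grams that also occur in the training text."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0  # item shorter than n words: nothing to compare
    return len(item_grams & ngrams(training_text, n)) / len(item_grams)


# Hypothetical data: one question copied from the training text, one that is new.
training_text = "find the sum of all positive integers n such that n divides 30"
seen_item = "the sum of all positive integers n such that"
fresh_item = "compute the area of a triangle with sides 7 8 and 9"

seen_score = overlap_ratio(seen_item, training_text)
fresh_score = overlap_ratio(fresh_item, training_text)
print(seen_score, fresh_score)
```

A high ratio only flags a question for closer inspection. Paraphrased or translated leaks would slip past an exact-match check like this one.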
Elon Musk announced Baby Grok, a child-friendly version of the Grok AI chatbot, designed for safe, educational interactions. Developed by xAI, it follows the recent launch of Grok 4, which Musk claims could discover useful new technologies within a year. The move comes amid criticism over Grok’s prior antisemitic remarks and concerns about inappropriate content in earlier modes, prompting a push for safer AI experiences tailored to children.
Why This Matters
Child-Safe AI Design – Highlights growing demand for AI tools that prioritize safety and education for kids.
AI Innovation Pace – Musk’s claim that Grok may soon discover new technologies underscores the rapid evolution of reasoning models.
Ethical Standards – Addresses public concerns over harmful AI outputs, influencing how AI companies approach content moderation and user trust.
AI’s White‑Collar Shake‑Up | David Shapiro’s Deep Dive on Post‑Labor Economics
AI and automation are accelerating a decades-long decline in labor demand. As more jobs vanish, society faces a crisis: how do people earn money if there’s no work? The answer may lie in a new economy built on ownership, dividends, and collective agency rather than wages.
But this shift could also weaken political power and meaning tied to work—forcing us to rethink purpose, income, and freedom in a radically different world.
🧠RESEARCH
This survey introduces Context Engineering—the science of feeding better information to AI models. It breaks down how context is retrieved, processed, and managed, and how systems like memory or tools use it. Despite great progress in understanding context, models still struggle to generate long, complex answers. That’s the next challenge.
VisionThink is a new vision-language model that saves computation by adjusting image resolution based on task complexity. It uses reinforcement learning to decide when higher resolution is needed, excelling in text-heavy tasks like OCR while reducing visual tokens for simpler tasks. This approach improves efficiency without sacrificing accuracy.
π³ is a neural network that reconstructs 3D visual geometry without relying on a fixed reference view. Its permutation-equivariant design ensures robustness to input order and improves accuracy in camera pose, depth, and point map reconstruction. This approach sets new performance benchmarks across multiple visual geometry tasks.
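To make "permutation-equivariant" concrete, here is a toy sketch: a function is permutation-equivariant if shuffling its inputs shuffles its outputs in exactly the same way, so no input gets special treatment as a reference. The layer below (each view's features minus the mean over all views) is a made-up illustration and not π³'s architecture; the function names and the sample numbers are assumptions.

```python
import math


def equivariant_layer(views):
    """Toy permutation-equivariant map: subtract the mean over all views from each view."""
    n = len(views)
    dim = len(views[0])
    # The mean is a symmetric summary: it does not depend on the order of the views.
    mean = [sum(v[d] for v in views) / n for d in range(dim)]
    return [[v[d] - mean[d] for d in range(dim)] for v in views]


def permute(items, order):
    """Reorder items so that position i holds items[order[i]]."""
    return [items[i] for i in order]


# Hypothetical per-view feature vectors.
views = [[1.0, 2.0], [3.0, 5.0], [8.0, 13.0]]
order = [2, 0, 1]

apply_then_shuffle = permute(equivariant_layer(views), order)
shuffle_then_apply = equivariant_layer(permute(views, order))

# Equivariance: both routes give the same result.
same = all(
    math.isclose(a, b)
    for row_a, row_b in zip(apply_then_shuffle, shuffle_then_apply)
    for a, b in zip(row_a, row_b)
)
print(same)
```

By contrast, a map that subtracts the first view from every other view would fail this check, because its output depends on which view happens to come first.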
🛠️TOP TOOLS
PlayPhrase - Allows users to search for and play video clips containing specific phrases from movies and TV shows.
Star By Face - AI-powered celebrity look-alike app that allows users to discover which famous personalities they resemble.
AI Time Machine - Transform celebrities into historical figures, offering a humorous and imaginative glimpse into the past.
Ask Your PDF - AI-powered platform that revolutionizes document interaction, allowing users to engage in intelligent conversations with their PDF files, extract key insights, and manage information efficiently across multiple platforms.
📲SOCIAL MEDIA
Universal Paperclips is a game where an AI tasked with creating paperclips becomes superintelligent and proceeds to turn all matter (including humans) into paperclips.
I wanted to see if OpenAI's Agent could beat the game thereby destroying all humans.
good news... it did NOT
— Wes Roth (@WesRothMoney)
5:31 AM • Jul 19, 2025
🗞️MORE NEWS
Meta has hired two top AI researchers, Mark Lee and Tom Gunter, from Apple, soon after recruiting their former boss. Both will join Meta’s Superintelligence Labs team to advance the company’s artificial intelligence efforts.
Invideo AI, built on OpenAI’s GPT‑4.1, image-generation, and text-to-speech models, lets users create professional videos from simple prompts in minutes. It cuts production time by a factor of ten, tailors content for different platforms, and supports over 50 million creators.
DuckDuckGo now offers a setting to hide AI-generated images in search results. Users can toggle “AI images: show/hide” in the Images tab or enable it via search settings. It uses curated blocklists to reduce AI content.
A U.S. judge ruled that authors can pursue a class-action lawsuit against Anthropic, alleging it illegally downloaded millions of books from pirated libraries to train its AI. The case highlights rising copyright battles against AI companies.
Cognition acquired AI coding startup Windsurf after a turbulent period marked by failed OpenAI talks and key team defections to Google DeepMind. Interim CEO Jeff Wang described morale struggles but praised the deal for protecting employees and combining complementary strengths.
Meta refused to sign the EU’s new AI code of practice, calling it excessive and harmful to innovation. Global affairs chief Joel Kaplan argued the rules go beyond the AI Act and could hinder AI development in Europe.
The White House is drafting an executive order requiring AI companies with federal contracts to ensure political neutrality, targeting what officials call “woke” AI. The move aims to counter perceived liberal bias in AI models.