NATURAL 20
VideoGameBench Shows AI Game Skills
PLUS: AI Startup Aims To Replace Workers, OpenAI o3 Scores Perfect 100% and more.

Today:
VideoGameBench Shows AI Game Skills
xAI Launches Grok 3 Mini
AI To Draft UAE Legislation
AI Startup Aims To Replace Workers
OpenAI o3 Scores Perfect 100%
VideoGameBench Tutorial | AI Vision Will Never be the same…
A new open-source project lets AI models such as GPT-4o, Gemini 2.5 Pro, and Claude Sonnet play old-school MS-DOS and Game Boy games, like Doom, Pokémon Red, and Warcraft II. The AI plays by analyzing screenshots and deciding which moves to make. Claude Sonnet performed best in tests.
The video guides users, even Windows beginners, through setup and game control using simple commands, emphasizing accessibility and fun AI benchmarking.
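The screenshot-in, move-out loop described above can be sketched in a few lines. This is only a toy illustration, not VideoGameBench's actual code: `ToyGame`, `toy_policy`, and `play` are hypothetical names, the "screenshot" is a line of text rather than an image, and the policy is a hard-coded rule standing in for a call to a vision-language model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyGame:
    """Hypothetical stand-in for an emulated game (not part of VideoGameBench)."""
    position: int = 0
    goal: int = 3

    def screenshot(self) -> str:
        # A real setup would return image pixels; text keeps the sketch runnable.
        return f"player at {self.position}, goal at {self.goal}"

    def press(self, action: str) -> None:
        if action == "right":
            self.position += 1
        elif action == "left":
            self.position -= 1

    def done(self) -> bool:
        return self.position == self.goal


def toy_policy(screenshot: str) -> str:
    """Placeholder for the model call: look at the frame, return one action."""
    parts = screenshot.replace(",", "").split()
    position, goal = int(parts[2]), int(parts[5])
    return "right" if position < goal else "left"


def play(game: ToyGame, policy: Callable[[str], str], max_steps: int = 20) -> List[str]:
    """Repeat: capture a frame, ask the policy for a move, apply the move."""
    history: List[str] = []
    for _ in range(max_steps):
        if game.done():
            break
        action = policy(game.screenshot())
        game.press(action)
        history.append(action)
    return history


if __name__ == "__main__":
    print(play(ToyGame(), toy_policy))
```

Swapping `toy_policy` for a function that sends a real screenshot to a model and parses the reply would keep the same loop structure.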
xAI Launches Grok 3 Mini
xAI has launched Grok 3 Mini, a small but powerful language model offering high reasoning performance at a fraction of the cost of leading AI models. Despite its size, it tops benchmarks in math and programming, even beating bigger rivals like DeepSeek R1 and Claude 3.7. Grok 3 Mini also provides full reasoning traces for transparency, while its affordability intensifies the AI model price war sparked by Google’s Gemini 2.5.

Why this matters
Grok 3 Mini sharply lowers the cost of access to high-performing AI, putting it within reach of more users.
It offers full reasoning traces, aiding model interpretability and developer trust.
It pressures giants like OpenAI and Google, accelerating innovation and cost drops in the AI space.
AI To Draft UAE Legislation
The UAE will become the first country to use AI to write, review, and amend laws. Officials say it could speed up lawmaking by 70% and help predict needed legal changes. A new Regulatory Intelligence Office will oversee this effort. While the plan has been praised for its bold ambition, experts warn about AI’s reliability, bias, and unpredictable outcomes, and they stress the need for human oversight to prevent errors and maintain trust in the legal system.
Why this matters
It positions AI not just as a tool but as a co-legislator—an unprecedented use of generative AI in government.
If successful, it could radically lower costs and time in legal systems globally.
It spotlights the pressing need for guardrails around AI decisions in high-stakes societal areas.
AI Startup Aims To Replace Workers
Famed AI researcher Tamay Besiroglu launched Mechanize, a startup aiming to fully automate all work and replace human labor with AI agents. Backed by top tech figures, the project drew backlash for its extreme mission and its founder's ties to respected AI research institute Epoch. While Besiroglu claims it could boost wealth and productivity, critics warn it could deepen inequality, undermine jobs, and compromise the neutrality of AI research institutions.
Why this matters
Mechanize openly targets total labor automation, a provocative step in AI's role in reshaping economies.
The founder's link to Epoch raises questions about the objectivity and neutrality of AI performance benchmarks.
The startup highlights both the promise and current shortcomings of AI agents—fueling competition in agentic AI innovation.
🧠RESEARCH
CLIMB is a new method to improve language model training by automatically finding the best mix of training data. It groups data by meaning, tests combinations using a small model, and refines the mix over time. This approach beats strong baselines and shows major gains, especially when tailored to specific topics.
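To make the "test combinations, then refine" idea concrete, here is a rough sketch of an iterative mixture search. It is not CLIMB's actual algorithm: a plain random search stands in for whatever search procedure the authors use, and `toy_score` is a made-up function standing in for "train a small model on this mix and measure it". All names here are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def normalize(weights: List[float]) -> List[float]:
    """Scale weights so they sum to 1 (a valid data mixture)."""
    total = sum(weights)
    return [w / total for w in weights]


def search_mixture(
    score: Callable[[List[float]], float],
    n_clusters: int,
    rounds: int = 5,
    candidates: int = 20,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Refine mixture weights over several rounds, narrowing the search each time."""
    rng = random.Random(seed)
    best = normalize([1.0] * n_clusters)  # start from a uniform mix
    best_score = score(best)
    spread = 0.5
    for _ in range(rounds):
        for _ in range(candidates):
            # Perturb the current best mix; clamp so every weight stays positive.
            cand = normalize(
                [max(1e-6, w + rng.uniform(-spread, spread)) for w in best]
            )
            cand_score = score(cand)
            if cand_score > best_score:
                best, best_score = cand, cand_score
        spread *= 0.5  # later rounds explore a smaller neighborhood
    return best, best_score


def toy_score(mix: List[float]) -> float:
    """Made-up proxy score: higher when the mix is closer to an arbitrary target."""
    target = [0.6, 0.3, 0.1]
    return -sum((m - t) ** 2 for m, t in zip(mix, target))


if __name__ == "__main__":
    mix, value = search_mixture(toy_score, n_clusters=3)
    print([round(w, 3) for w in mix], round(value, 5))
```

In a real pipeline the expensive part is the scoring step, which is why a small model is used to evaluate each candidate mix before committing to a full training run.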
Antidistillation sampling is a technique that protects powerful language models from being copied through distillation. It subtly alters the model’s output probabilities to make the reasoning traces less useful for training copycat models, without hurting the original model’s performance. It’s a strategic defense to preserve model originality and value.
FramePack is a new method for training video generation models that compresses input frames to keep processing efficient, no matter the video length. This boosts training speed and allows larger batch sizes. It also reduces error buildup by generating frames backward from fixed endpoints. The approach improves both quality and efficiency.
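One way compression can keep the workload flat regardless of video length is to give older frames progressively fewer tokens. The arithmetic below assumes a halving schedule and a budget of 1,024 tokens for the newest frame; both numbers are illustrative assumptions, not figures from the FramePack paper, and `context_tokens` is a hypothetical helper.

```python
def context_tokens(num_frames: int, tokens_newest: int = 1024, ratio: int = 2) -> int:
    """Total context size if the frame k steps back gets tokens_newest // ratio**k tokens.

    Assumed schedule for illustration only; very old frames round down to zero tokens.
    """
    total = 0
    for k in range(num_frames):
        total += tokens_newest // (ratio ** k)
    return total


if __name__ == "__main__":
    for n in (1, 2, 11, 1000):
        uncompressed = n * 1024
        print(n, context_tokens(n), uncompressed)
```

Under this assumed schedule the total never exceeds about twice the newest frame's budget, while the uncompressed total grows linearly with the number of frames.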
🛠️TOP TOOLS
GPTKit - AI-powered text detection tool designed to distinguish between human-written and AI-generated content with high accuracy.
ImageToCartoon - AI-powered online tool that transforms photos into cartoon-style images quickly and easily.
Wonder Dynamics - AI-powered visual effects company that revolutionizes the film and entertainment industry.
Watermark Remover IO - AI-powered online tool designed to efficiently remove watermarks, logos, text, and other unwanted elements from images while preserving the original quality.
Typeframes - AI-powered text-to-video creation tool designed to simplify the process of producing engaging video content for platforms like YouTube, Instagram, and TikTok.
📲SOCIAL MEDIA
Just announced new versions of Gemma 3 – the most capable model to run just one H100 GPU – can now run on just one *desktop* GPU!
Our Quantization-Aware Training (QAT) method drastically brings down memory use while maintaining high quality. Excited to make Gemma 3 even more
— Sundar Pichai (@sundarpichai)
3:56 PM • Apr 18, 2025
🗞️MORE NEWS
OpenAI’s o3 model scored a perfect 100% on a key test for understanding long stories, far outperforming rivals. The result shows the model can actually work with very long texts, rather than merely claiming to.
Figma is building an AI app creator powered by Anthropic’s Claude Sonnet. It takes text, images, and Figma files as input. The company is also developing “Figma Sites,” a tool for generating websites.
Intel’s new CEO Lip-Bu Tan is cutting management layers to speed decisions and boost innovation. He promoted Sachin Katti as chief of AI and tech strategy, aiming to directly compete with Nvidia in AI chips.
China is catching up fast in AI, fueled by top university talent and government support. Startups like DeepSeek now rival U.S. models, despite chip shortages. Fierce competition and academic ties drive rapid innovation and global ambition.
A new Anthropic study reveals that university students, predominantly in STEM fields, rely on its AI assistant Claude for higher-level academic tasks. Many use it to create and analyze assignments, raising concerns about learning and academic integrity.
AI-generated music now makes up 18% of tracks uploaded to Deezer, doubling in just four months. The surge raises concerns over copyright violations, fair artist pay, and the future role of human creativity in music.