Grok 3 Leads in AI Benchmark

PLUS: Perplexity Open-Sources R1 1776 Model, Ilya Sutskever's Startup Eyes $1B Round and more.

In partnership with

Hire Ava, the Industry-Leading AI BDR

Ava automates your entire outbound demand generation so you can get leads delivered to your inbox on autopilot. She operates within the Artisan platform, which consolidates every tool you need for outbound:

  • 300M+ High-Quality B2B Prospects

  • Automated Lead Enrichment With 10+ Data Sources Included

  • Full Email Deliverability Management

  • Personalization Waterfall using LinkedIn, Twitter, Web Scraping & More

Today:

  • Grok 3 Leads in AI Benchmark

  • Mira Murati Launches Thinking Machines Lab

  • OpenAI Launches SWE-Lancer Benchmark

  • Perplexity Open-Sources R1 1776 Model

  • Ilya Sutskever's Startup Eyes $1B Round

Grok 3 DESTROYS everyone... #1 in EVERY Category

Elon Musk's team at xAI has launched Grok 3, which surpasses models such as Google's Gemini and OpenAI's o3-mini on several benchmarks, including reasoning and advanced math tasks.

Built with 200,000 GPUs in a massive data center, Grok 3 outperforms competitors and holds a strong lead on the Chatbot Arena leaderboard. The model's gains are attributed to heavy GPU investment, and xAI plans to expand to 1 million GPUs. Early testing shows strong performance, though further analysis is still ongoing.

Mira Murati, OpenAI's former CTO, has co-founded Thinking Machines Lab, a new AI startup focused on making AI systems more understandable and customizable. The company plans to share its technology openly with outside researchers. Murati, who left OpenAI after a leadership dispute, joins other former OpenAI executives who have launched their own AI ventures, adding to the global race to build advanced AI. The lab has not disclosed its funding status.

OpenAI has introduced SWE-Lancer, a benchmark that evaluates AI coding performance on more than 1,400 real freelance software engineering tasks collectively worth $1 million. Spanning everything from UI/UX work to systems design, it offers a realistic test of how AI performs on real-world jobs. Current models still fail to complete many of these tasks, highlighting how far AI's practical abilities still lag.

Perplexity has open-sourced R1 1776, a post-trained version of DeepSeek-R1 designed to provide unbiased, factual information. The original model, which performs close to state-of-the-art reasoning models, was limited by censorship, particularly on sensitive topics. The new version mitigates this through careful post-training on previously censored content, preserving its reasoning capabilities while allowing it to discuss a broader range of topics. The model weights are available on Hugging Face, and the model can also be accessed via the Sonar API.

Ilya Sutskever’s AI startup, Safe Superintelligence, is nearing a $1 billion funding round at a $30 billion valuation, well above earlier expectations. Led by Greenoaks Capital Partners, the round would bring the company’s total funding to $2 billion. Founded by Sutskever and other former OpenAI researchers, Safe Superintelligence has attracted investments from Sequoia Capital, Andreessen Horowitz, and DST Global. The startup is not yet generating revenue and does not plan to sell AI products in the near future.

🧠RESEARCH

Native Sparse Attention (NSA) improves long-context modeling by combining sparse attention with a hardware-optimized design. It uses a dynamic hierarchical strategy that pairs coarse-grained token compression with fine-grained token selection, delivering speedups in both training and inference while maintaining model performance. NSA outperforms full attention on long-context tasks while being considerably more efficient.
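To make "compress, then select" concrete, here is a toy sketch for a single query vector. It is a simplification under stated assumptions, not NSA itself: each block of keys is compressed to its mean (the paper learns this compression), blocks are ranked by their dot product with the query, and ordinary softmax attention runs only over tokens in the top-k blocks. The real method also has a sliding-window branch, learned gating between branches, and custom kernels, all omitted here. The function name `block_sparse_attention` and its parameters are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def block_sparse_attention(q, keys, values, block_size, top_k):
    """Toy compress-then-select attention for one query (assumes top_k >= 1)."""
    n, dim_k = len(keys), len(keys[0])
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]

    # 1. Compression: summarize each block by its mean key (NSA learns this step).
    summaries = [[sum(keys[i][d] for i in idx) / len(idx) for d in range(dim_k)]
                 for idx in blocks]

    # 2. Selection: rank blocks by relevance to the query and keep the top_k.
    block_scores = [dot(q, s) for s in summaries]
    ranked = sorted(range(len(blocks)), key=lambda b: block_scores[b], reverse=True)
    kept = sorted(ranked[:top_k])
    selected = [i for b in kept for i in blocks[b]]

    # 3. Scaled dot-product attention over the selected tokens only.
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, keys[i]) * scale for i in selected])
    dim_v = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, selected)) for d in range(dim_v)]
    return out, selected

# Six tokens in three blocks of two; the query points toward the first block.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [-1.0, 0.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
out, selected = block_sparse_attention([1.0, 0.0], keys, values, block_size=2, top_k=1)
print(selected, out)
```

With `top_k=1` only the first block survives, so the query attends to 2 of 6 tokens; the saving comes from skipping whole blocks whose compressed summary looks irrelevant.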

This paper presents a learning framework for teaching humanoid robots how to get up after a fall, overcoming challenges like varied postures and terrain. Using a two-phase approach, the method first discovers a trajectory and then refines it for smooth, robust motions. It successfully enables a robot to get up from different positions on diverse surfaces.

SWE-Lancer introduces a benchmark of more than 1,400 freelance software engineering tasks, together valued at $1 million, covering both hands-on engineering tasks and managerial tasks such as choosing between implementation proposals. Evaluations of frontier models show they still fail most of these tasks. The benchmark is open-sourced for future research and aims to shed light on AI's economic impact on freelance work.
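Because every task carries a real payout, a model can be scored by dollars earned as well as by pass rate, and the two can diverge sharply. The sketch below illustrates that arithmetic on made-up results; the `Task` fields, task IDs, and payouts are hypothetical and not taken from the benchmark's data or harness.

```python
from dataclasses import dataclass

@dataclass
class Task:
    task_id: str       # hypothetical identifier
    payout_usd: float  # what the freelance job paid
    solved: bool       # whether the model's submission passed grading

def summarize(tasks):
    """Return (pass rate, dollars earned, share of available dollars earned)."""
    total = sum(t.payout_usd for t in tasks)
    earned = sum(t.payout_usd for t in tasks if t.solved)
    pass_rate = sum(1 for t in tasks if t.solved) / len(tasks)
    return pass_rate, earned, earned / total

# Hypothetical results: the model solves only the two cheapest tasks.
results = [
    Task("task-a", 250.0, True),
    Task("task-b", 1000.0, False),
    Task("task-c", 50.0, True),
    Task("task-d", 4000.0, False),
]
rate, earned, share = summarize(results)
print(f"pass rate {rate:.0%}, earned ${earned:,.0f} ({share:.1%} of available)")
```

Here half the tasks pass, yet the model earns only $300 of $5,300, about 6% of the available money, because the expensive tasks are the hard ones.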

HermesFlow introduces a framework for closing the gap between understanding and generation in multimodal large language models. Using homologous preference data and optimizing with Pair-DPO and self-play, it aligns the two capabilities. Experiments show HermesFlow outperforms previous methods, making it a promising approach for future multimodal models.
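Pair-DPO builds on Direct Preference Optimization (DPO). The sketch below implements the standard DPO loss for one preference pair from sequence log-probabilities, then assumes, purely for illustration, that Pair-DPO can be approximated as the sum of two DPO losses: one on an understanding pair and one on a generation pair built from the same (homologous) input. The paper's exact objective and its self-play loop may differ; `pair_dpo_loss`, the tuple layout, and `beta=0.1` are hypothetical choices.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dpo_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss for one pair of sequence log-probs.

    pol_* come from the model being trained, ref_* from a frozen reference;
    _w is the preferred response, _l the rejected one.
    """
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return -log_sigmoid(beta * margin)

def pair_dpo_loss(understanding, generation, beta=0.1):
    """Assumed Pair-DPO: sum DPO losses over an understanding pair and a
    generation pair that share the same input (hypothetical simplification)."""
    return dpo_loss(*understanding, beta=beta) + dpo_loss(*generation, beta=beta)

# Each pair is (pol_w, pol_l, ref_w, ref_l); the values are made up.
und = (-1.0, -3.0, -2.0, -2.0)  # policy already prefers the chosen answer
gen = (-1.0, -1.0, -1.0, -1.0)  # no preference yet: loss is log 2
print(pair_dpo_loss(und, gen))
```

The loss shrinks as the policy raises the chosen response's log-probability relative to the reference while lowering the rejected one's, which is what pushes understanding and generation toward the same preferences.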

🛠️TOP TOOLS

DiagramGPT - AI-powered tool developed by Fraser Xu that enables users to generate a variety of diagram types using natural language input.

Bai Chat - AI platform designed to simplify the integration of artificial intelligence into various workflows for professionals, developers, and businesses.

Image To Font Finder - AI-powered tool designed to help users identify fonts from any image.

iAsk AI - AI-powered search engine designed to revolutionize the way users access information online.

Human or AI Game - Online game and research project designed to test the ability of participants to distinguish between human and AI in a conversational setting.

🗞️MORE NEWS

  • Wu Yonghui, a former Google researcher, joined ByteDance's AI team, marking a significant hire. He previously led Google's Gemini models and now reports directly to ByteDance CEO Liang Rubo, strengthening the company's AI capabilities.

  • Meta announces two key events for 2025: LlamaCon on April 29, focusing on open-source AI for developers, and Meta Connect on September 17-18, showcasing updates in virtual reality, AI glasses, and mixed reality technologies.

  • To prevent hostile takeovers, OpenAI is considering granting its non-profit board special voting rights, allowing it to override major investors. This comes after a $97.4 billion buyout offer from a group led by Elon Musk was rejected.

  • AI-generated optical illusions are being explored as a new form of CAPTCHA to distinguish humans from bots. These illusions, which AI systems struggle to recognize, could enhance website security by tripping up software while remaining easy for humans to pass.
