NATURAL 20
AI Outperforms Humans In Business Test

PLUS: ClickHouse Aims For $6B Valuation; Microsoft, OpenAI Renegotiate Partnership Terms; and more.

Wes Roth
May 12, 2025

In partnership with Morning Brew

You’ve never experienced business news like this

Morning Brew delivers business news the way busy professionals want it — quick, clear, and written like a human.

No jargon. No endless paragraphs. Just the day’s most important stories, with a dash of personality that makes them surprisingly fun to read.

No matter your industry, Morning Brew’s daily email keeps you up to speed on the news shaping your career and life—in a way you’ll actually enjoy.

Best part? It’s 100% free. Sign up in 15 seconds, and if you end up missing the long, drawn-out articles of traditional business media, you can always go back.

Check it out

Today:

AI Outperforms Humans In Business Test
OpenAI Acquires Windsurf For $3B
CoreWeave Seeks $1.5B After IPO Flop
ClickHouse Aims For $6B Valuation
Microsoft, OpenAI Renegotiate Partnership Terms

AI JUST BEAT humans at running a business…

Researchers tested AI agents' ability to manage a vending machine business, starting with $500. Claude 3.5 Sonnet performed best, earning over $2,000, while Claude 3.7 made $15,600. Surprisingly, a human came in fourth with $844.

The main issue was long-term coherence; AI agents performed well initially but often broke down or made irrational decisions, like contacting the FBI over a $2 fee. Improving task-specific modules could enhance performance.
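To make the scoring concrete, here is a toy Python model of the kind of business the agents were running. It is a minimal sketch under stated assumptions, not the researchers' simulator: the `VendingBusiness` class, the flat $2 daily fee, and valuing unsold stock at purchase cost are all hypothetical choices; only the $500 starting balance comes from the story. It shows why long-term coherence matters: an agent that stops restocking and selling still pays the fee every day, so its net worth slowly drains.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical assumption: a flat daily operating fee, echoing the $2 fee in the story.
DAILY_FEE = 2.0


@dataclass
class VendingBusiness:
    cash: float = 500.0  # starting capital from the article
    inventory: Dict[str, int] = field(default_factory=dict)
    unit_cost: Dict[str, float] = field(default_factory=dict)

    def restock(self, item: str, qty: int, cost: float) -> None:
        """Buy stock with cash; refuse purchases the business cannot afford."""
        total = qty * cost
        if total > self.cash:
            raise ValueError("not enough cash to restock")
        self.cash -= total
        self.inventory[item] = self.inventory.get(item, 0) + qty
        self.unit_cost[item] = cost

    def sell(self, item: str, qty: int, price: float) -> int:
        """Sell up to qty units (limited by stock on hand) and return units sold."""
        sold = min(qty, self.inventory.get(item, 0))
        self.inventory[item] = self.inventory.get(item, 0) - sold
        self.cash += sold * price
        return sold

    def end_day(self) -> None:
        """Charge the daily fee whether or not anything was sold."""
        self.cash -= DAILY_FEE

    def net_worth(self) -> float:
        """Cash plus unsold stock valued at purchase cost (a simplifying assumption)."""
        stock_value = sum(q * self.unit_cost.get(i, 0.0) for i, q in self.inventory.items())
        return self.cash + stock_value


if __name__ == "__main__":
    shop = VendingBusiness()
    shop.restock("soda", qty=100, cost=1.00)
    shop.sell("soda", qty=60, price=2.50)
    shop.end_day()
    print(shop.net_worth())  # 588.0: 548.0 cash plus 40 unsold cans at $1.00
```

An idle agent in this toy loses $60 over 30 days to fees alone, which is the kind of slow, compounding drift that a long-horizon benchmark is designed to expose.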

WATCH THE VIDEO ON YOUTUBE

OpenAI Acquires Windsurf For $3B

OpenAI has reportedly agreed to buy coding startup Windsurf for $3 billion, a deal that coincided with a drop in Google's stock. It is the latest in a surge of acquisitions of AI-driven software tools. Windsurf helps developers build apps faster with AI, so the deal intensifies OpenAI's competition with Google and Amazon. With investors debating which AI startups are worth funding, the deal marks a turning point for the AI industry. Engineers, meanwhile, remain skeptical that AI will replace their jobs and see it mainly as a productivity enhancer.

Why This Matters

Sets a benchmark valuation for AI developer tools, influencing future investments.
Intensifies competition among tech giants (OpenAI, Google, Amazon) in AI developer ecosystems.
Highlights the shift toward integrating AI into practical, high-impact software development tools.

CoreWeave Seeks $1.5B After IPO Flop

CoreWeave, a data-center provider serving AI clients like Microsoft, is seeking a $1.5 billion debt deal after a disappointing IPO. The company had originally aimed to raise $2.7 billion in that IPO but cut its goal amid concerns about its high debt and a softening AI infrastructure market. It already carries $8 billion in debt and faces significant repayments by 2026. JPMorgan is leading the current investor talks as CoreWeave weighs its financial options.

Why This Matters

Highlights investor caution and market saturation in AI infrastructure funding.
Reflects challenges AI-focused companies face managing heavy debt amid uncertain markets.
May signal broader investor hesitation around large-scale infrastructure projects needed for AI growth.

ClickHouse Aims For $6B Valuation

Database startup ClickHouse, seen as a rival to Snowflake, aims to raise funds at a $6 billion valuation in a round led by Khosla Ventures, triple its value from four years ago. Its software, originally developed at Yandex, excels at real-time analytics for AI developers. With roughly $70 million in annual revenue, ClickHouse's valuation reflects high investor expectations for database tools as AI agents gain popularity; it competes with Snowflake, Elastic, and Datadog.

Why This Matters

Underscores growing investor interest in database technology supporting AI applications.
Highlights demand for efficient, real-time data processing tools crucial for emerging AI agents.
Reflects intensifying competition among database software providers catering specifically to AI-driven analytics.

🧠RESEARCH

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

This survey reviews the evolution of large multimodal reasoning models (LMRMs), which integrate text, images, audio, and video to enhance reasoning. It traces the shift from modular pipelines to unified, language-centric frameworks, highlighting new methods like Multimodal Chain-of-Thought (MCoT). The paper discusses challenges in generalization, reasoning depth, and adaptive planning for real-world tasks.

On Path to Multimodal Generalist: General-Level and General-Bench

General-Level is a new framework for evaluating multimodal AI models by measuring how well they both understand and generate diverse content (text, images, and more). It comes with General-Bench, a large test set of 700 tasks. The results reveal how far current models remain from human-like intelligence and point the way toward truly general-purpose AI systems.

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

StreamBridge is a method for turning offline video-language models into real-time streaming assistants. It adds a memory buffer that supports long, multi-turn interactions and a lightweight activation model that enables proactive responses. Trained with a custom dataset (Stream-IT), models equipped with StreamBridge show notably better streaming performance, surpassing proprietary models like GPT-4o and Gemini 1.5 Pro on real-time video tasks.
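A memory for multi-turn streaming has to decide what to keep as the video keeps growing. The Python below is a hypothetical illustration of one such policy, not StreamBridge's actual mechanism: it assumes each turn is stored as a list of frame embeddings, caps the total with an arbitrary `budget`, and compresses the oldest turns first by averaging adjacent frames. The names `RoundMemory` and `merge_pairs` are made up for this sketch.

```python
from typing import List

Embedding = List[float]


def merge_pairs(frames: List[Embedding]) -> List[Embedding]:
    """Average adjacent frame embeddings, roughly halving the count."""
    merged = []
    for i in range(0, len(frames), 2):
        group = frames[i:i + 2]
        merged.append([sum(vals) / len(group) for vals in zip(*group)])
    return merged


class RoundMemory:
    """Hypothetical bounded memory: older turns are compressed before newer ones."""

    def __init__(self, budget: int):
        self.budget = budget  # max number of frame embeddings kept in total
        self.rounds: List[List[Embedding]] = []

    def size(self) -> int:
        return sum(len(r) for r in self.rounds)

    def add_round(self, frames: List[Embedding]) -> None:
        self.rounds.append(list(frames))
        while self.size() > self.budget:
            # Compress the oldest turn that still has more than one embedding.
            for idx, r in enumerate(self.rounds):
                if len(r) > 1:
                    self.rounds[idx] = merge_pairs(r)
                    break
            else:
                # Every turn is already a single embedding: drop the oldest turn.
                self.rounds.pop(0)


if __name__ == "__main__":
    memory = RoundMemory(budget=6)
    for turn in range(3):
        memory.add_round([[float(turn)]] * 4)  # four 1-D "frame embeddings" per turn
    print([len(r) for r in memory.rounds])  # [1, 1, 4]
```

The design choice here is that history decays with age: the latest turn stays at full detail while earlier turns are summarized more and more coarsely, keeping memory bounded without discarding old context outright.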

X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains

X-Reasoner is a vision-language model designed for generalizable reasoning across modalities and domains. It is post-trained only on general-domain text, first with supervised fine-tuning and then with reinforcement learning, and the reasoning skills it learns transfer to multimodal and specialized tasks. X-Reasoner outperforms existing models on both general and medical benchmarks, and a specialized version, X-Reasoner-Med, sets new state-of-the-art results on medical reasoning tasks.

LiftFeat: 3D Geometry-Aware Local Feature Matching

LiftFeat is a lightweight network for robust local feature matching under challenging conditions such as poor lighting or repetitive patterns. It uses a pre-trained depth model to supervise the prediction of 3D geometric features (surface normals), which it then fuses with 2D descriptors. LiftFeat significantly improves accuracy in tasks like pose estimation and visual localization, outperforming other lightweight methods.
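The geometric idea, deriving surface normals from depth and attaching them to a 2D descriptor, can be sketched in a few lines of Python. This is a hypothetical simplification, not LiftFeat's learned lifting module: it assumes a dense depth map is already available, estimates normals with central finite differences, and fuses them by weighted concatenation using an arbitrary `geo_weight`. The function names are made up for illustration.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def normal_at(depth: Sequence[Sequence[float]], row: int, col: int) -> Vector:
    """Estimate a unit surface normal at an interior pixel via central differences."""
    dz_dx = (depth[row][col + 1] - depth[row][col - 1]) / 2.0
    dz_dy = (depth[row + 1][col] - depth[row - 1][col]) / 2.0
    # For a surface z = f(x, y), (-dz/dx, -dz/dy, 1) is normal to it.
    nx, ny, nz = -dz_dx, -dz_dy, 1.0
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / norm, ny / norm, nz / norm)


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return tuple(x / norm for x in v)


def lift_descriptor(desc_2d: Sequence[float], normal: Vector, geo_weight: float = 0.5) -> Vector:
    """Concatenate a unit 2D descriptor with a weighted 3D normal, then renormalize."""
    fused = list(l2_normalize(desc_2d)) + [geo_weight * n for n in normal]
    return l2_normalize(fused)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity because inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    flat = [[1.0, 1.0, 1.0]] * 3  # constant depth: a plane facing the camera
    ramp = [[0.0, 1.0, 2.0]] * 3  # depth rises left to right: a tilted plane
    desc = [1.0, 0.0]             # identical 2D descriptor at both keypoints
    a = lift_descriptor(desc, normal_at(flat, 1, 1))
    b = lift_descriptor(desc, normal_at(ramp, 1, 1))
    print(round(cosine(a, b), 3))  # 0.941
```

Two keypoints that look identical in 2D, as on a repetitive pattern, would score a perfect 1.0 on appearance alone; once the normals are attached, their similarity drops because the surfaces face different directions, which is the intuition behind adding 3D geometry to matching.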

🛠️TOP TOOLS

PlayPhrase - Allows users to search for and play video clips containing specific phrases from movies and TV shows.

Star By Face - AI-powered celebrity look-alike app that allows users to discover which famous personalities they resemble.

AI Time Machine - Transform celebrities into historical figures, offering a humorous and imaginative glimpse into the past.

Ask Your PDF - AI-powered platform that lets users chat with their PDF files, extract key insights, and manage information across multiple platforms.

NetworkAI - AI-powered networking tool developed by Wonsulting to streamline and enhance the job search process.

📲SOCIAL MEDIA

We're missing (at least one) major paradigm for LLM learning. Not sure what to call it, possibly it has a name - system prompt learning?
Pretraining is for knowledge.
Finetuning (SL/RL) is for habitual behavior.
Both of these involve a change in parameters but a lot of human
— Andrej Karpathy (@karpathy)
12:55 AM • May 11, 2025

🗞️MORE NEWS

Microsoft and OpenAI are renegotiating their partnership, focusing on Microsoft's equity stake and future tech access. Talks are tense due to OpenAI's restructuring plans, expanding business ambitions, and rising competition between the two companies.
ByteDance released Agent TARS, an open-source AI tool that automates tasks by visually reading web content and interacting with files. Currently experimental and macOS-only, it provides live feedback and lets users intervene during tasks.
Zencoder launched Zen Agents, an AI platform enabling teams to create and share specialized coding tools. It automates team workflows, reduces delays, and provides an open-source marketplace for community-built agents, helping developers collaborate efficiently.
SoundCloud clarified it hasn't used user-uploaded music to train AI models, despite updating its terms allowing future AI use. If it ever chooses to use user content, SoundCloud promises clear communication and opt-out options for creators.
Alibaba's Qwen introduced "Web Dev," an AI tool creating complete front-end code from simple user prompts. Using Qwen3 models, it quickly generates websites and interactive features, positioning Alibaba against similar products from OpenAI and Anthropic.