AI Outperforms Humans In Business Test

PLUS: ClickHouse Aims For $6B Valuation, Microsoft, OpenAI Renegotiate Partnership Terms and more.

Today:

  • AI Outperforms Humans In Business Test

  • OpenAI Acquires Windsurf For $3B

  • CoreWeave Seeks $1.5B After IPO Flop

  • ClickHouse Aims For $6B Valuation

  • Microsoft, OpenAI Renegotiate Partnership Terms

AI JUST BEAT humans at running a business…

Researchers tested AI agents' ability to manage a simulated vending machine business, starting with $500. Claude 3.5 Sonnet performed best, ending with over $2,000, while Claude 3.7 Sonnet finished with about $1,570. Surprisingly, a human came in fourth with $844.

The main issue was long-term coherence; AI agents performed well initially but often broke down or made irrational decisions, like contacting the FBI over a $2 fee. Improving task-specific modules could enhance performance.

OpenAI bought coding startup Windsurf for $3 billion, causing Google's stock to drop. The deal highlights a surge in acquisitions of AI-driven software tools. Windsurf helps developers build apps faster using AI, and the purchase intensifies OpenAI's competition with Google and Amazon. Investors are still debating which AI startups are worth funding, and the deal marks a turning point for the industry. Engineers remain skeptical that AI will replace their jobs, seeing it instead as a productivity enhancer.

Why This Matters

  1. Sets a benchmark valuation for AI developer tools, influencing future investments.

  2. Intensifies competition among tech giants (OpenAI, Google, Amazon) in AI developer ecosystems.

  3. Highlights the shift toward integrating AI into practical, high-impact software development tools.

CoreWeave, a data-center provider serving AI clients like Microsoft, is seeking a $1.5 billion debt deal after a disappointing IPO. Originally aiming for $2.7 billion, CoreWeave cut its fundraising goal due to high debt and a softening AI infrastructure market. The company already has $8 billion in debt and faces significant repayments by 2026. JPMorgan is managing the current investor discussions as CoreWeave assesses its financial options.

Why This Matters

  1. Highlights investor caution and market saturation in AI infrastructure funding.

  2. Reflects challenges AI-focused companies face managing heavy debt amid uncertain markets.

  3. May signal broader investor hesitation around large-scale infrastructure projects needed for AI growth.

Database startup ClickHouse, seen as a rival to Snowflake, aims to raise funds at a $6 billion valuation, triple its value from four years ago, in a round led by Khosla Ventures. Its software, initially developed at Yandex, excels at real-time analytics for AI developers. With roughly $70 million in annual revenue, its valuation reflects high investor expectations for database tools as AI agents gain popularity. ClickHouse competes against Snowflake, Elastic, and Datadog.
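
For a rough sense of what "high investor expectations" means here, the short Python sketch below works through the arithmetic using only the figures in the ClickHouse story above. The function name `revenue_multiple` is our own illustration, not something from the reporting, and the revenue figure is the approximate "$70 million" cited, so treat the result as a back-of-the-envelope estimate.

```python
def revenue_multiple(valuation_usd: float, annual_revenue_usd: float) -> float:
    """Return how many times annual revenue a valuation represents."""
    if annual_revenue_usd <= 0:
        raise ValueError("annual revenue must be positive")
    return valuation_usd / annual_revenue_usd


# Figures taken from the story; the revenue number is approximate.
target_valuation = 6_000_000_000
annual_revenue = 70_000_000

multiple = revenue_multiple(target_valuation, annual_revenue)

# "Triple its value from four years ago" implies the earlier valuation
# was one third of the current target.
implied_prior_valuation = target_valuation / 3

print(f"Revenue multiple: about {multiple:.0f}x")
print(f"Implied valuation four years ago: ${implied_prior_valuation / 1e9:.0f}B")
```

In other words, investors would be paying roughly 86 times annual revenue, against an implied valuation of about $2 billion four years ago.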

Why This Matters

  1. Underscores growing investor interest in database technology supporting AI applications.

  2. Highlights demand for efficient, real-time data processing tools crucial for emerging AI agents.

  3. Reflects intensifying competition among database software providers catering specifically to AI-driven analytics.

🧠RESEARCH

This survey reviews the evolution of large multimodal reasoning models (LMRMs), which integrate text, images, audio, and video to enhance reasoning. It traces the shift from modular pipelines to unified, language-centric frameworks, highlighting new methods like Multimodal Chain-of-Thought (MCoT). The paper discusses challenges in generalization, reasoning depth, and adaptive planning for real-world tasks.

General-Level is a new framework for evaluating multimodal AI models, measuring their ability to both understand and generate diverse content (text, images, etc.). It includes General-Bench, a large test set with 700 tasks, highlighting gaps toward human-like intelligence and guiding future advances toward truly general-purpose AI systems.

StreamBridge is a method for upgrading offline video-language models into real-time assistants. It adds memory for better multi-step interactions and proactive responses. Supported by a custom dataset (Stream-IT), StreamBridge notably enhances streaming performance, surpassing advanced models like GPT-4o and Gemini 1.5 Pro in real-time video tasks.

X-Reasoner is a vision-language model designed for generalizable reasoning across modalities and domains. Trained initially on general text data, it later incorporates reinforcement learning for improved reasoning transfer. X-Reasoner excels in both general and medical benchmarks, outperforming existing models. A specialized version, X-Reasoner-Med, sets new standards in medical reasoning tasks.

LiftFeat is a lightweight network for robust local feature matching in challenging conditions like poor lighting or repetitive patterns. It combines 3D geometric features with 2D descriptors using a pre-trained depth model. LiftFeat significantly improves accuracy in tasks like pose estimation and visual localization, outperforming other lightweight methods.

🛠️TOP TOOLS

PlayPhrase - Allows users to search for and play video clips containing specific phrases from movies and TV shows. 

Star By Face - AI-powered celebrity look-alike app that allows users to discover which famous personalities they resemble. 

AI Time Machine - Transform celebrities into historical figures, offering a humorous and imaginative glimpse into the past.

Ask Your PDF - AI-powered platform that revolutionizes document interaction, allowing users to engage in intelligent conversations with their PDF files, extract key insights, and manage information efficiently across multiple platforms.

NetworkAI - AI-powered networking tool developed by Wonsulting to streamline and enhance the job search process. 

🗞️MORE NEWS

  • Microsoft and OpenAI are renegotiating their partnership, focusing on Microsoft's equity stake and future tech access. Talks are tense due to OpenAI's restructuring plans, expanding business ambitions, and rising competition between the two companies.

  • ByteDance released Agent TARS, an open-source AI tool that automates tasks by visually reading web content and interacting with files. Currently experimental and macOS-only, it provides live feedback and lets users intervene during tasks.

  • Zencoder launched Zen Agents, an AI platform enabling teams to create and share specialized coding tools. It automates team workflows, reduces delays, and provides an open-source marketplace for community-built agents, helping developers collaborate efficiently.

  • SoundCloud clarified it hasn't used user-uploaded music to train AI models, despite updating its terms allowing future AI use. If it ever chooses to use user content, SoundCloud promises clear communication and opt-out options for creators.

  • Alibaba's Qwen introduced "Web Dev," an AI tool creating complete front-end code from simple user prompts. Using Qwen3 models, it quickly generates websites and interactive features, positioning Alibaba against similar products from OpenAI and Anthropic.
