NATURAL 20
OpenAI Unveils Operator for Web

PLUS: UI-TARS Outperforms GPT-4o, Claude; Perplexity Assistant Now Available on Android; and more.

Wes Roth
January 24, 2025

Today:

OpenAI Unveils Operator for Web
Humanity’s Last Exam Stumps AI
Stargate Receives Massive $38B Backing
UI-TARS Outperforms GPT-4o, Claude
Perplexity Assistant Now Available on Android
ChatGPT Restored After Global Outage

OpenAI Unveils Operator for Web

OpenAI has launched Operator, a browser-based AI agent available as a research preview for Pro users in the U.S. Powered by the new Computer-Using Agent (CUA) model, Operator can perform tasks like booking trips, filling out forms, and ordering groceries by interacting with web interfaces. Safety measures include takeover mode and user confirmations. OpenAI plans to use feedback from the preview to refine the agent and expand access.

Humanity’s Last Exam Stumps AI

A new test called "Humanity’s Last Exam" challenges AI systems with 3,000 tough questions across diverse subjects like philosophy and engineering. Developed by experts, the test aims to measure AI’s intellectual abilities beyond traditional benchmarks. Although top models from companies like OpenAI and Google failed, the test’s creators anticipate rapid improvements. This new evaluation reflects concerns that AI may soon surpass human experts in general knowledge, requiring new methods to assess its impact.

Stargate Receives Massive $38B Backing

OpenAI and SoftBank are reportedly investing $19 billion each in Stargate, a joint venture to develop AI-focused data centers across the U.S. The initiative, which also includes Middle Eastern AI fund MGX, aims to channel $500 billion into infrastructure, with $100 billion already pledged for initial projects like a facility in Texas. Elon Musk has expressed skepticism that the venture has the funds, a claim OpenAI's Sam Altman dismissed as incorrect.

UI-TARS Outperforms GPT-4o, Claude

ByteDance’s new AI agent, UI-TARS, surpasses rivals like GPT-4o and Claude with its ability to autonomously navigate graphical user interfaces on desktop, mobile, and web applications. Equipped with advanced perception, reasoning, and error correction, it excels in benchmarks like VisualWebBench and AndroidWorld. Trained on GUI-focused data, UI-TARS delivers state-of-the-art performance in complex workflows, offering step-by-step explanations and adaptive learning, making it a powerful contender in the competitive AI agent landscape.

Perplexity Assistant Now Available on Android

AI-powered search engine Perplexity has launched "Perplexity Assistant" for Android, enabling users to perform multi-app actions like booking rides or setting calendar events. The assistant leverages web access and multimodal inputs, including camera usage, for contextual and automated tasks. It is initially free and available in 15 languages, though some of Perplexity's earlier features have had functionality issues. As Perplexity grows rapidly, legal disputes with publishers over content use remain a challenge despite its revenue-sharing program.

ChatGPT Restored After Global Outage

ChatGPT experienced a global outage, leaving users unable to access the AI tool due to a "bad gateway error." The issue, reported by over 10,000 users on Downdetector, began around 11:00 GMT and was resolved by OpenAI at 15:09 GMT. While the cause remains undisclosed, the downtime disrupted users worldwide. ChatGPT, a widely popular tool with over 300 million weekly users, continues to drive interest in generative AI.

🧠RESEARCH

Humanity's Last Exam

Humanity's Last Exam (HLE) introduces a challenging multi-modal benchmark to measure advanced AI capabilities across 3,000 questions from 100+ subjects. Despite rapid AI advancements, current models show low accuracy, highlighting room for improvement. HLE fosters informed discussions on AI progress, risks, and governance, offering a critical tool for future assessment.

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-R1 introduces advanced reasoning models trained via reinforcement learning (RL). DeepSeek-R1-Zero, developed without supervised fine-tuning, showcases impressive reasoning but struggles with readability and language consistency. The improved DeepSeek-R1 incorporates multi-stage training and cold-start data, achieving competitive performance. Open-sourced models, including six distilled versions, aim to advance reasoning research in the AI community.

FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces

FilmAgent introduces a multi-agent framework for automating virtual 3D film production, simulating roles like directors, screenwriters, and cinematographers. It transforms ideas into structured scripts, plans cinematography, and refines outputs through agent collaboration. Human evaluation shows FilmAgent outperforms baselines, highlighting its effectiveness in automating complex filmmaking processes using coordinated multi-agent systems.

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

Test-Time Preference Optimization (TPO) is a framework that aligns large language model (LLM) outputs with human preferences during inference, eliminating retraining. By iteratively refining responses using textual critiques as rewards, TPO enhances alignment in tasks like instruction following and safety. Lightweight and efficient, it achieves alignment on the fly, outperforming pre-aligned models.

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

VideoLLaMA 3 introduces a vision-centric multimodal foundation model for image and video understanding. Emphasizing high-quality image-text data over massive video-text datasets, it uses four training stages that cover vision-language alignment, pretraining, and fine-tuning. This approach achieves state-of-the-art performance on image and video understanding benchmarks.

🛠️TOP TOOLS

Morph Studio AI - AI-powered video creation platform that transforms ideas into professional-quality videos through innovative features.

Shakker AI - AI image generation platform that combines advanced Stable Diffusion models with powerful editing tools to revolutionize digital content creation.

Drippi AI - AI-powered Twitter outreach assistant designed to automate and enhance direct messaging campaigns.

FaceVary - AI-powered face swapping tool that allows users to effortlessly replace faces in images and videos.

Synthesys X - AI tool for creating videos, consistent talking characters, and unique images.

📲SOCIAL MEDIA

Grok 3
— Elon Musk (@elonmusk)
12:34 PM • Jan 23, 2025

🗞️MORE NEWS

Google's AI assistant, Gemini, now enables multi-app tasks in a single prompt, integrates with Samsung's Galaxy S25 series, and introduces new features like photography feedback, live video streaming, and Circle to Search improvements.
Galileo's Agentic Evaluations ensures AI agents perform reliably by detecting errors, monitoring tool usage, and tracking task success. Backed by $68M funding, it addresses AI safety concerns for enterprises adopting large-scale AI deployments.
Microsoft plans to train 1 million South Africans in AI and cybersecurity by 2026 through its national skilling initiative, aiming to enhance global competitiveness and expand opportunities across sectors, particularly for youth.
GhostGPT, an uncensored AI chatbot marketed to cybercriminals, enables malware creation, phishing scams, and business email compromise. It is accessible via Telegram and lacks ethical safeguards, which lowers the barrier to entry for inexperienced attackers and raises cybersecurity concerns.
Researchers from NYU, MIT, and Google DeepMind improved AI-generated images without retraining models by using verifiers and search algorithms. Their method enhances image quality during generation, balancing computing efficiency with artistic and realistic outputs.
A software engineer bought the domain "OGOpenAI.com" for a small price and redirected it to DeepSeek, a Chinese AI lab. DeepSeek, known for its open-source AI models, contrasts with OpenAI's shift away from open releases, sparking industry discussion.
