NATURAL 20

Claude's Revolutionary Computer Use

PLUS: CrewAI Powers Enterprise AI Agents, Runway Launches Act-One Motion Capture, and more.

Wes Roth
October 23, 2024

In partnership with

Your Digital Twin, Proxy

Your personal digital clone for low-value tasks
Gets smarter as you give it commands to learn
The first truly general AI Agent

Meet your Proxy here

Today:

Claude's Revolutionary Computer Use
Genmo Unveils Open Source Mochi
Stability AI's Latest Image Model
Runway Launches Act-One Motion Capture
ChatGPT Voice Mode Now in Europe
CrewAI Powers Enterprise AI Agents
Ideogram Unveils Infinite Canvas Tool

Claude's Revolutionary Computer Use

Anthropic announced an upgraded Claude 3.5 Sonnet and a new model, Claude 3.5 Haiku, alongside a new capability in public beta: computer use. The upgraded Claude 3.5 Sonnet excels at coding tasks, outpacing competitors on key benchmarks. Claude 3.5 Haiku combines speed, affordability, and improved tool use. The upgraded Sonnet is available now through the Anthropic API and major cloud platforms, with Claude 3.5 Haiku slated to follow later this month.

The new computer use feature lets Claude operate a computer the way a person does, by looking at the screen, moving the cursor, clicking, and typing, though it remains experimental. Companies are already testing it on complex, multi-step tasks. Anthropic emphasizes safe deployment and invites feedback for further improvements.

Genmo Unveils Open Source Mochi

Genmo has launched Mochi 1, an open-source AI video generation model that rivals top proprietary tools like Runway’s Gen-3 Alpha. Mochi 1, available under the Apache 2.0 license, lets users create high-quality videos from text prompts for free, while competing services charge up to $94.99 per month. It excels in motion quality and prompt adherence, with a higher-definition version, Mochi 1 HD, coming soon.

Genmo aims to democratize AI video technology and has raised $28.4 million in funding. The company is hiring as it pushes innovation in video generation and AI's broader role in robotics and automation.

Stability AI's Latest Image Model

Stability AI has introduced Stable Diffusion 3.5, its most advanced image model to date, available in several variants: Large, Large Turbo, and Medium (coming October 29). These models are customizable, run on consumer hardware, and are free for both non-commercial and commercial use under certain conditions. The models excel in prompt adherence and image quality, and they produce diverse outputs across a range of artistic styles.

Stability AI emphasizes safety in development and usage, with licenses allowing broad creative and commercial applications. Users can download the models from platforms like Hugging Face and access them through APIs and cloud platforms.

Runway Launches Act-One Motion Capture

Runway has launched a new AI tool called “Act-One,” which allows users to capture realistic facial expressions and transfer them to AI-generated characters. This tool simplifies the traditionally complex process of facial animation, making it more accessible without the need for expensive motion capture equipment.

Act-One is available to users with credits on Runway’s Gen-3 Alpha video generation model. It enables creators to animate multiple characters using a single video input, making it ideal for indie filmmakers, animators, and game developers. Act-One also includes safeguards to prevent misuse and to protect against impersonation of public figures.

ChatGPT Voice Mode Now in Europe

ChatGPT's Advanced Voice mode is now available across Europe, enabling users to interact with the AI by speaking and receiving responses in one of its nine voices. This feature, previously limited to the US and UK, allows for more natural, human-like conversations, letting users interrupt and adjust responses as needed.

The rollout was announced casually via a tweet from OpenAI, and the feature is accessible to ChatGPT Plus subscribers. The availability of Advanced Voice mode in Europe marks a significant update, allowing more users to access this conversational AI without the need for a VPN.

CrewAI Powers Enterprise AI Agents

CrewAI, a startup known for its AI agent framework, has launched its first product, CrewAI Enterprise, which helps businesses build and deploy multi-agent systems. This platform enables organizations to create fleets of AI agents using any large language model (LLM) or cloud platform, simplifying complex tasks such as internal automations, marketing, coding, and legal analysis.

CrewAI's agents can autonomously perform tasks and iterate on results. The platform's open-source nature has driven rapid growth, with over 10 million agents executed monthly and significant adoption among Fortune 500 companies. CrewAI recently secured $18 million in funding to fuel further expansion.

Ideogram Unveils Infinite Canvas Tool

Canadian AI startup Ideogram, founded by former Google Brain researchers, has launched its "Canvas" feature for manipulating and combining AI-generated images. This infinite workspace allows users to spread, resize, and merge images, as well as upload their own visuals.

Alongside Canvas, Ideogram introduces "Magic Fill" for editing specific image regions and "Extend" for expanding images while maintaining consistency. The platform offers various subscription tiers, including free and paid options, with additional perks like priority credits and unlimited canvases. Ideogram credits its community for feedback and is expanding its team in Toronto and New York.

🧠RESEARCH

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution

CompassJudger-1 is an open-source judge model designed to improve the evaluation of large language models (LLMs). It can perform scoring, compare models, generate critiques, and handle diverse tasks. The JudgerBench benchmark was also introduced for standardized subjective evaluations. These tools aim to enhance LLM assessment and collaboration.

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

PUMA is a unified multimodal large language model (MLLM) designed for diverse visual generation tasks, addressing varying granularity needs. It integrates multi-granular visual features for both inputs and outputs, excelling in tasks like text-to-image generation and image manipulation. PUMA advances vision-language understanding, with its code available for public use.

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

SAM2Long enhances SAM 2 for long video object segmentation by using a training-free memory tree approach to reduce error accumulation over time. It selects the best segmentation paths frame by frame, improving accuracy in complex videos with occlusions. SAM2Long shows significant performance gains, with code available for public use.

Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages

Pangea is a multilingual multimodal large language model (MLLM) designed for 39 languages, addressing the lack of representation in non-English contexts. It uses a diverse dataset, PangeaIns, and an evaluation suite, PangeaBench, to improve cross-cultural and multilingual tasks. Pangea significantly outperforms existing models and is fully open-sourced for inclusivity.

AutoTrain: No-code training for state-of-the-art models

AutoTrain is an open-source, no-code tool designed to simplify training or fine-tuning state-of-the-art models for various tasks, such as text classification, language models, image classification, and more. It supports local or cloud use and integrates with models from Hugging Face Hub, making model customization accessible without coding.

🛠️TOP TOOLS

OpenHome - Your Custom AI Voice Interface

Averi - AI marketing management platform and built-in vetted expert ecosystem

AIVA - Your personal AI music generation assistant

SagaLabs - AI-powered story localization: Translate, share & earn worldwide

Sessions - AI-powered platform that makes meetings & webinars insanely productive.

📲SOCIAL MEDIA

We've built an API that allows Claude to perceive and interact with computer interfaces.
This API enables Claude to translate prompts into computer commands. Developers can use it to automate repetitive tasks, conduct testing and QA, and perform open-ended research.
— Anthropic (@AnthropicAI)
3:06 PM • Oct 22, 2024

🗞️MORE NEWS

Microsoft and OpenAI are offering $10 million in grants to select news outlets like The Seattle Times and The Minnesota Star Tribune to explore AI tools in journalism. The grants will fund AI-driven initiatives such as content analysis, transcription, and archive search tools, aiming to enhance local media.
Qualcomm launched its new Snapdragon 8 Elite chip with onboard AI capabilities, allowing real-time video enhancements and object recognition without internet access. These features aim to boost smartphone sales by offering unique AI-driven functionalities.
Alcon Entertainment is suing Elon Musk’s Tesla and Warner Bros. Discovery for using AI-generated images resembling "Blade Runner 2049" in promotional materials without permission. Alcon seeks damages and a court order to stop further use.
Asana has introduced AI Studio, enabling users to create custom AI agents to manage workflows without code. This feature streamlines tasks like project coordination, reducing "busy work." AI Studio leverages Asana’s Work Graph for efficient cross-functional collaboration, offering a specialized AI integration for enterprises.
OpenAI has appointed Dr. Ronnie Chatterji as its first Chief Economist. He will lead research on AI's economic impacts, focusing on growth, job creation, and long-term labor trends. Dr. Chatterji aims to ensure AI's benefits are widely shared and to inform policies that support economic development.

Learn AI with us.

Let’s Build the Future Together.

Hello fellow AI-obsessed traveler,

Over the past 2 years, as we’ve grown to over 250,000 subscribers between the YouTube Channel and this newsletter, we've received an overwhelming number of requests for one specific thing.

While the newsletter helps keep you up to speed with AI news, many of you have asked for the next step: to learn how to actually apply AI in your work.

Today we’re finally announcing the solution with NATURAL 20, the community for like-minded AI learners. As a loyal newsletter reader you are getting access at the lowest price it will ever be:

JOIN NATURAL 20 AI UNIVERSITY TODAY

What you get:

* Tutorials by experts across various AI fields.

* Daily tutorials by Wes Roth about the latest use cases.

* Building Autonomous AI Agents to Automate Your Life and Business (NEW!)

* A network of the top 1% of early AI adopters.

* Access to community-only resources and software.

* And many more features rolling out soon.
