Ilya Sutskever’s New AI company

PLUS: Microsoft Unveils Florence-2, Meta Releases New AI Models and more.

In partnership with

Learn AI Strategies worth a Million Dollars in this 3-hour AI Workshop. Join now for $0

Everyone tells you to learn AI, but no one tells you where to start.

We have partnered with GrowthSchool to bring this ChatGPT & AI Workshop to our readers. It is usually $199, but free for you because you are our loyal readers 🎁

This workshop has been taken by 1 million people across the globe, who have been able to:

  • Build businesses that make $10,000 just by using AI tools

  • Make quicker & smarter decisions using AI-led data insights

  • Write emails, content & more in seconds using AI

  • Solve complex problems, research 10x faster & save 16 hours every week

You’ll wish you knew about this FREE AI Training sooner (Btw, it’s rated 9.8/10 ⭐)

Today:

  • Ilya Sutskever’s New AI company

  • ElevenLabs Launches Creator Tool

  • SoftBank Partners with Perplexity

  • Anthropic Unveils Claude 3.5 Sonnet 

  • Runway Launches Gen-3 Alpha

  • Microsoft Unveils Florence-2

  • Roblox Advances to 4D AI

  • Meta Releases New AI Models

  • DeepMind’s AI Creates Soundtracks

OpenAI’s former chief scientist is starting a new AI company

Ilya Sutskever, co-founder and former chief scientist of OpenAI, is launching Safe Superintelligence Inc. (SSI), a new AI company focused on safety. SSI aims to develop a safe and powerful AI system, prioritizing safety over commercial pressures. The startup is co-founded by Daniel Gross and Daniel Levy, both experienced in the AI field. 

SSI intends to avoid the management and product cycle distractions faced by other AI companies like OpenAI, Google, and Microsoft. Sutskever emphasizes that SSI will not pursue other projects until they achieve their goal of creating safe superintelligence.

ElevenLabs unveils open-source creator tool for adding sound effects to videos

ElevenLabs has launched an open-source tool that allows creators to add sound effects to videos. This new application analyzes imported video clips and generates multiple sound effect options in about 15 seconds. The tool works by extracting frames from the video, creating a custom text-to-sound effects prompt, and combining the video and audio into a single file. 

The tool, available on GitHub, aims to streamline the workflow for AI video creators by intelligently suggesting sound effects. ElevenLabs envisions this API being used in dynamic experiences like immersive video games and other creative projects.
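The three-step workflow described above (extract frames, build a text-to-sound-effects prompt, mux audio back into the video) can be sketched from Python with the ffmpeg CLI. This is a minimal illustration, not ElevenLabs' actual code: the function names are invented here, and the text-to-sound-effects generation step is left as a placeholder.

```python
import subprocess

def frame_extraction_cmd(video: str, fps: int = 1) -> list[str]:
    # Sample one frame per second to feed the frame-to-prompt step.
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", "frames/frame_%03d.png"]

def mux_cmd(video: str, audio: str, out: str) -> list[str]:
    # Copy the original video stream and attach the generated SFX track.
    return ["ffmpeg", "-i", video, "-i", audio,
            "-map", "0:v", "-map", "1:a", "-c:v", "copy", out]

def add_sfx(video: str, sfx_audio: str, out: str) -> None:
    subprocess.run(frame_extraction_cmd(video), check=True)
    # ...here the extracted frames would be summarized into a text prompt
    # and sent to a text-to-sound-effects model (placeholder step)...
    subprocess.run(mux_cmd(video, sfx_audio, out), check=True)
```

The muxing step uses `-c:v copy` so the video stream is not re-encoded, which keeps the combine step fast regardless of clip length.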

SoftBank Corp. Launches Strategic Partnership with Leading AI Startup Perplexity

SoftBank Corp. announced a strategic partnership with AI startup Perplexity, offering a one-year free trial of Perplexity Pro, a premium AI answer engine. Starting June 19, 2024, customers using SoftBank's mobile services ('SoftBank,' 'Y!mobile,' and 'LINEMO') can access this service. 

Perplexity Pro provides highly accurate answers and displays sources, utilizing advanced large language models. It includes features like image generation, unlimited file uploads, and personalized search results. This initiative aims to enhance user experience with cutting-edge AI capabilities, making SoftBank's services more attractive to its customers.

Anthropic’s Claude 3.5 Sonnet outperforms OpenAI and Google in enterprise AI race

Anthropic has launched Claude 3.5 Sonnet, a high-performing AI model designed for enterprise use, surpassing competitors like OpenAI and Google. It offers unmatched capabilities at a lower cost, excelling in key performance metrics. Co-founder Daniela Amodei highlighted the model's focus on quality, safety, reliability, speed, and cost, driven by enterprise feedback. Unique features include Artifacts, a tool for team collaboration on projects. 

Unlike its competitors, Claude 3.5 Sonnet does not offer speech input/output, a choice Anthropic attributes to customer needs. This release marks a significant step in Anthropic’s strategy to meet enterprise demands and push AI boundaries.

Introducing Gen-3 Alpha

Runway has introduced Gen-3 Alpha, an advanced AI model for high-fidelity video generation. This model, trained on a new large-scale multimodal infrastructure, surpasses its predecessor, Gen-2, in consistency, motion, and detail. Gen-3 Alpha supports various tools, including Text to Video, Image to Video, and Director Mode, and offers precise control over video elements. It also includes new safeguards like visual moderation and provenance standards. 

Gen-3 Alpha enables artists to create expressive human characters and dynamic scenes, offering vast storytelling possibilities. This model is designed for collaboration and customization, catering to specific artistic and narrative needs.

Microsoft drops Florence-2, a unified model to handle a variety of vision tasks

Microsoft’s Azure AI team has released Florence-2, a new vision foundation model available on Hugging Face under the MIT license. Florence-2 can handle various vision tasks, such as captioning, object detection, visual grounding, and segmentation, using a unified prompt-based representation. 

The model comes in two sizes, 232M and 771M parameters, and performs on par or better than many larger models. It was trained on a large visual dataset (FLD-5B) with 5.4 billion annotations for 126 million images. Florence-2 aims to provide a single solution for diverse vision applications, reducing the need for multiple task-specific models and cutting compute costs.

Roblox’s Road to 4D Generative AI

Roblox is advancing towards 4D generative AI, enhancing dynamic interactions in addition to 3D object creation. This new AI system will integrate appearance, shape, physics, and scripts, enabling more immersive and interactive experiences. Current AI tools, like Assistant and Animation Capture, simplify the creation process for Roblox's 77 million daily users. 

The ultimate goal is to create functional and interactive assets that automatically adapt to their environment, such as avatars with realistic movements and vehicles with true-to-life physics. Roblox aims to address challenges in functionality, interaction, and controllability to achieve seamless, user-friendly creation.

Releasing New AI Research Models to Accelerate Innovation at Scale

Meta’s Fundamental AI Research (FAIR) team has released several new AI models to boost innovation. These include image-to-text, text-to-music, multi-token prediction, and AI-generated speech detection models.

The Chameleon model can simultaneously understand and generate text and images. Multi-token prediction improves the efficiency of large language models by predicting multiple words at once. JASCO offers enhanced control over AI-generated music by using various inputs like chords. AudioSeal detects AI-generated speech segments quickly and efficiently. 

Additionally, Meta is promoting diversity in text-to-image generation by evaluating and improving geographical and cultural representation in AI-generated images.

Google DeepMind’s new AI tool uses video pixels and text prompts to generate soundtracks

Google DeepMind has introduced a new AI tool that generates soundtracks for videos by combining video pixels and text prompts. This tool can automatically create audio that matches the scenes in a video, such as sound effects, dialogue, and music. Users can generate an unlimited number of soundtracks, with the option to include text prompts. 

The tool is trained on video, audio, and annotations, enabling it to match audio events with visual scenes. Although it still faces challenges, like synchronizing lip movements with dialogue, it promises significant advancements in automated audio creation. The tool will include Google’s SynthID watermark to indicate AI-generated content.

🧠RESEARCH

The paper introduces MMDU, a benchmark and dataset for evaluating and improving multi-turn, multi-image dialog understanding in Large Vision-Language Models (LVLMs). It highlights that current LVLMs struggle in real-world conversations and shows that fine-tuning on MMDU-45k significantly enhances performance, bridging the gap with real-world application needs.

The paper presents mDPO, a method to improve multimodal large language models (LLMs) by optimizing preferences for both language and images. It addresses the issue of models ignoring images in multimodal scenarios. Experiments show mDPO reduces hallucination and enhances performance across benchmarks, demonstrating its effectiveness in multimodal preference optimization.

DeepSeek-Coder-V2 is an open-source code language model that rivals GPT-4 Turbo in code-specific tasks. Further pre-trained on 6 trillion additional tokens, it improves coding and mathematical reasoning, supports 338 programming languages, and extends context length to 128K. It surpasses closed-source models on coding and math benchmarks, demonstrating significant advancements.

The paper introduces DICE, a method for improving large language models (LLMs) using implicit rewards from direct preference optimization (DPO). By bootstrapping current LLM rewards to create a preference dataset, DICE refines response quality and reduces biases. This approach significantly enhances alignment and outperforms larger models, achieving notable results with only 8B parameters.

The paper introduces a new framework for improving 360-degree monocular depth estimation using perspective distillation and unlabeled data augmentation. By leveraging state-of-the-art models and generating pseudo labels through a six-face cube projection technique, it enhances depth accuracy in 360-degree images. Tested on Matterport3D and Stanford2D3D datasets, the method shows significant improvements, especially in zero-shot scenarios.

🛠️TOP TOOLS

My AskAI - AI customer support for your SaaS

Melody Agents - Choose a topic and a music genre, then wait for the agents to generate a song

Kona - A leadership coach for every remote manager.

Stable Diffusion 3 Medium - Multimodal Diffusion Transformer text-to-image model 

Gradio - Fastest way to demo your machine learning model with a friendly web interface


Learn AI with us.

Let’s Build the Future Together.

Hello fellow AI-obsessed traveler,

Over the past 2 years, as we’ve grown to over 250,000 subscribers between the YouTube Channel and this newsletter, we've received an overwhelming number of requests for one specific thing.

While the newsletter helps keep you up to speed with AI news, many of you have asked for the next step: to learn how to actually apply AI in your work.

Today we’re finally announcing the solution with NATURAL 20, the community for like-minded AI learners. As a loyal newsletter reader you are getting access at the lowest price it will ever be:

JOIN NATURAL 20 AI UNIVERSITY TODAY

What you get:

* Tutorials by experts across various AI fields.

* Daily tutorials by Wes Roth about the latest use cases.

* Building Autonomous AI Agents to Automate Your Life and Business (NEW!)

* A network of the top 1% of early AI adopters.

* Access to community-only resources and software.

* And many more features rolling out soon.
