NATURAL 20
ElevenLabs Debuts AI voice isolator

PLUS: Perplexity’s Pro Search AI upgraded, ChatGPT Mac Security Flaw and more.

Wes Roth
July 05, 2024

Learn AI strategies worth a million dollars in this 3-hour AI workshop. Join now for $0.

Everyone tells you to learn AI but no one tells you where.

We have partnered with GrowthSchool to bring this ChatGPT & AI Workshop to our readers. It is usually $199, but it is free for you as one of our loyal readers 🎁

This workshop has been taken by 1 million people across the globe, who have been able to:

Build businesses that make $10,000 just by using AI tools
Make quicker, smarter decisions using AI-led data insights
Write emails, content & more in seconds using AI
Solve complex problems, research 10x faster & save 16 hours every week

You’ll wish you knew about this FREE AI Training sooner (Btw, it’s rated at 9.8/10 ⭐)

Save your seat for $0 now! (Valid for 100 people only)

Today:

ElevenLabs Debuts AI voice isolator
Perplexity’s Pro Search AI upgraded
Kyutai Launches Moshi
Meta Unveils 3DGen
ChatGPT Mac Security Flaw
Nvidia's $12B China Sales

ElevenLabs launches free AI voice isolator to take on Adobe

ElevenLabs, an AI voice startup, launched a free AI Voice Isolator tool that removes unwanted background noise from content like films, podcasts, and videos. The tool processes uploaded audio files to extract clear speech, with results comparable to studio-quality recordings. Despite its capabilities, it struggles with some irregular noises and doesn't work on music vocals.

The tool, currently available on the ElevenLabs platform, offers free usage with limits and paid plans starting at $5 per month for larger files. Future improvements and API access are anticipated, though details on underlying models and data usage remain limited.

VENTUREBEAT

Perplexity’s ‘Pro Search’ AI upgrade makes it better at math and research

Perplexity has upgraded its Pro Search AI tool, enhancing its ability to handle complex queries, improve math and programming capabilities, and conduct thorough research. The tool can now break down multi-step questions and provide detailed answers, as demonstrated with queries about the northern lights in Iceland and Finland.

However, Perplexity faces criticism for alleged plagiarism and unethical data scraping practices. The upgraded tool is available for free with limited daily searches, while a subscription plan offers more extensive use. Despite the controversy, Perplexity aims to improve content generation and data analysis for users.

THE VERGE

Kyutai Open Sources Moshi: A Real-Time Native Multimodal Foundation AI Model that can Listen and Speak

Kyutai has launched Moshi, an open-source, real-time multimodal AI model capable of listening and speaking simultaneously. Moshi can understand and express emotions, handle multiple accents, and process both text and audio in real-time. It features advanced training on synthetic conversations and offers an impressive end-to-end latency of 200 milliseconds.

Accessible on devices like MacBooks, Moshi includes responsible AI use features such as watermarking for detecting AI-generated audio. Kyutai plans to release more versions and a technical report, aiming to foster widespread adoption and innovation through its permissive licensing.

MARKTECHPOST

Meta 3D Gen

Meta has unveiled Meta 3D Gen (3DGen), a cutting-edge text-to-3D asset generation tool that creates high-quality 3D shapes and textures in under a minute. 3DGen excels in generating detailed 3D assets with prompt fidelity and supports physically-based rendering (PBR) for real-world applications. It integrates Meta 3D AssetGen for text-to-3D generation and Meta 3D TextureGen for text-to-texture generation, representing objects in view space, volumetric space, and UV space.

The model achieves a win rate of 68% over single-stage models and outperforms industry standards in prompt fidelity and visual quality, making it a significant advancement in 3D asset creation.

OpenAI’s ChatGPT Mac app was storing conversations in plain text

OpenAI's ChatGPT macOS app had a security flaw where conversations were stored in plain text, making them easily accessible to bad actors. The issue, discovered by Pedro José Pereira Vieito, allowed other apps to read these chats. After The Verge contacted OpenAI, the company released an update encrypting the stored chats.

OpenAI assured users of its commitment to security and user experience. The flaw arose because the app does not use sandbox protections, which are not required since it isn't distributed via the Mac App Store. The update resolved the issue, preventing unauthorized access to the conversations.

THE VERGE

Nvidia to make $12bn from AI chips in China this year despite US controls

Nvidia is projected to earn $12 billion from AI chip sales in China this year, despite US export restrictions. The company plans to deliver over 1 million of its new H20 chips, priced between $12,000 and $13,000 each. These chips are designed to comply with US regulations, allowing sales to Chinese customers. That volume would surpass sales of local rival Huawei's Ascend 910B.

Although Nvidia's overall business in China has decreased due to export controls, the H20 chip's popularity is bolstering sales, highlighting Nvidia's ability to navigate geopolitical tensions while meeting significant demand in China.

FINANCIAL TIMES
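
The Nvidia projection above can be sanity-checked with simple arithmetic. The sketch below multiplies the reported unit count by the reported price range. It treats "over 1 million" chips as exactly 1 million (an assumption that makes the result a lower bound), and the function and constant names are illustrative only.

```python
# Back-of-envelope check of the reported H20 figures.
# Assumption: "over 1 million" chips is treated as exactly 1,000,000,
# so the result is a lower bound on implied revenue.
UNITS = 1_000_000
PRICE_LOW_USD = 12_000
PRICE_HIGH_USD = 13_000


def revenue_range_usd(units, price_low, price_high):
    """Return the (low, high) implied revenue in US dollars."""
    return units * price_low, units * price_high


low, high = revenue_range_usd(UNITS, PRICE_LOW_USD, PRICE_HIGH_USD)
print(f"${low / 1e9:.0f}B to ${high / 1e9:.0f}B")  # prints "$12B to $13B"
```

The low end of the range matches the $12 billion headline figure, so the projection is consistent with the reported unit volume and pricing.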

🧠RESEARCH

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

Researchers designed the "Summary of a Haystack" (SummHay) task to evaluate large language models (LLMs) and retrieval-augmented generation (RAG) systems. This task requires summarizing large document sets to identify key insights and accurately cite sources. Current systems struggle with SummHay, scoring significantly lower than human performance, highlighting ongoing challenges in long-context tasks.

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

The WE-MATH benchmark evaluates large multimodal models' (LMMs) mathematical reasoning abilities. Unlike existing benchmarks, it focuses on problem-solving principles, not just results. WE-MATH includes 6.5K visual math problems and assesses models using a novel four-dimensional metric. Findings show a shift in GPT-4o's challenges from insufficient knowledge to inadequate generalization, highlighting progress and areas for improvement in LMMs' reasoning capabilities.

ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

The ROS-LLM framework simplifies robot programming for non-experts using natural language prompts and Robot Operating System (ROS) context. It integrates large language models (LLMs) to understand tasks and execute ROS actions. Key features include behavior extraction, support for various behavior modes, and imitation learning. Experiments show its robustness and versatility, and the code is open-source.

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Researchers introduced a persona-driven data synthesis method using a large language model (LLM) to create diverse synthetic data. They developed Persona Hub, a collection of 1 billion personas, a number equal to about 13% of the global population. These personas enable the creation of varied synthetic data for multiple applications, including mathematical problems and game NPCs, showcasing scalability, versatility, and potential impact on LLM research.

OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

Researchers introduced OpenVid-1M, a high-quality dataset with over 1 million text-video pairs for text-to-video (T2V) generation, addressing the lack of precise, open-source data. They also created OpenVidHD-0.4M with 433K 1080p videos. They proposed a Multi-modal Video Diffusion Transformer (MVDiT) to better extract semantic information from text, demonstrating superior performance over existing methods.

🛠️TOP TOOLS

Moshi - An experimental conversational AI.

Iconic Voices by ElevenLabs - Listen to your favorite books and articles voiced by Judy Garland, James Dean, Burt Reynolds and Sir Laurence Olivier.

Pro Search in Perplexity AI - Upgraded for more advanced problem-solving

DeepMake - Gives you control of the most powerful Open Source AI tools, allowing you to take your visual content to the next level.

Gen-3 Alpha by Runway - A new frontier for high-fidelity, controllable video generation.

RunDiffusion - Fully managed open-source AI tools.

📲SOCIAL MEDIA

Gen-3 Alpha Text to Video is now available to everyone.
A new frontier for high-fidelity, fast and controllable video generation.
Try it now at runwayml.com
— Runway (@runwayml)
5:03 PM • Jul 1, 2024

🗞️MORE NEWS

This is Google AI, and it's coming to the Pixel 9

Google is set to launch the Pixel 9 with advanced AI features under "Google AI." New features include "Add Me" for group photos, "Studio" for creative edits, and "Pixel Screenshots" for enhanced screenshot searching. These tools focus on improving privacy and user experience, aiming to differentiate from Microsoft's controversial Recall feature.

ANDROID AUTHORITY

WhatsApp is developing a personalized AI avatar generator

WhatsApp is developing a feature called Imagine AI, which allows users to create personalized avatars using their photos and text prompts. The avatars can depict users in various settings, from forests to outer space. This feature, currently in beta, uses Meta's Llama AI model and requires user consent to activate and manage reference images.

THE VERGE

SoftBank’s $10 Billion-Plus Plan to Get Into the AI Race Centers on Power and Chips

SoftBank CEO Masayoshi Son is planning a $10 billion investment in AI, focusing on energy projects and acquiring Nvidia GPUs, crucial for AI development. To fund this, SoftBank is in talks with banks to borrow money. This move underscores SoftBank's ambition to lead in the rapidly growing AI sector.

THE INFORMATION

Elon Musk: Grok 2 AI Arrives in August

Elon Musk announced that Grok 2, the next version of X's AI tool, will launch in August 2024, aiming to surpass current AI capabilities. Grok 3 is expected by the end of the year, leveraging Nvidia's H100 GPUs. Despite Grok's initial reliability issues, Musk's xAI startup is building a supercomputer in Tennessee to support its development.

PCMAG

Apple’s Devices Are Lasting Longer, Making AI Strategy Even More Critical

Apple's strategy focuses on extending device longevity, making AI and software more crucial. Key developments include AI integration in Vision Pro, ongoing iOS 19 development, and AI-based improvements for Siri. Notably, Apple is not pursuing an AI deal with Meta. These efforts highlight the importance of AI as Apple's devices last longer.

BLOOMBERG

Apple’s Phil Schiller is reportedly joining OpenAI’s board

Apple's Phil Schiller, head of the App Store, is joining OpenAI's nonprofit board in a non-voting observer role. This move allows Schiller to gain insights into OpenAI as Apple plans to integrate ChatGPT into iOS and macOS. This partnership may lead to Apple earning a share of ChatGPT subscriptions made through its platforms.

THE VERGE

Learn AI with us.

Let’s Build the Future Together.

Hello fellow AI-obsessed traveler,

Over the past 2 years, as we’ve grown to over 250,000 subscribers between the YouTube Channel and this newsletter, we've received an overwhelming number of requests for one specific thing.

While the newsletter helps keep you up to speed with AI news, many of you have asked for the next step: to learn how to actually apply AI in your work.

Today we’re finally announcing the solution with NATURAL 20, the community for like-minded AI learners. As a loyal newsletter reader you are getting access at the lowest price it will ever be:

JOIN NATURAL 20 AI UNIVERSITY TODAY

What you get:

* Tutorials by experts across various AI fields.

* Daily tutorials by Wes Roth about the latest use cases.

* Building Autonomous AI Agents to Automate Your Life and Business (NEW!)

* A network of the top 1% of early AI adopters.

* Access to community-only resources and software.

* And many more features rolling out soon.