OpenAI Launched Advanced Voice
PLUS: D-ID Unveils Real-Time AI Avatars, AI Powers Real-Time Minecraft Experience and more.
Fully Automated Email Outreach With AI Agent Frank
Hire Agent Frank to join your sales team and let him take care of prospecting, emailing and booking meetings for you, so your team can focus on closing deals!
Agent Frank works in two modes - fully autonomous Auto-pilot and Co-pilot, where you can review and monitor his work. And he’s super easy to set up in just 4 quick steps!
He learns using first-party data you provide him during onboarding and continuously gets better as he works to book you more meetings 🚀
Today:
OpenAI Launched Advanced Voice
Runway's 3D Camera Controls
AI Powers Real-Time Minecraft Experience
D-ID Unveils Real-Time AI Avatars
Disney Unites AI and AR Efforts
Hugging Face’s AI Now On-Device
Nvidia Eyes Billion-Dollar xAI Partnership
China Develops Military AI from Llama
Advanced Voice Mode hits the Desktop!
OpenAI launched an advanced voice feature on the desktop app, enhancing user interaction.
This feature allows seamless conversations, interruptions, and improved voice recognition. It can even switch accents for a more personalized experience.
Runway has introduced advanced 3D camera controls for its Gen-3 Alpha Turbo model, empowering creators to control AI-generated videos with cinematic precision. These controls allow users to zoom, pan, and arc smoothly around subjects in realistic, immersive 3D environments, avoiding the glitches that previous AI tools often produced. Creators can also add slow, steady tracking shots, dynamic zoom-ins, and unique transitions, making it ideal for seamless storytelling.
This update positions Runway as a leading choice for filmmakers, including Hollywood studios like Lionsgate, enabling high-quality scenes to be created quickly and affordably on its AI video platform, RunwayML.com.
Decart, an Israeli AI company, introduced Oasis, an AI-powered open-world model that generates a real-time, playable Minecraft-like experience. Oasis, trained on Minecraft videos, creates immersive worlds from user inputs, simulating game physics and graphics on the fly. While innovative, the model has some technical limitations, including low resolution and occasional layout glitches. Future updates, optimized for advanced chips, may offer high-quality 4K gameplay.
Though Oasis brings fresh interactive potential to AI gaming, copyright concerns linger as it’s unclear if Decart secured Microsoft’s approval to train on Minecraft data. Oasis may redefine user-driven, real-time gaming experiences.
AI video platform D-ID launched two advanced avatars, Express and Premium+, designed for business content. Express avatars can mimic head movements after one minute of video training, while Premium+ avatars need a few minutes to reproduce hands and torso, making interactions more lifelike. D-ID envisions these avatars boosting engagement for enterprises, with potential use in webinars, marketing, and customer support.
Accompanying these avatars is an enterprise marketing suite offering video campaign generation, multi-language translation, and CRM integration. D-ID reports that personalized video campaigns increase click-through and conversion rates, aiming to make business interactions more engaging and effective.
Disney has created a new Office of Technology Enablement, led by CTO Jamie Voris, to unify its AI and mixed reality efforts across divisions like film, TV, and theme parks. This unit will explore applications in augmented reality, virtual reality, and AI to enhance consumer experiences and creative projects.
Disney is building expertise for these technologies with leaders like Kyle Laughlin, focusing on theme park innovations and potential home experiences. As competitors advance in AR/VR, Disney is strategically positioning itself to lead immersive tech in entertainment, aiming to expand its team to around 100 members.
Hugging Face has launched SmolLM2, a new line of compact AI language models that operate efficiently on smartphones and other devices with limited processing power. Available in sizes from 135M to 1.7B parameters, SmolLM2 performs impressively in tasks like science reasoning and commonsense knowledge, surpassing larger models like Meta’s Llama.
The largest model, with 1.7B parameters, provides competitive results in benchmarks and chat capabilities, highlighting the potential of smaller, efficient models for on-device applications. This approach offers privacy benefits, faster response times, and accessibility, reducing reliance on cloud computing and making advanced AI accessible for more users and businesses.
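For a sense of how small these models are in practice, here is a minimal sketch of running one locally with the Hugging Face transformers library. The repo id and chat-template call are assumptions based on the SmolLM2 release rather than details from this story, and a recent transformers version is required.

```python
# Minimal local-inference sketch; the repo id below is an assumption based on the
# SmolLM2 release and may differ from the exact model you want to run.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-1.7B-Instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Why do small on-device models help with privacy?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```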
Nvidia, the leading chipmaker, is in talks to invest in Elon Musk’s AI startup, xAI, which powers the chatbot Grok on Musk’s X platform. Musk aims to raise billions, potentially valuing xAI at $40 billion, with plans to increase this valuation through future fundraising rounds. Nvidia CEO Jensen Huang, who has previously praised Musk’s technological achievements, views an xAI partnership as a strategic opportunity.
An investment in xAI would also reinforce demand for Nvidia's AI hardware, since xAI relies on Nvidia's powerful chips for its advanced AI development even as it competes directly with Google's and OpenAI's platforms.
Chinese researchers, including those from institutions tied to the People’s Liberation Army (PLA), have repurposed Meta’s open-source Llama model to develop an AI tool named ChatBIT for military use. This model, enhanced for intelligence and operational decision-making, utilizes Meta’s early Llama 13B large language model as a base.
Meta's policy prohibits military applications, but the model's open-source availability makes that restriction hard to enforce. ChatBIT's potential applications include intelligence analysis and command decision-making.
🧠RESEARCH
Researchers used sparse autoencoders, a type of neural network that simplifies complex data, to understand how the SDXL Turbo text-to-image model works. They discovered that different parts of the model handle composition, details, and color. This breakthrough helps reveal how such image-generation models create pictures from text.
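As a rough illustration of the technique, rather than the paper's exact setup for SDXL Turbo, a sparse autoencoder is just an encoder/decoder pair trained to reconstruct captured activations while an L1 penalty keeps most learned features inactive; the layer sizes below are placeholders.

```python
# Minimal sparse autoencoder sketch for probing model activations.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        # ReLU keeps features non-negative; the L1 term below pushes most to zero,
        # so each remaining active feature tends to capture one interpretable concept.
        features = torch.relu(self.encoder(x))
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(x, reconstruction, features, l1_coeff=1e-3):
    # Reconstruction error plus a sparsity penalty on the feature activations.
    mse = torch.mean((x - reconstruction) ** 2)
    sparsity = l1_coeff * features.abs().mean()
    return mse + sparsity

# Example: train on activations captured from one block of the image model.
sae = SparseAutoencoder(d_model=1280, d_hidden=16384)
activations = torch.randn(64, 1280)  # placeholder for real captured activations
recon, feats = sae(activations)
loss = sae_loss(activations, recon, feats)
loss.backward()
```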
This study examines how "fast" and "slow" thinking affect large language model (LLM) training. Fast thinking causes larger, varied gradients in layers, leading to instability, while slow, step-by-step thinking (e.g., chain-of-thought) promotes learning stability. These insights reveal how different thinking paths influence LLM accuracy and training efficiency.
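One way to build intuition for this claim is to compare per-layer gradient norms when the training target is a terse, direct answer versus a chain-of-thought answer. The sketch below uses a small GPT-2 stand-in and toy arithmetic prompts purely as assumptions; it is not the study's models, data, or methodology.

```python
# Toy comparison of per-layer gradient norms for "fast" vs "slow" targets;
# GPT-2 and the arithmetic prompts are stand-ins, not the paper's setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # small stand-in model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def attn_grad_norms(text):
    """Run one forward/backward pass on `text` and collect gradient norms of the
    attention projection weights in each transformer block."""
    model.zero_grad()
    ids = tok(text, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss
    loss.backward()
    return {name: param.grad.norm().item()
            for name, param in model.named_parameters()
            if name.endswith("attn.c_attn.weight")}

fast = attn_grad_norms("Q: 17 * 24 = ? A: 408")
slow = attn_grad_norms("Q: 17 * 24 = ? A: 17*24 = 17*20 + 17*4 = 340 + 68 = 408")
for layer in fast:
    print(layer, round(fast[layer], 4), round(slow[layer], 4))
```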
Researchers created SelfCodeAlign, a method to improve code-generating AI without relying heavily on human input. It generates coding tasks, tests multiple solutions, and selects the best ones for training. This approach makes smaller models perform better than larger ones and achieves top results in coding tasks, enhancing AI’s ability to write code effectively.
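The core idea can be sketched as an execution-filtering loop: the model proposes candidate solutions with their own tests, and only candidates whose tests actually pass are kept as training data. Everything below, including the `model.generate_text` call and the prompt wording, is a hypothetical stand-in rather than the paper's actual pipeline.

```python
# Execution-based self-alignment sketch: keep only self-generated code whose
# self-generated tests pass when run. Not the paper's exact pipeline.
import os
import subprocess
import sys
import tempfile

def passes_own_tests(program: str, timeout: int = 10) -> bool:
    """Execute a candidate program (solution plus its unit tests) in a subprocess
    and report whether it exits cleanly."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.remove(path)

def build_training_pairs(model, seed_tasks, n_samples=4):
    """Sample several candidates per task and keep the first that passes execution."""
    pairs = []
    for task in seed_tasks:
        for _ in range(n_samples):
            candidate = model.generate_text(  # hypothetical sampling call on a code LLM
                "Write a Python solution to the task below, followed by unit tests "
                f"that call the solution and raise on failure.\n\nTask: {task}"
            )
            if passes_own_tests(candidate):
                pairs.append({"instruction": task, "response": candidate})
                break  # one verified response per task is enough for this sketch
    return pairs
```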
In-Context LoRA (IC-LoRA) is a streamlined method for improving image quality in text-to-image models. By tweaking only the training data, not the model itself, IC-LoRA enables high-fidelity image generation that follows prompts more closely. This efficient, adaptable approach advances image generation without requiring extensive resources.
M2RC-EVAL is a benchmark for evaluating code completion in 18 programming languages, addressing the limitations of current benchmarks that focus on only a few languages. It includes detailed scenario-based annotations to enhance understanding of code models' capabilities. The accompanying M2RC-INSTRUCT dataset further boosts multilingual code completion performance.
🛠️TOP TOOLS
Vidify - Make videos out of your Shopify product images
Humantic AI - Help sales teams understand buyers and close more deals
Magic - Jot down what matters to you, and AI takes it from there, turning your raw notes and meeting transcriptions into structured insights with beautiful formatting
Loomos - Transform raw screen recordings into studio-quality videos in a single click
Chat2DB - Connect to all your data sources and instantly generate optimal SQL for lightning-fast data insights
📲SOCIAL MEDIA
Claude can now view images within a PDF, in addition to text.
This helps Claude 3.5 Sonnet more accurately understand complex documents, such as those laden with charts or graphics.
Enable the feature preview: claude.ai/new?fp=1.
— Anthropic (@AnthropicAI)
4:54 PM • Nov 1, 2024
🗞️MORE NEWS
MIT developed a new robot training method inspired by large language models. Using diverse data from sensors and environments, the model enables robots to adapt better to various settings, aiming for universal, flexible robot intelligence.
Google is integrating smart home controls into its Gemini app, enabling natural language commands like adjusting lights or turning on vacuums. Limited to Android's Public Preview and specific devices, the feature requires the Google Home app for security-related controls.
Perplexity launched an election tracker, using data from The Associated Press and Democracy Works to provide real-time updates and information on U.S. races. This hub aims to deliver trustworthy AI-powered election insights, tackling concerns around misinformation.
DevRev, an AI-driven enterprise platform, raised $100.8 million in Series A funding, valuing it at $1.15 billion. Its AgentOS uses AI and Knowledge Graphs to streamline customer support, product management, and engineering, enhancing customer-focused operations.
Microsoft has delayed the release of its Recall feature for Copilot Plus PCs to December, citing the need for enhanced security. Recall, an optional screenshot tool, will allow users to view past activities through a secure, encrypted timeline.
Apple is acquiring Pixelmator, a popular image-editing app, which will initially keep its standalone functionality. The acquisition hints at potential integration of Pixelmator’s features into Apple’s Photos app, continuing Apple’s focus on AI-enhanced imaging tools.
What'd you think of today's edition?
Learn AI with us. Let's Build the Future Together.
Hello fellow AI-obsessed traveler,
Over the past 2 years, as we've grown to over 250,000 subscribers between the YouTube Channel and this newsletter, we've received an overwhelming number of requests for one specific thing. While the newsletter helps keep you up to speed with AI news, many of you have asked for the next step: to learn how to actually apply AI in your work.
Today we're finally announcing the solution with NATURAL 20, the community for like-minded AI learners. As a loyal newsletter reader, you are getting access at the lowest price it will ever be:
JOIN NATURAL 20 AI UNIVERSITY TODAY
What you get:
* Tutorials by experts across various AI fields.
* Daily tutorials by Wes Roth about the latest use cases.
* Building Autonomous AI Agents to Automate Your Life and Business (NEW!)
* A network of the top 1% of early AI adopters.
* Access to community-only resources and software.
* And many more features rolling out soon.