NATURAL 20
Tencent's Hunyuan Video Transforms Creation

PLUS: Amazon and Anthropic’s AI Supercomputer, Hume AI Launches Voice Customization and more.

Wes Roth
December 04, 2024

Today:

Tencent's Hunyuan Video Transforms Creation
Google Unveiled Veo and Imagen 3
Luma Image Generation Models
Amazon and Anthropic’s AI Supercomputer
Hume AI Launches Voice Customization

Tencent's Hunyuan Video Transforms Creation

Tencent has introduced Hunyuan Video, an open-source AI model for text-to-video generation with 13 billion parameters. According to Tencent, its video quality outperforms models like Runway Gen-3 and Luma. Key features include a Multimodal Large Language Model, used as the text encoder for better text-video alignment, and a 3D Variational Autoencoder that compresses video into a compact latent space for efficient processing.

This release advances video generation technology, bridging the gap between open-source and proprietary solutions. By providing access to its code and model weights, Hunyuan Video empowers the community to explore and innovate in the field of AI-driven video creation.

Google Unveiled Veo and Imagen 3

Google Cloud has introduced two powerful generative AI models, Veo and Imagen 3, on its Vertex AI platform. Veo enables businesses to create high-quality videos from text or image prompts, while Imagen 3 generates photorealistic images with exceptional detail from text prompts. These tools streamline creative processes, reducing production time and costs.

With built-in safety features like digital watermarking and safety filters, Google aims to support responsible use of AI. Leading companies like Mondelez and WPP are already using these models to enhance content creation. Veo and Imagen 3 target marketing, advertising, and creative industries by enabling rapid, high-quality media generation.

Luma Image Generation Models

Luma has introduced Photon, a family of image generation models that combine creativity, intelligence, and efficiency. According to Luma, the new architecture delivers ultra-high-quality images at a significantly lower cost, up to 10 times more efficient than existing models. The models excel at understanding natural language instructions and support multi-turn workflows for ideation and editing.

Built for designers, filmmakers, and visual thinkers, Luma Photon supports consistent characters from a single reference image as well as iterative prompting. It is available through the Luma API and the Dream Machine service, offering creative freedom and affordable, high-quality image generation.

Amazon and Anthropic’s AI Supercomputer

Amazon is building one of the world's largest AI supercomputers in collaboration with Anthropic. The system, called Project Rainier, is expected to be five times more powerful than the cluster used to train Anthropic’s current models and will use Amazon’s Trainium 2 chips.

This move positions Amazon to compete more strongly with companies like Microsoft and Google in generative AI. Additionally, Amazon announced new tools to help businesses manage and improve AI, including a system to verify chatbot accuracy and a method for optimizing smaller AI models. These innovations highlight Amazon’s growing influence in the AI sector.

Hume AI Launches Voice Customization

Hume AI has introduced Voice Control, a tool that allows users to create custom AI voices without any coding skills. It offers sliders to adjust vocal traits like confidence, enthusiasm, and smoothness, making it easy for developers to design voices that suit specific needs. This tool builds on Hume’s earlier Empathic Voice Interface (EVI 2), which focused on emotional responsiveness and naturalness in voice AI.

Voice Control aims to replace preset voices and address voice cloning risks by enabling highly personalized voice options. It's available in beta and integrates with Hume’s platform for various applications like customer service and virtual assistants.

🧠RESEARCH

X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models

X-Prompt is a new vision-language model that improves image generation by using in-context learning, similar to how language models work with text. It can handle both familiar and new tasks efficiently by processing context examples and generalizing to unseen tasks. Experiments show it performs well in diverse image generation tasks.

Open-Sora Plan: Open-Source Large Video Generation Model

Open-Sora Plan is an open-source project focused on creating a large model for generating high-resolution, long-duration videos based on user inputs. It combines several advanced components and strategies for efficient training and data curation. The project achieves strong results in video generation and offers its code and model weights to the research community.

GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

GATE OpenING introduces a benchmark for evaluating interleaved image-text generation, featuring 5,400 annotated examples across 56 real-world tasks. It assesses multimodal understanding and generation in scenarios like travel and design. The accompanying judge model, IntJudge, reaches 82.42% agreement with human judgments, and the evaluation highlights gaps in current methods that can guide future improvements. OpenING is publicly accessible.

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation

VISTA introduces a framework to improve video models' understanding of long-duration and high-resolution videos by synthesizing enhanced datasets. Using spatial and temporal augmentation, it generates extended videos with paired questions and answers. Finetuning on the VISTA-400K dataset boosts model performance, achieving notable gains on new benchmarks like HRVideoBench.

SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters

SOLAMI is a framework designed to enable 3D autonomous characters to interact socially with humans. It integrates vision, language, and action (VLA) to create realistic, responsive characters that understand and react to multimodal inputs. Using synthetic data and a VR interface, SOLAMI achieves more natural and accurate interactions with low latency.

🛠️TOP TOOLS

Modal - Serverless cloud for AI, ML, and data applications – built for developers

Colossyan - The AI video platform for workplace learning

Hume + Anthropic Computer Use - Create apps to control a computer with just your voice

Silatus - Write articles, speeches, job descriptions, press releases, memos, and more.

Snappy Retro - Create your retro board in seconds, share the encrypted unique URL, and collaborate in real-time.

📲SOCIAL MEDIA

🌍 HeyGen Avatars speak your language—and your dialect too!
With 70+ languages and 175+ dialects, our avatars and AI-powered video translation tools make global communication seamless, lifelike, and truly localized. Translate, localize, and connect with ease.
— HeyGen (@HeyGen_Official)
5:13 PM • Dec 3, 2024

🗞️MORE NEWS

Amazon announced Nova, a family of generative AI models, at its re:Invent 2024 conference. It includes text-focused models (Micro, Lite, Pro, Premier), image and video generation models (Nova Canvas, Nova Reel), and future multimodal models for various media.
AKOOL and LiveX AI have partnered to enhance customer engagement with advanced conversational AI and dynamic avatar technology. This collaboration delivers human-like virtual agents capable of real-time problem-solving and empathetic interactions, improving satisfaction and loyalty.
NeuroAI for AI safety draws on the human brain’s mechanisms to create safer AI. By understanding neural systems, we can develop AI aligned with human values, enhancing transparency, robustness, and ethical behavior while accelerating neuroscience and neurotechnology.
Sakana AI introduces CycleQD, an evolutionary AI framework that uses model merging and mutation to evolve a diverse population of small, specialized models. This approach outperforms traditional methods, providing sustainable, high-performance AI across tasks.
MIT researchers developed a photonic chip that uses light to perform deep neural network computations. This device enhances speed and energy efficiency, achieving ultra-low latency and high accuracy, with potential for real-time AI applications in fields like telecommunications and research.
Skyflow has launched a new security and privacy solution for Agentic AI, enabling businesses to build trustworthy AI agents. It ensures data protection, compliance with global regulations, and safeguards sensitive data throughout the AI lifecycle, from training to execution.

Learn AI with us.

Let’s Build the Future Together.

Hello fellow AI-obsessed traveler,

Over the past 2 years, as we’ve grown to over 250,000 subscribers between the YouTube Channel and this newsletter, we've received an overwhelming number of requests for one specific thing.

While the newsletter helps keep you up to speed with AI news, many of you have asked for the next step: to learn how to actually apply AI in your work.

Today we’re finally announcing the solution with NATURAL 20, the community for like-minded AI learners. As a loyal newsletter reader you are getting access at the lowest price it will ever be:

JOIN NATURAL 20 AI UNIVERSITY TODAY

What you get:

* Tutorials by experts across various AI fields.

* Daily tutorials by Wes Roth about the latest use cases.

* Building Autonomous AI Agents to Automate Your Life and Business (NEW!)

* A network of the top 1% of early AI adopters.

* Access to community-only resources and software.

* And many more features rolling out soon.
