NVIDIA Unveils Foundation Agent

PLUS: Character.AI Launches Voice Feature, Instant 3D Creation with Nvidia's Latte3D and more.

Today:

  • NVIDIA Unveils Foundation Agent

  • Apple-Baidu AI Collaboration Talks

  • OpenAI Eyes Hollywood Integration

  • Character.AI Launches Voice Feature

  • Microsoft's $650M Inflection Deal

  • Instant 3D Creation with Nvidia's Latte3D

NVIDIA's new 'Foundation Agent' SHOCKS the Entire Industry! | Dr. Jim Fan, GR00T and Isaac Robotics

NVIDIA's GEAR Lab, co-led by Dr. Jim Fan, aims to tackle the challenges of scaling up robotics through mission-driven research. While the lab's focus remains on advancing the frontiers of AI, it also helps build NVIDIA infrastructure such as the OSMO compute orchestration system.

OSMO supports projects like Project GR00T, which aims to create a foundation model for humanoid robots. The lab's ultimate goal is a future in which autonomous agents can navigate both virtual and physical worlds.

Apple Held Talks With China’s Baidu Over AI for Its Devices

Apple has held early discussions with Baidu about using the Chinese company's AI technology in Apple devices sold in China. The talks suggest Apple is turning to outside partners, as it has also done in talks with Google and OpenAI, to strengthen its AI capabilities. Baidu's shares rose 5% in premarket trading following the news.

Apple and Baidu both declined to comment immediately. The talks fit Apple's broader strategy of drawing on outside expertise for its AI initiatives and could affect its position in the Chinese market.

OpenAI Courts Hollywood in Meetings With Film Studios, Directors

OpenAI is courting Hollywood in hopes of getting filmmakers to use Sora, its AI video generator. The startup plans to meet with major studios, media executives, and talent agencies in Los Angeles to forge partnerships and persuade filmmakers to adopt the technology.

The push marks OpenAI's expansion into the entertainment industry. CEO Sam Altman also attended parties in Los Angeles during Oscars weekend as part of the company's outreach. If studios adopt the technology, AI-generated video could change how films are produced and how stories are told.

Character Voice For Everyone

Character.AI has added Character Voice, which lets users hear Characters speak in one-on-one chats. Users can choose from pre-made voices or create custom ones. The company says safety measures are in place; the feature is available in English for now, with plans to expand. Early testers can toggle voices on and off and send feedback to shape the feature's development. Voice adds depth and realism to experiences such as text-based games.

Character.AI credits its community's creativity with driving the platform's development. With Group Chat and Voice now live, the company expects further changes and invites users to help explore them.

Microsoft Agreed to Pay Inflection $650 Million While Hiring Its Staff

Microsoft has agreed to pay Inflection AI about $650 million, structured as a licensing deal rather than an acquisition. The deal gives Microsoft access to Inflection's AI models, which it can offer through its Azure cloud service.

The payment is also meant to give Inflection's investors a modest return. Alongside the deal, Microsoft hired most of Inflection's staff, including co-founder Mustafa Suleyman, underscoring its interest in both AI talent and technology amid intensifying competition in the tech industry.

Nvidia unveils Latte3D to instantly generate 3D shapes from text

Nvidia has introduced Latte3D, an AI model that generates high-quality 3D shapes from text prompts almost instantly. Developed by Nvidia's AI research lab, Latte3D creates 3D objects and animals in near real time, with potential uses in content creation across industries.

Latte3D runs on a single GPU and avoids the lengthy per-prompt processing that earlier text-to-3D methods required, giving creators faster iteration and more flexibility. It can be trained on different datasets, supporting applications such as landscape design and robotics. The model was trained on Nvidia A100 Tensor Core GPUs using a diverse set of text prompts.

🧠RESEARCH

MathVerse is a benchmark for testing how well Multi-modal Large Language Models (MLLMs) solve visual math problems. It contains 2,612 math problems with diagrams and is designed to check whether MLLMs actually understand the diagrams when reasoning mathematically. Its Chain-of-Thought (CoT) evaluation strategy scores the intermediate reasoning steps, not just the final answer.

DreamReward is a framework that improves text-to-3D generation using human preference feedback. The authors collect expert comparisons to build Reward3D, a model that encodes human preferences for 3D content. DreamFL, a tuning algorithm, then optimizes multi-view diffusion models against this feedback, producing high-fidelity 3D outputs that better match human preferences.
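Reward models trained on expert comparisons are often fit with a pairwise, Bradley-Terry-style objective: the model is penalized unless it scores the preferred sample higher than the rejected one. The summary does not specify Reward3D's exact loss, so this is a minimal sketch assuming that common objective, with plain float scores standing in for a real 3D-aware reward network; the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Assumed pairwise Bradley-Terry loss: -log(sigmoid(r_preferred - r_rejected))."""
    margin = reward_preferred - reward_rejected
    if margin >= 0:
        # -log(sigmoid(m)) = log(1 + exp(-m)); exp(-m) <= 1 here, so no overflow
        return math.log1p(math.exp(-margin))
    # for m < 0, rewrite as -m + log(1 + exp(m)) to avoid overflowing exp(-m)
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average loss over (preferred_score, rejected_score) comparison pairs."""
    losses = [preference_loss(w, l) for w, l in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical reward scores for three expert comparisons.
    pairs = [(2.0, 0.5), (1.2, 1.0), (0.3, 0.9)]
    print(mean_preference_loss(pairs))
```

Splitting on the sign of the score gap keeps the loss numerically stable when the gap is large in either direction.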

Cobra is a multimodal large language model (MLLM) with linear computational complexity. It extends the efficient Mamba language model to the visual modality and explores several modality-fusion schemes. In experiments, Cobra performs competitively with state-of-the-art methods such as LLaVA-Phi while running faster, and it shows an ability to overcome visual illusions. The authors plan to open-source Cobra's code to support future research.
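The linear-complexity claim rests on Mamba's state-space layers, which process a sequence with a recurrence that does a fixed amount of work per token, whereas self-attention compares every pair of tokens. This is a heavily simplified sketch, assuming a scalar, input-independent recurrence h_t = a*h_{t-1} + b*x_t; it is not Mamba's selective scan (whose parameters depend on the input) nor Cobra's code, and the names are hypothetical.

```python
from typing import List


def linear_recurrence(xs: List[float], a: float = 0.9, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Simplified state-space-style scan: one state update per token, so O(n) work."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # decay the running state, then mix in the new token
        ys.append(c * h)   # read out the current state
    return ys


def causal_attention_pairs(n: int) -> int:
    """Query-key scores in naive causal self-attention: token t attends to tokens 0..t."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    print(linear_recurrence([1.0, 0.0, 0.0], a=0.5))
    for n in (1_000, 10_000):
        # state updates grow linearly with n; attention scores grow quadratically
        print(n, "recurrence steps:", n, "attention scores:", causal_attention_pairs(n))
```

Going from 1,000 to 10,000 tokens multiplies the recurrence's work by 10 but the attention score count by roughly 100, which is the gap a linear-complexity model like Cobra aims to exploit.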

AnyV2V is a versatile framework for video-to-video editing. It reduces editing to two steps: first, edit the first frame with an off-the-shelf image editing model; second, use an existing image-to-video generation model, with feature injection, to produce the edited video. AnyV2V supports prompt-based editing, reference-based style transfer, subject-driven editing, and identity manipulation, and it outperforms previous methods in prompt alignment and human preference. Because it is designed to plug in new image editing methods as they emerge, its capabilities can grow to meet diverse user needs.

GRM is a large-scale reconstructor that can recover 3D assets from sparse-view images in about 0.1 seconds. Its feed-forward transformer combines multi-view information to translate input pixels into pixel-aligned Gaussians, making reconstruction both scalable and efficient. Experiments show better reconstruction quality and efficiency than alternative methods, and the model has potential applications in text-to-3D and image-to-3D generation.
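"Pixel-aligned" means each input pixel yields one 3D Gaussian whose center lies on that pixel's camera ray. This minimal sketch computes only those centers (a full Gaussian also has scale, rotation, opacity, and color), assuming a pinhole camera with intrinsics fx, fy, cx, cy, a camera-to-world rotation R and translation t, and a per-pixel z-depth that a network such as GRM's transformer would predict; here the depth is simply given, and the function name is hypothetical.

```python
import numpy as np


def pixel_aligned_centers(depth, fx, fy, cx, cy, R, t):
    """Unproject an (H, W) z-depth map into (H*W, 3) world-space Gaussian centers."""
    H, W = depth.shape
    # coordinates of each pixel center; u varies along columns, v along rows
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    # camera-space ray directions scaled so that z = 1
    dirs = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)
    # assumed z-depth: scaling the z = 1 ray by depth lands on the surface point
    pts_cam = dirs * depth[..., None]
    # camera-to-world transform, applied row-wise: p_world = R @ p_cam + t
    return pts_cam.reshape(-1, 3) @ R.T + t


if __name__ == "__main__":
    depth = np.full((2, 2), 2.0)
    centers = pixel_aligned_centers(depth, 1.0, 1.0, 1.0, 1.0, np.eye(3), np.zeros(3))
    print(centers)
```

With identity pose, unit focal lengths, and the principal point at (1, 1), a 2x2 depth map of 2.0 yields four centers at z = 2, spread symmetrically around the optical axis.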

🛠️TOP TOOLS

Kater AI - Data agent that gets better the more you use it.

AI Transcription - Generate incredibly accurate transcripts for your podcast episodes.

Instanice - Upload a profile picture or portrait and transform it with aesthetic photo effects.

D-ID Agents - Create an interactive AI agent that works for you.

Circleback - AI-powered notes, action items, and automations. Automatically updates HubSpot, Notion, and more.

🗞️MORE NEWS

Google starts testing AI overviews from SGE in main Google search interface

Google is testing AI-generated summaries in search results for some US users, including people who have not opted in to the Search Generative Experience (SGE) in Search Labs. The summaries are meant to improve answers to complex queries, and feedback from these users will shape how the feature is rolled out. Ads will still appear, and the summaries could reduce traffic to websites, which may force changes to SEO strategies. SEARCH ENGINE LAND

Anthropic is lining up a new slate of investors, but the AI startup has ruled out Saudi Arabia

Anthropic, a leading AI startup, is lining up new investors for the stake held by bankrupt crypto exchange FTX, valued at more than $1 billion. The company has ruled out Saudi Arabian funding over national security concerns but is considering other sovereign wealth funds, such as the UAE's Mubadala. Proceeds from the sale will go toward repaying FTX's debts as part of its ongoing bankruptcy proceedings. CNBC

The Browser Company raises $50M at a $550M valuation

The Browser Company has raised $50 million from Pace Capital at a $550 million valuation. Led by Josh Miller and Hursh Agrawal, the company makes the Arc browser, known for its unconventional features. Despite criticism of its AI-driven search, it remains committed to redefining web browsing while working toward a sustainable business model. TECHCRUNCH

Google AI could soon use a person’s cough to diagnose disease

Google researchers have developed HeAR, an AI tool trained on a large collection of human audio that can detect diseases such as COVID-19 and tuberculosis from coughs and breathing sounds. Unlike traditional approaches that rely on labelled examples, HeAR uses self-supervised learning on unlabelled audio. The researchers report that it outperforms comparable models, and its potential for widespread use could make it a significant advance in health diagnostics. NATURE

AI influencers explode on social media. Some are controlled by teens

AI influencers, some controlled by teens, are surging on social media. The startup 1337 creates AI avatars such as Agnes, Finn, and Jade that have gathered sizable followings. With the market projected to reach $125 billion by 2035, concerns are growing about manipulation and misinformation. Teen curators help keep the avatars' content authentic, but risks remain, prompting calls for clearer labeling of AI-generated content. ABC NEWS

10% of US workers are in jobs most exposed to artificial intelligence, White House says

About 10% of US workers are in the jobs most exposed to AI, according to a White House report, with lower-educated and lower-income workers among the most vulnerable, which could worsen inequality. The report expects nuanced effects and cautions against assuming mass job losses. Policy discussions with labor unions are ongoing to reduce risks and steer AI's development toward benefiting workers. CNN
