NATURAL 20
OpenAI Launches o1 Model with Images

PLUS: Nvidia to Build AI Centers in Vietnam, Android Enhances Live Caption Experience and more.

Wes Roth
December 06, 2024

In partnership with Synthflow AI


Create, Publish & Earn with Synthflow AI Voice Agents Marketplace

Discover templates for routine/repetitive tasks like lead qualification and managing appointments.
Publish your own Voice AI solutions to help businesses thrive—and earn commissions.
Access custom actions that automate CRM updates, appointment scheduling, and more.

Build Your AI Agent Now

Today:

OpenAI Launches o1 Model with Images
Elon Musk’s xAI Raises $6B
Copilot Vision Brings AI to Edge
Nvidia to Build AI Centers in Vietnam
Android Enhances Live Caption Experience

OpenAI Launches o1 Model with Images

OpenAI has fully launched its o1 model, which can now analyze images. You can upload a photo, and the model will reason about what it shows to help with your task. For example, it can guide you through building a birdhouse from a single picture, or work through a hard engineering problem, such as designing a data center, from a sketch (a rough drawing).

Alongside o1, OpenAI introduced ChatGPT Pro, a $200-per-month plan. Pro users get unlimited access to o1 and to o1 pro mode, a version of the model that uses more computing power to think longer about hard problems. On OpenAI's benchmarks (tests that measure skill), o1 pro mode outperformed standard o1. OpenAI hopes this upgrade keeps it ahead of rivals.

Elon Musk’s xAI Raises $6B

Elon Musk’s AI company, xAI, has raised another $6 billion from investors, bringing its total funding to $12 billion. Backers include major investment firms. The money gives xAI room to grow and hire more staff. Its main product is Grok, an AI assistant (AI being a computer program that learns from data and handles tasks that normally need human thinking).

Grok is built into X, where it can answer difficult questions and generate images for users. Musk plans to use data from Tesla and SpaceX to train Grok, and he wants xAI to overtake OpenAI.

Copilot Vision Brings AI to Edge

Microsoft is testing a new feature called Copilot Vision, which lets its Copilot assistant see what you see on a webpage, with your permission. Inside Microsoft’s Edge web browser, the AI-powered Copilot can read text, view images, and understand what’s on your screen. You can then chat with it about what you’re reading or watching.

Copilot Vision is available only to a small group of Copilot Pro subscribers, who pay for extra features. Microsoft is rolling it out cautiously, limiting which websites Copilot Vision can access and testing carefully to protect privacy.

Nvidia to Build AI Centers in Vietnam

Nvidia has signed an agreement with the Vietnamese government to establish an AI research and development center and an AI data center in Vietnam. As part of the deal, Nvidia also acquired VinBrain, a healthcare startup owned by the Vietnamese conglomerate Vingroup, for an undisclosed sum.

Financial details of the R&D center and data center were not revealed either, but Vietnamese Prime Minister Pham Minh Chinh emphasized the importance of AI in advancing sectors like clean energy, and even in exploring space and the ocean.

Android Enhances Live Caption Experience

Google is introducing Expressive Captions on Android, a feature that enriches automatically generated captions with emotional and contextual cues. Building on Live Caption, Expressive Captions uses on-device AI to convey how something is said, not just what’s being said. Instead of plain text, it can reflect intensity through capitalization (“HAPPY BIRTHDAY!”), identify vocal sounds like sighs and gasps, and note ambient noise, like applause or cheers.

This updated approach helps capture tone, volume, and background details that standard captions often miss. It’s designed to work in real time across most videos, including livestreams, and doesn’t rely on pre-loaded captions. Initially available in English on devices running Android 14 and above, the feature aims to make content more accessible and engaging — ensuring everyone, including those who are Deaf or hard of hearing, can experience the full emotional range of audiovisual content.

🧠RESEARCH

PaliGemma 2: A Family of Versatile VLMs for Transfer

PaliGemma 2 upgrades the PaliGemma vision-language model (VLM) by pairing the SigLIP-So400m vision encoder with language models from the Gemma 2 family. Trained at three image resolutions, it improves performance across many transfer tasks, including OCR-related tasks and radiography report generation, and reaches state-of-the-art results on several of them. The paper also examines how model size and learning strategy affect transfer performance.

SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance

SNOOPI improves one-step text-to-image diffusion models by fixing two weaknesses of earlier distillation techniques: unstable training and no support for negative prompts. It stabilizes training with a dynamic guidance scale and introduces Negative-Away Steer Attention (NASA) to incorporate negative prompts. Together, these changes significantly improve performance and set a new state of the art for one-step diffusion models.
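For readers curious what a "dynamic guidance scale" can look like: diffusion models commonly use classifier-free guidance, which blends an unconditional and a text-conditioned noise prediction using a scale w. The minimal sketch below draws a fresh w at every training step instead of fixing it. It is not SNOOPI's code; the function names, the plain Python lists standing in for tensors, and the 2 to 10 scale range are all hypothetical choices for illustration.

```python
import random

def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def sample_guidance_scale(low=2.0, high=10.0, rng=random):
    """Draw a new guidance scale each training step (hypothetical range)."""
    return rng.uniform(low, high)

if __name__ == "__main__":
    rng = random.Random(0)
    eps_uncond = [0.1, -0.2, 0.3]  # toy unconditional noise prediction
    eps_cond = [0.2, -0.1, 0.1]    # toy text-conditioned noise prediction
    for step in range(3):
        w = sample_guidance_scale(rng=rng)
        print(step, round(w, 2), guided_noise(eps_uncond, eps_cond, w))
```

Varying w exposes the student model to a wider range of guided outputs than a single fixed scale would.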

Imagine360: Immersive 360 Video Generation from Perspective Anchor

Imagine360 is a framework for generating high-quality 360° videos from standard perspective footage. It uses a dual-branch design to apply local and global constraints, an antipodal mask to improve motion coherence, and elevation-aware designs to handle input videos shot at varied camera elevations. Experiments show it outperforms current methods in video quality and motion coherence.

Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding

Video-3D LLM is a model designed for better 3D scene understanding by incorporating 3D position encoding into video representations. It addresses the gap in current multimodal large language models (MLLMs) that struggle with spatial understanding in 3D. The model outperforms existing methods on multiple 3D benchmarks.
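To make "3D position encoding" concrete, here is a minimal sketch of one common approach, assumed here rather than taken from the paper: give each video patch the 3D world coordinate it depicts (for example, recovered from depth and camera pose), encode each axis with sinusoids, and add the result to the patch's feature vector. The function names and the equal per-axis dimension split are hypothetical.

```python
import math

def sinusoidal_encoding(value, dim):
    """Encode one scalar coordinate as `dim` alternating sin/cos features."""
    if dim % 2:
        raise ValueError("dim must be even")
    features = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))  # geometrically spaced frequencies
        features.append(math.sin(value * freq))
        features.append(math.cos(value * freq))
    return features

def position_encoding_3d(xyz, dim_per_axis):
    """Concatenate per-axis encodings of an (x, y, z) world coordinate."""
    x, y, z = xyz
    return (sinusoidal_encoding(x, dim_per_axis)
            + sinusoidal_encoding(y, dim_per_axis)
            + sinusoidal_encoding(z, dim_per_axis))

def add_3d_position(patch_feature, xyz, dim_per_axis):
    """Add the 3D encoding to a patch feature of length 3 * dim_per_axis."""
    pe = position_encoding_3d(xyz, dim_per_axis)
    if len(pe) != len(patch_feature):
        raise ValueError("feature size must equal 3 * dim_per_axis")
    return [f + p for f, p in zip(patch_feature, pe)]

if __name__ == "__main__":
    feature = [0.0] * 12  # toy 12-dim patch feature
    print(add_3d_position(feature, (1.5, 0.0, 2.0), dim_per_axis=4))
```

Because two patches showing the same spot in the room get the same encoding regardless of which frame they come from, the model can relate views of a scene spatially rather than only by frame order.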

One Shot, One Talk: Whole-body Talking Avatar from a Single Image

"One Shot, One Talk" introduces a method for creating realistic, animatable whole-body talking avatars from a single image. It uses pose-guided diffusion models to generate video frames and a hybrid avatar representation that combines 3D Gaussian Splatting (3DGS) with a mesh to overcome the challenges of modeling motion. The result is a photorealistic avatar with precise gestures and expressions.

🛠️TOP TOOLS

Dashworks - Create Custom AI Search Assistants

Reword - Write people-first articles with an AI trained by you.

Unriddle - Quickly find info in research papers, simplify complex topics, write with AI and keep everything organized.

Flick - AI-powered social media marketing platform that helps with scheduling, hashtags, caption writing, and analytics.

Theneo - Generate Stripe-like docs in seconds

📲SOCIAL MEDIA

OpenAI o1 is now out of preview in ChatGPT.
What’s changed since the preview? A faster, more powerful reasoning model that’s better at coding, math & writing.
o1 now also supports image uploads, allowing it to apply reasoning to visuals for more detailed & useful responses.
— OpenAI (@OpenAI)
6:16 PM • Dec 5, 2024

🗞️MORE NEWS

OpenAI has partnered with Future, the publisher of websites like Tom’s Guide and PC Gamer, to offer ChatGPT users access to content from over 200 Future brands. This follows OpenAI's recent content licensing deals with other major media outlets.
TSMC is in talks with Nvidia to produce Nvidia's Blackwell AI chips at TSMC's new plant in Arizona, with production starting early next year. The chips are currently made in Taiwan, and those made in Arizona will still need to be shipped back to Taiwan for packaging.
At re:Invent 2024, AWS announced new tools that make it easier to bring structured and unstructured data into retrieval-augmented generation (RAG) pipelines. The updates, including new Amazon Bedrock Knowledge Bases capabilities and GraphRAG support, aim to improve accuracy, automate data processing, and strengthen AI applications.
OpenAI is exploring ways to integrate custom GPTs with online courses, allowing professors to upload their content and create personalized chatbots for student engagement. Though some educators remain skeptical, OpenAI is committed to refining the technology for better teaching and learning outcomes.


Learn AI with us.

Let’s Build the Future Together.

Hello fellow AI-obsessed traveler,

Over the past 2 years, as we’ve grown to over 250,000 subscribers between the YouTube Channel and this newsletter, we've received an overwhelming number of requests for one specific thing.

While the newsletter helps keep you up to speed with AI news, many of you have asked for the next step: to learn how to actually apply AI in your work.

Today we’re finally announcing NATURAL 20, a community for like-minded AI learners. As a loyal newsletter reader, you get access at the lowest price it will ever be:

JOIN NATURAL 20 AI UNIVERSITY TODAY

What you get:

* Tutorials by experts across various AI fields.

* Daily tutorials by Wes Roth about the latest use cases.

* Building Autonomous AI Agents to Automate Your Life and Business (NEW!)

* A network of the top 1% of early AI adopters.

* Access to community-only resources and software.

* And many more features rolling out soon.
