Microsoft Debuts Independent AI Agents

PLUS: Gemini AI Now With Memory, Meta Taps Shih for Business AI and more.

Today:

  • ChatGPT Voice Mode Hits Web

  • Claude Beats ChatGPT in Research

  • Gemini AI Now With Memory

  • Microsoft Debuts Independent AI Agents

  • Meta Taps Shih for Business AI 

  • AnyChat Combines ChatGPT, Gemini, LLaMA

OpenAI has launched ChatGPT’s Advanced Voice Mode for web browsers, enabling real-time, conversational interaction using voice. Announced by OpenAI’s product chief Kevin Weil, the feature is available to paid users, including Plus, Enterprise, Team, and Edu subscribers. Advanced Voice Mode, introduced in September on mobile apps, uses GPT-4o’s audio processing to understand speech patterns and emotions. Users can select from nine voices and interact naturally.

To activate, grant microphone access via the Voice icon in ChatGPT. Free users will get limited previews soon. Daily usage caps apply, with notifications for remaining time. OpenAI plans broader access in coming weeks.

Anthropic’s Claude 3.5 Sonnet outperformed OpenAI’s o1-preview in five out of seven tests evaluating AI research capabilities, according to Model Evaluation and Threat Research (METR). These tests, designed to challenge both AI and human researchers, highlighted Claude’s strengths in solving complex tasks, occasionally matching average human performance.

However, both models lagged significantly behind top human researchers. The tests assessed creativity, hypothesis formulation, and experimentation, often placing humans at a disadvantage to measure AI progress safely. OpenAI and Anthropic cooperated with METR, aligning with global efforts to regulate AI’s self-improvement potential due to its systemic risks.

Google's Gemini AI now offers a memory feature, enabling it to personalize responses based on user preferences. Available to Gemini Advanced subscribers through Google One AI Premium, the chatbot allows users to input and manage personal details, such as dietary preferences or professional roles, via its “Saved Info” page. Users can view, edit, or delete stored data, and Gemini will indicate when it uses this information in its responses. 

This feature mirrors OpenAI’s Memory, introduced in April for ChatGPT Plus, which retains conversation context for tailored interactions. Gemini’s memory feature is currently available only in English.

Microsoft has unveiled autonomous AI agents for its Copilot platform, now in public preview. These agents can perform complex, linked tasks independently, such as managing projects, responding to HR queries, or automating workflows. Powered by OpenAI's GPT-4 models, agents integrate memory, permissions, and tools to adapt and make decisions dynamically. 

Microsoft 365 Copilot Studio allows users to customize agents for specific needs, with templates and tools for developers. Features include voice-enabled interactions, image analysis, and knowledge tuning. Safety measures, including human oversight, aim to ensure responsible use. Microsoft emphasizes these tools as productivity enhancers, not replacements for human workers.

Meta has appointed Clara Shih, former Salesforce AI chief, to lead its new Business AI group. The team will create AI tools for businesses using Meta’s platforms like Instagram, Facebook, and WhatsApp. Leveraging Meta’s Llama language models, the group aims to enhance ad creation and customer engagement with potential tools for AI-generated content. 

While Meta hasn’t clarified whether these tools will be free or monetized, they align with its ad-driven revenue model. Shih’s move comes after Salesforce’s struggles in the AI space, offering her a fresh start to develop innovative AI solutions within Meta’s ecosystem.

AnyChat, developed by Ahsen Khaliq, unites multiple AI models, including ChatGPT, Google Gemini, and Meta’s LLaMA, under one interface. Built on the Gradio framework, it allows developers to switch between proprietary and open-source models, offering flexibility and cost savings. 

AnyChat supports multimodal AI for processing text and images, making it ideal for diverse enterprise needs like customer support and image analysis. Its open-source architecture encourages community contributions, expanding its capabilities over time. By simplifying model integration and deployment, AnyChat addresses the limitations of single-platform AI systems, providing a scalable, efficient solution for developers and businesses.
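For readers curious what "one interface, many models" can look like in code, here is a minimal Python sketch. It is not AnyChat's actual implementation: the `ModelRouter` class, the `make_stub_backend` helper, and the model names are all hypothetical, and the backends are stubs that tag the prompt instead of calling a real model API.

```python
from typing import Callable, Dict

# A backend is any function that maps a prompt to a reply.
Backend = Callable[[str], str]


def make_stub_backend(label: str) -> Backend:
    """Return a fake backend that tags the prompt; a real one would call a model API."""
    def reply(prompt: str) -> str:
        return f"[{label}] {prompt}"
    return reply


class ModelRouter:
    """Hypothetical registry that exposes many backends behind one chat() call."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, name: str, backend: Backend) -> None:
        # Adding a model is one line, whether it is proprietary or open source.
        self._backends[name] = backend

    def chat(self, model: str, prompt: str) -> str:
        if model not in self._backends:
            raise ValueError(f"unknown model: {model}")
        return self._backends[model](prompt)


router = ModelRouter()
for name in ("chatgpt", "gemini", "llama"):
    router.register(name, make_stub_backend(name))

# Switching models only changes the first argument.
print(router.chat("llama", "Summarize today's AI news"))
```

Because callers only ever see `chat()`, swapping a paid model for an open-source one does not touch the rest of the app, which is the flexibility the AnyChat pitch describes.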

🧠RESEARCH

The Generative World Explorer (Genex) enables AI agents to mentally explore unseen parts of 3D environments, like urban scenes, without physical movement. By imagining and updating beliefs, Genex helps agents make better decisions. Experiments show its ability to produce consistent virtual observations and improve decision-making in complex environments.

BlueLM-V-3B optimizes multimodal large language models (MLLMs) for mobile devices, balancing size, speed, and performance. With a compact design and efficient hardware integration, it delivers real-time processing, outperforming larger models while achieving top benchmark scores on mobile platforms.

AnimateAnything is a video generation framework enabling precise, consistent animations based on inputs like text prompts and motion annotations. Using optical flow-based motion representation and frequency-based stabilization, it reduces flickering and outperforms existing methods in quality and coherence.

This study examines how written guidelines, or "constitutions," impact the quality of AI feedback in training large language models for medical interviews. Detailed constitutions improved emotive communication but fell short in enhancing practical skills like information gathering, highlighting limitations of AI feedback in specific tasks.

This paper introduces "verifier engineering," a new post-training approach for foundation models. It uses automated verifiers to search, assess, and provide feedback, enhancing model capabilities. This three-stage process could drive progress toward Artificial General Intelligence by improving supervision and learning signals.
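The search, assess, and feedback loop can be pictured with a toy Python sketch. This is an illustration under loud assumptions, not the paper's method: the function names are hypothetical, the "search" stage is a fixed candidate list instead of model samples, the verifier is a hard-coded arithmetic check, and "feedback" is reduced to picking the highest-scoring candidate.

```python
from typing import Callable, List, Tuple

# A verifier scores a candidate answer; higher is better.
Verifier = Callable[[str], float]


def search(candidates: List[str]) -> List[str]:
    """Stage 1: gather candidate answers (a real system would sample a model)."""
    return list(candidates)


def verify(candidates: List[str], verifier: Verifier) -> List[Tuple[str, float]]:
    """Stage 2: score every candidate with an automated verifier."""
    return [(candidate, verifier(candidate)) for candidate in candidates]


def feedback(scored: List[Tuple[str, float]]) -> str:
    """Stage 3 (simplified): keep the best-scoring candidate as the signal."""
    best_candidate, _ = max(scored, key=lambda pair: pair[1])
    return best_candidate


def arithmetic_verifier(answer: str) -> float:
    """Toy verifier for the question '2 + 3': full marks only for '5'."""
    return 1.0 if answer.strip() == "5" else 0.0


candidates = search(["4", "5", "23"])
scored = verify(candidates, arithmetic_verifier)
best = feedback(scored)
print(best)
```

In a real pipeline the selected answers would feed back into training or inference; here the point is only the shape of the three stages.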

🛠️TOP TOOLS

Gecko Security - AI-powered security engineer that finds and fixes vulnerabilities in your codebase

Anthropic Prompt Improver - Improve prompts and manage examples directly in the Anthropic Console.

Locofy - Frontend development at lightning speed: design to code in a flash, powered by AI

Intercom - AI customer service agent that can handle your entire frontline support, on your existing platform or on Intercom's AI-first platform.

Tips - The most advanced AI Website Builder for Tailwind websites

🗞️MORE NEWS

  • Microsoft announced Zero Day Quest, a major hacking event to find flaws in cloud and AI systems. With $4 million in rewards, it invites researchers to collaborate on enhancing security and transparency.

  • Microsoft introduced Azure AI Foundry at Ignite 2024, unifying its AI tools for enterprises. It streamlines AI deployment, customization, and management with new SDKs, templates, and advanced tools, fostering collaboration across business and technical teams.

  • OpenAI pays Dotdash Meredith a minimum of $16 million annually to license content, with additional variable payments expected. The deal highlights OpenAI's strategy to train AI while supporting publishers financially.

  • Rox leverages OpenAI’s models to streamline sales workflows, combining data integration and AI-powered insights. It doubles pipelines, saves time, and boosts engagement, enabling scalable, efficient revenue management for enterprise sales teams.

  • Microsoft unveiled two custom data center chips to enhance AI performance and security. The Azure Integrated HSM chip improves encryption, while the DPU boosts cloud data efficiency with better power and performance.

  • Indian news agency ANI has sued OpenAI for using its content without permission to train AI models, alleging copyright violations. OpenAI denies wrongdoing, citing fair use. The case will continue in January.

Learn AI with us.

Let’s Build the Future Together.

Hello fellow AI-obsessed traveler,

Over the past 2 years, as we’ve grown to over 250,000 subscribers between the YouTube Channel and this newsletter, we've received an overwhelming number of requests for one specific thing.

While the newsletter helps keep you up to speed with AI news, many of you have asked for the next step: to learn how to actually apply AI in your work.

Today we’re finally announcing the solution with NATURAL 20, the community for like-minded AI learners. As a loyal newsletter reader you are getting access at the lowest price it will ever be:

 JOIN NATURAL 20 AI UNIVERSITY TODAY

What you get:

* Tutorials by experts across various AI fields.

* Daily tutorials by Wes Roth about the latest use cases.

* Building Autonomous AI Agents to Automate Your Life and Business (NEW!)

* A network of the top 1% of early AI adopters.

* Access to community-only resources and software.

* And many more features rolling out soon.
