NATURAL 20
Super Agent Octopus V2 Outperforms GPT-4

PLUS: Gemini Joins Android's Google App, YouTube Transcripts Fuel GPT-4's Training and more.

Wes Roth
April 08, 2024

Today:

Super Agent Octopus V2 Outperforms GPT-4
Facebook, Instagram to Label AI Content
YouTube Transcripts Fuel GPT-4's Training
Microsoft, Nvidia Forge Stronger AI Bonds
Canada Invests $1.8B in AI Future
Gemini Joins Android's Google App

Stanford "Octopus v2" SUPER AGENT beats GPT-4 | Runs on Google Tech | Tiny Agent Function Calls

Stanford University unveils Octopus V2, an on-device language model dubbed a "super agent" that outperforms GPT-4 in both accuracy and latency. Unlike cloud-based models, Octopus runs locally, which addresses privacy and cost concerns.

The research demonstrates the efficacy of compact models with just two billion parameters, challenging the notion that bigger is always better in AI development. Octopus shows promise for various applications, from calendar reminders to text messaging, across edge devices like smartphones and VR headsets.

WATCH THE VIDEO ON YOUTUBE

Facebook and Instagram to label digitally altered content ‘made with AI’

Meta, the parent company of Facebook and Instagram, will mark digitally altered content with "Made with AI" labels starting in May, covering videos, images, and audio. High-risk content that could deceive the public on important matters will get more prominent labels, regardless of whether AI was used.

This shift aims to inform viewers about the origins of manipulated content instead of simply removing it. Meta plans to detect AI-generated images using invisible markers. The changes arrive ahead of the US presidential election, in which AI could play a role, and follow earlier criticism of Meta's handling of manipulated media. The focus now extends beyond AI-generated content to include all misleading media.

THE GUARDIAN

OpenAI transcribed over a million hours of YouTube videos to train GPT-4

AI giants like OpenAI are facing challenges in acquiring quality training data. To train its GPT-4 model, OpenAI transcribed over a million hours of YouTube videos, a legally questionable move that the company considers fair use. Google, another major player in AI, has also used YouTube transcripts for training.

Both companies are navigating legal and technical barriers in accessing data, with Google making privacy policy adjustments to expand data usage. Meta, too, explored unconventional methods like transcribing copyrighted works. With training data dwindling, AI firms seek alternative solutions like generating synthetic data, but legal concerns persist.

THE VERGE

Microsoft and Nvidia announce major new integrations, breakthroughs and more at GTC

Microsoft and Nvidia made significant announcements at the Nvidia GTC AI conference, showcasing their deepened collaboration. Microsoft's Nidhi Chappell discussed partnerships with OpenAI and Nvidia, emphasizing the importance of infrastructure in training large AI models. Microsoft integrated Nvidia's latest hardware, including the GB200 Grace Blackwell Superchip and Quantum-X800 InfiniBand networking, into Azure.

Additionally, Microsoft unveiled advancements in healthcare, industrial digital twins, real-time contextualized intelligence, and AI deployment. The collaboration aims to accelerate AI innovation across various industries, leveraging Azure's capabilities and Nvidia's technologies.

VENTUREBEAT

Trudeau Unveils $1.8 Billion Package for Canada’s AI Sector

Canada is launching a $1.8 billion fund to bolster its AI sector, including funding for technological infrastructure. Prime Minister Justin Trudeau announced the initiative ahead of the budget, aiming to support AI researchers, startups, and firms.

The package also includes the establishment of an AI safety institute. Canada's move reflects its commitment to fostering AI innovation and development.

BLOOMBERG

Gemini is coming to the Android Google app

Gemini is expanding to the Android Google app, mirroring its functionality on iOS. Users can access it by tapping the Gemini logo, which opens a chatbot prompt field.

Google app for Android to soon get toggle to switch between Gemini and Search [just like on iOS]
📝 Read - piunikaweb.com/2024/04/07/goo…
#Google #Android
— AssembleDebug (@AssembleDebug)
8:57 AM • Apr 7, 2024

From there, they can interact with Google's revamped chatbot, requesting image creation or analysis by sending pictures.

THE VERGE

🧠RESEARCH

ReFT: Representation Finetuning for Language Models

ReFT is a method for fine-tuning language models more efficiently by editing hidden representations rather than updating large numbers of model weights. The authors' approach, exemplified by LoReFT, outperforms prior parameter-efficient fine-tuning (PEFT) methods on reasoning and other evaluation tasks. LoReFT's interventions are 10x-50x more parameter-efficient, as shown across various evaluations. The authors also provide a public ReFT training library.
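
For readers who want a feel for what "editing representations" means, here is a minimal sketch in plain Python. It assumes the low-rank intervention form described in the ReFT paper, h + Rᵀ(Wh + b − Rh), where R has orthonormal rows. The function names (`orthonormal_rows`, `loreft_intervention`) and the toy numbers are hypothetical and are not the API of the authors' library; in real use R, W, and b are learned while the base model stays frozen.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def orthonormal_rows(r, d, seed=0):
    """Build r orthonormal rows of length d via Gram-Schmidt (illustrative init)."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < r:
        v = [rng.gauss(0, 1) for _ in range(d)]
        for u in rows:
            dot = sum(a * b for a, b in zip(u, v))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip the rare degenerate draw
            rows.append([a / norm for a in v])
    return rows


def loreft_intervention(h, R, W, b):
    """Return h + R^T (W h + b - R h): edit h only inside the subspace spanned by R."""
    Rh = matvec(R, h)
    Wh = matvec(W, h)
    delta = [wh + bi - rh for wh, bi, rh in zip(Wh, b, Rh)]
    # Map the r-dimensional correction back to the d-dimensional hidden state.
    back = [sum(R[k][j] * delta[k] for k in range(len(R))) for j in range(len(h))]
    return [hj + x for hj, x in zip(h, back)]


if __name__ == "__main__":
    d, r = 4, 2                      # toy sizes; real hidden states are far larger
    h = [1.0, -2.0, 0.5, 3.0]        # a frozen model's hidden state (made up)
    R = orthonormal_rows(r, d)
    W = [[0.1, 0.2, 0.3, 0.4], [0.0, -1.0, 0.5, 0.2]]
    b = [0.5, -0.5]
    print(loreft_intervention(h, R, W, b))
    # Trainable values here: 2*r*d + r, versus d*d for a full weight matrix.
    print(2 * r * d + r, "vs", d * d)
```

Because only R, W, and b are trained, the number of trainable values grows with the rank r rather than with the full hidden size squared, which is the intuition behind the efficiency claim.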

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

CoMat is a method that addresses misalignment in text-to-image generation by enhancing token attention activation. The authors attribute misalignment to inadequate use of the text condition in diffusion models, a result of their training paradigm. CoMat incorporates an image-to-text concept matching mechanism and an attribute concentration module, improving alignment without additional data. CoMat-SDXL outperforms SDXL on text-to-image alignment benchmarks.

MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens

MiniGPT4-Video is a multimodal Large Language Model tailored for video comprehension. Extending the success of MiniGPT-v2 in handling single images, this model processes sequences of frames, integrating both visual and textual data. It outperforms existing methods on benchmarks like MSVD, MSRVTT, TGIF, and TVQA, showcasing significant improvements. The models and code are publicly accessible.

LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models

LVLM-Interpret is an interactive tool for understanding large vision-language models (LVLMs). In the evolving field of artificial intelligence, LVLMs combining diverse data inputs are gaining traction, yet their internal workings remain opaque. This tool focuses on enhancing interpretability, particularly in understanding image patches' role and assessing model performance in connecting language with images. It enables systematic investigation to uncover system limitations, exemplified through a case study on the LLaVA model.

AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

AutoWebGLM is an advanced web navigating agent powered by large language models (LLMs). Existing agents often fall short on real-world webpages due to action versatility, HTML text complexity, and decision-making challenges. AutoWebGLM addresses these by simplifying HTML, leveraging human-AI hybrid data for training, and employing reinforcement learning. It outperforms GPT-4 and is evaluated on diverse web navigation benchmarks. Code, model, and data will be available on GitHub.

🛠️TOP TOOLS

Bezi AI - Design 3D apps and games faster than ever before

Sendspark - AI Personalized Sales Videos

Ai Agents Challenge - Create your Agent, Train it, Empower it, Compete!

Creativ AI - Create amazing blog entries, attention-grabbing advertisements, interesting emails, and super cool social media posts in just a few seconds.

Shine by Sunshine - AI-powered Magic/Manual controls for photo upload

🗞️MORE NEWS

Google is working on a ‘lookup’ button for unknown callers on Android

Google plans to introduce a 'Lookup' button for unknown callers on Android, allowing users to easily search numbers from recent calls. Additionally, Gemini email summaries might come to Android Gmail. This feature, currently available on the web, could streamline email management. Both updates aim to enhance user experience. THE VERGE

Gretel releases world’s largest open source text-to-SQL dataset, empowering businesses to unlock AI’s potential

Gretel releases the world’s largest open-source Text-to-SQL dataset, enabling businesses to leverage AI effectively. This dataset, comprising over 100,000 meticulously crafted synthetic samples, facilitates AI model training for natural language queries to SQL. With rigorous quality validation and a focus on privacy, Gretel pioneers the synthetic data revolution, fostering innovation and accessibility in AI. VENTUREBEAT

Anyscale addresses critical vulnerability on Ray framework — but thousands were still exposed

Anyscale has addressed the critical ShadowRay vulnerability in its Ray framework, which left thousands exposed to unauthorized access for seven months. After initially disputing the issue, Anyscale now provides tooling to detect exposed ports. The ShadowRay threat compromises AI production workloads, cloud environments, and sensitive credentials, highlighting the need for secure AI development and data hygiene. VENTUREBEAT

China’s Huawei is challenging traditional weather forecasting again, this time with groundbreaking AI model Zhiji

China's Huawei unveils Zhiji, an AI weather forecasting model that builds on Pangu-Weather. Zhiji provides regional forecasts at a precision of 3 km, a significant improvement over its predecessors. Recognized as China's top scientific innovation of 2023, this advancement challenges traditional weather forecasting methods. SCMP

Inside Big Tech's underground race to buy AI training data

Photobucket, once a dominant image-hosting site, is now exploring a new avenue for its vast photo and video archive: training generative AI models. CEO Ted Leonard revealed that the company is in talks with tech firms to license its 13 billion images and videos for AI training. Negotiations have included rates ranging from 5 cents to $1 per photo and over $1 per video. These discussions shed light on a growing data market where companies seek content for AI training, amid legal and ethical considerations regarding user privacy and consent. REUTERS

Google Books reportedly indexing bad AI-written works

Google Books is indexing poorly written AI-generated books, potentially affecting the accuracy of its language tracking tool, Ngram. 404Media discovered these low-quality works by searching for a common chatbot phrase. Such books, like "Bears, Bulls, and Wolves," lack originality and may skew language research results. Google aims to address this issue for future data updates. THE VERGE

Spire Global bets on AI to help improve weather forecasts, with boost from Nvidia

Spire Global partners with Nvidia to enhance weather forecasting using AI. CEO Platzer anticipates significant improvements, leveraging Nvidia's generative AI for quicker and more accurate predictions. Spire's satellite data combined with Nvidia's technology aims to revolutionize weather forecasts, crucial for various sectors reliant on weather data. CNBC

AI-generated Asians were briefly unavailable on Instagram

Meta's Instagram AI image generator experienced glitches, initially rendering all characters as Asian regardless of the prompt. After Meta was notified, the tool returned an error message when asked to generate Asian people. Meta remained silent on the issue, and the tool eventually resumed functioning, though it still produced racially inaccurate results. THE VERGE