AI’s Autonomous Capabilities Today
PLUS: GitHub Copilot for Project Starts, Activists Challenge OpenAI on GDPR Breach and more.

Today:
- AI’s Autonomous Capabilities Today 
- Financial Times Partners with OpenAI 
- DeepMind Unveils Gecko 
- GitHub Copilot for Project Starts 
- Activists Challenge OpenAI on GDPR Breach 
- Meta’s Ad Tool Causes Cash Bleed 
STUNNING Step for Autonomous AI Agents PLUS OpenAI Defense Against JAILBROKEN Agents
The rapid advancement of AI technology, particularly in large language models and AI agents, promises significant changes in how we interact with computers. AI's ability to reason, perceive, and execute computer-based tasks has improved markedly. For example, AI can now perform tasks that were typically done by humans, such as coding, data entry, and even creative work like writing, all by mimicking human interactions with computers.
New benchmarks like OSWorld are being developed to evaluate AI agents' effectiveness across operating systems such as Linux, Windows, and macOS. These agents are no longer just theoretical; they are becoming more capable of performing complex tasks autonomously, indicating a shift towards more integrated and potentially transformative uses of AI in everyday computing.
Financial Times announces strategic partnership with OpenAI
The Financial Times has partnered with OpenAI, a leader in AI research, to enhance ChatGPT with FT content and develop new AI products. Users can expect attributed summaries and links to FT journalism in ChatGPT responses. FT also adopted ChatGPT Enterprise for its employees.
CEO John Ridding sees this as pivotal for understanding AI's impact on content access. OpenAI's COO, Brad Lightcap, highlights the goal of empowering news organizations and enriching ChatGPT with quality journalism. The FT aims to navigate AI advancements while safeguarding content integrity.
Google’s DeepMind creates ‘Gecko’, a rigorous new standard for testing AI image generators

Google DeepMind introduces "Gecko," a rigorous benchmark for evaluating text-to-image AI models. Current evaluation methods lack comprehensiveness and reliability, prompting the need for Gecko. It assesses models across 2,000 text prompts, categorizing skills and complexity levels. Gecko gathers extensive human ratings to discern model limitations, ambiguous prompts, and evaluation inconsistencies.
Additionally, it features an enhanced automatic evaluation metric aligned with human judgments. DeepMind's Muse model excels in Gecko's evaluation. The researchers stress the importance of diverse benchmarks for accurate understanding and deployment of text-to-image AI. Gecko's code and data will be freely available to facilitate further advancements.
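To make "aligned with human judgments" concrete: one common way to check an automatic metric against human ratings is rank correlation. The sketch below is only an illustration of that general idea, not Gecko's actual metric or DeepMind's evaluation code. The function names and the scores are hypothetical, and the blurb above does not say which statistic the researchers use.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(metric_scores, human_ratings):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    if len(metric_scores) != len(human_ratings) or len(metric_scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    rx, ry = average_ranks(metric_scores), average_ranks(human_ratings)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for five generated images (illustrative only).
metric = [0.91, 0.40, 0.75, 0.62, 0.30]
humans = [5, 2, 4, 4, 1]
print(f"Spearman correlation: {spearman(metric, humans):.3f}")
```

A value near 1 would mean the metric orders images almost the same way the human raters do; a value near 0 would mean the metric tells you little about human preference.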
GitHub Copilot can now help start a project with AI, not just complete it
GitHub introduces GitHub Copilot Workspace, extending its AI-powered code completion platform to assist developers in starting new coding projects. This service aims to reduce the time spent reading through code and figuring out project beginnings. Integrated into GitHub repositories, Copilot Workspace allows developers to describe project goals, receive suggestions on project initiation, edit suggestions, and run the code.
GitHub Next head Jonathan Carter highlights Copilot Workspace's ability to review older code efficiently. The expansion of Copilot's capabilities underscores how important coding has become as a benchmark skill for new AI models.
Data Activists Target OpenAI In Challenge To ChatGPT’s ‘Hallucination’ Problem

Privacy activists from the Vienna-based nonprofit noyb have filed a complaint against OpenAI, alleging violations of Europe's General Data Protection Regulation (GDPR) due to ChatGPT's tendency to produce incorrect information ("hallucinate"). They argue that OpenAI's inability to accurately filter false data from its outputs, such as incorrect birth dates for public figures, violates GDPR.
Noyb also criticized OpenAI for its lack of transparency regarding the data processed, its sources, and its dissemination. This legal action puts additional pressure on AI developers to comply with stringent EU privacy regulations, amidst ongoing concerns about AI-generated misinformation and its potential consequences.
Meta’s ‘set it and forget it’ AI ad tools are misfiring and blowing through cash
Meta's automated ad platform, Advantage Plus, has been malfunctioning, causing advertisers to experience significantly higher costs and poor performance. Since the malfunction began on Valentine's Day, the system has been consuming daily budgets rapidly, with cost per thousand impressions (CPM) increasing tenfold while yielding almost no revenue. This issue has led to substantial financial losses for small businesses using the service, and many are considering abandoning Meta's platforms due to the lack of transparency and accountability from the company.
Despite Meta's claims that Advantage Plus is a "set it and forget it" solution, the reality has been inconsistent and problematic, forcing some advertisers to revert to manual ad purchases. Meta acknowledged a platform bug on February 14 but has not provided detailed explanations or effective solutions, fueling frustration among its users.
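For a sense of what a tenfold CPM increase does to a fixed daily budget, here is a minimal arithmetic sketch. CPM is the cost of one thousand ad impressions; the dollar amounts and function names below are hypothetical and are not taken from any advertiser's real account.

```python
def cpm(spend: float, impressions: int) -> float:
    """Cost per thousand impressions for a given spend."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return spend * 1000 / impressions


def impressions_for_budget(budget: float, cpm_value: float) -> int:
    """How many impressions a budget buys at a given CPM."""
    if cpm_value <= 0:
        raise ValueError("CPM must be positive")
    return int(budget * 1000 / cpm_value)


# Hypothetical numbers: $100 normally buys 10,000 impressions.
daily_budget = 100.0
normal_cpm = cpm(spend=daily_budget, impressions=10_000)
spiked_cpm = normal_cpm * 10  # the tenfold increase described above

print(f"Normal CPM: ${normal_cpm:.2f}")
print(f"Spiked CPM: ${spiked_cpm:.2f}")
print("Impressions at normal CPM:", impressions_for_budget(daily_budget, normal_cpm))
print("Impressions at spiked CPM:", impressions_for_budget(daily_budget, spiked_cpm))
```

At a fixed budget, a tenfold CPM means one tenth of the impressions for the same money, which is why an automated system can drain a day's spend with little to show for it.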
🧠RESEARCH
The new Pooling LLaVA model adapts existing image-language tech to videos, cutting down on the need for massive data and computing power. By adjusting how features change over time, it beats previous top scores in video understanding and answering, showing major improvements on several benchmarks.
The AdvPrompter speeds up the creation of adversarial prompts for Large Language Models (LLMs), making it super quick to test LLM vulnerabilities. Unlike slow, traditional methods, it uses a cool new approach to generate prompts that trick LLMs without needing deep access to their internals. This tool not only shows top results on test datasets but also helps make LLMs tougher against attacks.
The MaPa project brings a fresh approach to creating photorealistic materials for 3D models using text descriptions. It ditches the need for massive matched datasets and instead uses a 2D diffusion model to make material graphs for each part of a model. These graphs are then tweaked to match the text, resulting in highly realistic and editable materials that outdo previous methods.
HaLo-NeRF introduces a way to navigate and understand big tourist spots through photos, using a mix of language and visual cues. It bridges the gap between simple 3D mapping and deep semantic understanding by linking scene images to descriptive text. This method improves recognition of landmarks by aligning visual data with relevant internet text, enhancing both 3D segmentation and semantic accuracy over existing models.
NeRF-XL revolutionizes the use of Neural Radiance Fields (NeRFs) by enabling their distribution across multiple GPUs, which allows for handling much larger models and scenes. This new method overcomes limitations of previous multi-GPU strategies that struggled with scene quality, offering a scalable approach that maintains the integrity of a single-GPU setup while reducing cross-GPU communication. It has proven to enhance both the speed and quality of 3D reconstructions on extensive datasets, showcasing its potential with impressive results on the massive MatrixCity dataset.
🛠️TOP TOOLS
Idefics2 - A Powerful 8B Vision-Language Model for the community
Collectif - Where teams discover product insights in Minutes, not Days
Faune - Brings the magic of AI models like Mistral and GPT-4 directly to you for free, prioritizing your privacy every step of the way.
Jack AI - AI-powered marketing tools that help you write and edit high-quality marketing content.
Browse AI - Simplifies data extraction from multiple sources by allowing businesses to train no-code scraping robots
📲SOCIAL MEDIA
Memory is now available to all ChatGPT Plus users. Using Memory is easy: just start a new chat and tell ChatGPT anything you’d like it to remember.
Memory can be turned on or off in settings and is not currently available in Europe or Korea. Team, Enterprise, and GPTs to come.
— OpenAI (@OpenAI)
5:07 PM • Apr 29, 2024
🗞️MORE NEWS
Copilot Workspace is GitHub’s take on AI-powered software engineering
GitHub's Copilot Workspace is leveling up AI-powered software development tools, enabling devs to craft code using natural language. This new tool aims to ease the starting process and assist throughout code building, particularly enhancing GitHub's current offerings. It's still in technical preview, exploring ways to blend AI with human creativity more efficiently. TECHCRUNCH
Google Releases Major New Feature Boost To Older Android Phones
Google's rolling out its AI app, Gemini, to older Android phones—Android 10 and 11—expanding accessibility to millions. This update, initially for newer models, marks a significant step in broadening AI tech adoption across various devices. As Google gears up for the I/O Conference, expect more insights into Gemini and future AI integrations. FORBES
These Apple apps could get a major AI overhaul in iOS 18
iOS 18 is shaping up to be a monumental update for Apple, potentially featuring major AI improvements across several key apps like Notes, Mail, Photos, and Fitness. The update could include a more intuitive Siri, advanced mapping features in Apple Maps, and a smarter Notes app capable of performing mathematical calculations. As anticipation builds, the full extent of these upgrades is expected to be unveiled at Apple's Worldwide Developers Conference on June 10. TECHRADAR
‘AI death calculator’ creators issue urgent warning about frighteningly accurate tool
The creators of Life2vec, a precise AI-based death prediction tool, have issued a warning about fraudulent apps exploiting their technology to steal personal information. Developed by US and Danish scientists, Life2vec uses factors like income, profession, and health history to predict mortality. However, numerous unauthorized apps, dubbed "de-terminators," have surfaced, mimicking Life2vec and potentially harvesting sensitive user data. The original creators emphasize that these counterfeit apps are not associated with their genuine, research-focused product. THE NEW YORK POST
Image-generating AI creates uncanny optical illusions
AI models can now create optical illusions from text descriptions, adjusting to produce images that change appearance based on viewing angle or motion. Researchers at the University of Michigan developed these illusions by tweaking existing AI technology, creating visuals that morph when seen from different perspectives. NEWSCIENTIST
AI-enabled electrocardiography alert intervention and all-cause mortality: a pragmatic randomized clinical trial
In a comprehensive study, an AI-enabled electrocardiogram (ECG) significantly reduced all-cause mortality within 90 days by identifying high-risk hospitalized patients. The multisite trial involved 39 physicians and 15,965 patients, showing that the AI-driven intervention could prompt timely and intensive care, especially for those identified with high-risk ECGs, leading to a marked decrease in cardiac deaths. NATURE
Researchers develop malicious AI ‘worm’ targeting generative AI systems
Researchers have developed a new malware named "Morris II," designed to exploit generative AI systems like OpenAI's ChatGPT and Google's Gemini. This malware, which resembles the infamous 1988 Morris worm, uses self-replicating prompts to spread, steal data, and generate spam through AI-powered platforms. Demonstrated in a controlled environment, it reveals significant vulnerabilities in AI systems, underscoring the urgent need for enhanced AI cybersecurity measures. SECURITY INTELLIGENCE
Meditron: An LLM suite especially suited for low-resource medical settings leveraging Meta Llama
Meditron, developed using Meta Llama 2, is a suite of large multimodal foundation models tailored for the medical field, enhancing clinical decision-making and diagnosis. Built collaboratively with Yale School of Medicine and EPFL, it has quickly adapted to incorporate Meta Llama 3, significantly improving its performance on medical benchmarks. Aimed at low-resource settings, Meditron provides open-source, evidence-based medical information, helping fill crucial gaps in global healthcare access. The suite includes models of different scales, maintaining high quality while optimizing for cost and technical feasibility, and has engaged a global community in its ongoing evaluation and refinement. META