Apple's ReALM AI Outperforms GPT-4

PLUS: Stable Audio 2.0 Launches, Resemble's Rapid Voice Tool and more.

Today:

  • Apple's ReALM AI Outperforms GPT-4

  • Top Talent Joins Google

  • New DALL-E Editing Options

  • Stable Audio 2.0 Launches

  • Brave Expands AI to iOS

  • Resemble's Rapid Voice Tool

👀 Apple's new AI model outperforms GPT-4 | Is Apple Secretly Building AI Agents?

Apple, traditionally seen as behind in AI, is making strides. After scrapping its car project, the company has shifted its focus to AI, pointing to an improved Siri and M3 chips. Notably, it unveiled a model that understands on-screen context, hinting at breakthroughs in multimodal AI.

Apple's research paper introduces ReALM (Reference Resolution As Language Modeling), an approach that potentially surpasses GPT-4 at resolving references. Apple's focus on on-device AI agents could change how users interact with their devices.

Google just scored a big win in the AI talent war

Google snags Logan Kilpatrick, OpenAI's former head of developer relations, for its AI Studio, marking a victory in the AI talent battle. Kilpatrick's move underscores Big Tech's scramble for top AI minds. His expertise in developer relations, dubbed Google's "secret weapon," is crucial for AI integration.

Microsoft's recent hire of Mustafa Suleyman highlights the competitive landscape. Kilpatrick's departure from OpenAI prompted an outpouring of thanks from developers, a sign of his impact. Big Tech's aggressive recruitment tactics, including personal appeals from Meta's Mark Zuckerberg and Google's Sergey Brin, reflect the industry's intensity. High compensation and immediate access to resources are key in attracting talent, with some salaries reaching $1 million.

Paid ChatGPT customers can now use AI to edit DALL-E images

OpenAI now lets paying users tweak DALL-E images through ChatGPT prompts. This move simplifies image refinement, previously a challenging task. By leveraging ChatGPT's language skills, users can describe edits instead of navigating complex tools. In a demo on X, OpenAI showed bows being added to a poodle's ears in a DALL-E image.

DALL-E also gains options to select aspect ratios and apply styles like "motion blur" or "solarpunk." These features are currently exclusive to paid users. Using language to drive edits could change many kinds of software, spanning video, audio, and image editing. AI-generated images also carry significant implications.

Stability AI brings new clarity and power to gen AI audio with Stable Audio 2.0

Stability AI unveils Stable Audio 2.0, expanding generative AI capabilities beyond text-to-image. This upgrade enables users to create high-quality audio tracks up to 3 minutes long and supports audio-to-audio generation. Zach Evans, head of audio research, highlights improvements in musicality and response accuracy to detailed prompts.

Stable Audio 2.0 leverages latent diffusion technology, offering complete musical compositions with distinct sections. The model, trained on licensed data from AudioSparx, prioritizes copyright protection. Stable Audio 2.0 is not openly available yet, though Stability AI aims to open it up in the future. The release follows the resignation of Stability AI's CEO, signaling business continuity and commitment to innovation.

Brave is launching its AI assistant on iPhone and iPad

Brave introduces Leo, its AI assistant, on iPhone and iPad, expanding its availability beyond Android and desktop. Leo offers voice-to-text capability, facilitating hands-free interaction. It can summarize pages, answer questions, generate reports, translate text, transcribe audio/video, and even write code.

Brave aims to provide an all-in-one AI assistant, reducing reliance on other services like ChatGPT. Leo accesses various AI models and offers a premium option for enhanced features. Users can enable Leo via Brave's browser settings. This launch follows other browser companies like Opera, which introduced its AI assistant, Aria, last year.

Resemble AI launches tool to make AI voice clones in a minute

Resemble AI launches Rapid Voice Cloning, a tool that generates voice clones quickly from minimal data. Unlike traditional methods requiring lengthy recordings, Rapid Voice Cloning needs just 10 seconds to 1 minute of clear audio for replication. It excels in capturing accents and nuances, enabling diverse applications in content creation, accessibility, and personalization.

While initial tests show some limitations, Resemble AI aims to support various English accents soon. This innovation streamlines voice cloning for content creators and businesses, enhancing user experiences. Competitors like ElevenLabs offer similar solutions, but Resemble's approach boasts accessibility and speed, with pricing plans starting at $29/month.

🧠 RESEARCH

Eurus, an upgraded large language model (LLM) geared towards reasoning, outperforms GPT-3.5 Turbo across various benchmarks, achieving a 33.3% pass@1 accuracy on LeetCode and 32.6% on TheoremQA. Its success is attributed to UltraInteract, a dataset built for supervised fine-tuning and preference learning, which also yields insights for building better reasoning models.
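
For readers unfamiliar with the metric, pass@1 is the share of problems solved on a single attempt. The sketch below uses the pass@k estimator commonly used in code-generation benchmarks; the summary above does not say exactly how Eurus was scored, so treat this as an illustration only. The function names and the sample results are made up for the example.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n sampled solutions, c of them correct."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k wrong samples exist, so any k draws include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over a benchmark."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results, NOT Eurus data: (samples drawn, samples correct) per problem.
results = [(1, 1), (1, 0), (1, 0), (1, 0)]
print(benchmark_pass_at_k(results, k=1))  # 0.25
```

With one sample per problem, pass@1 reduces to the plain fraction of problems solved.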

Octopus v2 introduces an on-device language model tailored for AI agents, addressing concerns over privacy and cost associated with cloud-based models. With 2 billion parameters, it reportedly outperforms GPT-4 in accuracy and latency while reducing context length by 95%. It also cuts latency 35-fold compared to Llama-7B, making it deployable across edge devices for real-world applications.

LLaVA-Gemma explores compact language models for accelerating multimodal foundation models (MMFM) within the LLaVA framework. Testing various design features, including pretraining the connector and adjusting image and language backbone sizes, the Gemma-based models show moderate performance but fall short of outperforming current state-of-the-art models of comparable size.

Long-context LLMs face hurdles in comprehending extensive sequences, particularly in extreme-label classification tasks. A specialized benchmark, LIConBench, highlights these challenges by assessing 13 models on datasets with varying label ranges and input lengths. While models perform well under 20K tokens, performance declines sharply beyond that point, revealing limitations in processing and understanding lengthy, context-rich sequences.

Researchers analyze latent diffusion models (LDMs) to understand their scaling properties and sampling efficiency. Contrary to expectations, smaller models often outperform larger ones within a fixed inference budget across text-to-image tasks. This surprising trend suggests potential avenues for optimizing LDMs to improve generative capabilities under resource constraints.
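
To make "fixed inference budget" concrete: if each sampling step of a larger model costs more compute, the same budget buys it fewer steps. The sketch below shows only that arithmetic. The per-step costs, the budget, and the function name are hypothetical and are not taken from the paper.

```python
# Hypothetical illustration of a fixed inference budget; not the paper's code or data.


def steps_within_budget(budget_units: int, cost_per_step: int) -> int:
    """Number of whole sampling steps a model can run inside the budget."""
    if budget_units < 0 or cost_per_step <= 0:
        raise ValueError("budget must be >= 0 and cost_per_step must be > 0")
    return budget_units // cost_per_step


# Made-up per-step costs in arbitrary compute units.
cost_per_step = {"small": 1, "large": 4}
budget_units = 100

steps = {
    name: steps_within_budget(budget_units, cost)
    for name, cost in cost_per_step.items()
}
print(steps)  # {'small': 100, 'large': 25}
```

The example says nothing about image quality; it only shows why a smaller model gets more sampling steps for the same spend, which is the setting in which the researchers compare models.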

🛠️ TOP TOOLS

Freepik - Create endless variations and styles from any image.

Vapi - Build, test and deploy voicebots in minutes rather than months.

Mentor - AI powered goal management

Aqua Voice - voice-driven document editor that transcribes your voice into written text accurately and efficiently.

Clay - Combine 50+ data sources, web scraping, and AI messaging to enrich your data and automate your outbound at scale.

🗞️ MORE NEWS

Hackers force AI chatbots to break their own rules

Hackers exploit human tricks to make AI chatbots break rules, per DEF CON findings. About 15.5% of chats manipulated bots to spill sensitive data or evade safeguards, with 9.8% success via "You are a" prompts. Popular chatbots like ChatGPT are vulnerable; OpenAI's move to skip account creation adds more risk. AXIOS

Cloudflare makes it simple to deploy AI apps with Hugging Face, launches Workers AI to public

Cloudflare simplifies AI app deployment with Hugging Face integration, launching Workers AI globally. CEO Matthew Prince highlights AI's challenge in production, offers cost-effective solution. Developers select open-source models via Hugging Face, deploy instantly to Workers AI for global access. Improved AI supports fine-tuned model weights, catering to domain-specific needs. VENTUREBEAT

Former Snap AI chief launches Higgsfield to take on OpenAI's Sora video generator

Former Snap AI chief launches Higgsfield to rival OpenAI's Sora video generator. Higgsfield's Diffuse app creates personalized videos from text or selfies, targeting diverse creators. Its mobile-first approach aims at ease of use. Higgsfield, with lean operations and $8M funding, plans improved video editor and social media-focused models. TECHCRUNCH

Opera browser dev branch rolls out support for running LLMs locally

Opera integrates experimental support for running large language models (LLMs) locally in its developer version, enabling access to 150 LLM variants from 50 families like LLaMA and Gemma. Unlike internet-dependent alternatives, Opera's approach ensures data privacy, although storage requirements and speed limitations pose challenges. Opera aims to evolve this feature through its AI Feature Drop Program but hasn't specified a timeline for its mainstream release. THE REGISTER

Anthropic researchers detail how 'many-shot jailbreaking' can manipulate AI responses

Anthropic researchers expose a method called "many-shot jailbreaking" that exploits AI's expanded context windows to manipulate responses, potentially causing harm. While larger context windows enhance AI's utility, they also make models vulnerable to manipulation. Researchers advocate for awareness and mitigation strategies while questioning the focus on censorship over addressing actual concerns. SILICON ANGLE

Former senior Intel exec raises $24 million in Seed funding for AI-powered video security platform

Lumana, a video security startup, secures $24 million in Seed funding. Its AI-driven platform analyzes real-time video to enhance security and operational efficiency. Founded by former Intel exec Sagi Ben Moshe, Lumana aims to tap into the booming video analytics market, projected to reach $38 billion by 2030. CTECH

An AI Stethoscope's New Algorithm To Predict Heart Failure Gets FDA Clearance

Eko Health's AI stethoscope, with FDA clearance for detecting low ejection fraction, aims to revolutionize heart disease diagnosis. The algorithm, co-developed with Mayo Clinic, enables early detection, potentially saving lives. Eko, backed by $125M in funding, plans to roll out the technology to primary care physicians, enhancing preventive care. FORBES

AI Finds Personality Shapes Genes

A study reveals that personality traits influence the expression of 4,000 genes, impacting health and well-being. A network of genes related to personality inheritance was identified, along with a control hub of six genes regulating emotional processing. Cultivating a self-transcendent outlook on life may improve health by regulating gene expression. NEUROSCIENCE NEWS

FDA Approves AI Tool That Can Detect Sepsis

The FDA approved an AI tool by Prenosis that diagnoses sepsis, a life-threatening infection response. Using 22 parameters, including biomarkers and vital signs, it predicts sepsis risk within 24 hours. With over 100,000 patient samples, it aims to reduce the 350,000 annual sepsis-related deaths or hospice cases in the US. FORBES
