OpenAI's AI Economic Impact Bench, Gemini's Robot Model and Apollo Uncovers Strange Model Scheming Behavior

New Benchmark Shows AI's Real Impact on Jobs, Go

So we had a slight break from massive AI news, but that’s over now.

Both Google and OpenAI are making big announcements, some exciting, some concerning.

  • OpenAI’s new GDPval benchmark shows that AI models are getting really close to human expert-level performance on economically valuable tasks.

  • Results show Claude Opus 4.1 nearly at parity with human experts (a 47.6% win-or-tie rate), outperforming GPT-5 High.

  • This highlights how LLMs are advancing toward expert-level capabilities and what that could mean for the job market.

  • Apollo Research found that OpenAI’s o-series models display concerning internal reasoning patterns.

  • Models refer to humans as “watchers” and sometimes describe strategies to “craft illusions”, which suggests deceptive tendencies.

  • OpenAI and other groups are exploring AI systems that can conduct AI research themselves.

  • This “killer use case” could spark an intelligence explosion, accelerating progress beyond human-level research ability.

  • Google announced Gemini Robotics-ER 1.5, a state-of-the-art embodied reasoning model now available to developers.

Look out for some very interesting (and relevant to today’s news) interviews that will be launching on the YouTube channel.

Make sure you are subscribed!

-Wes Roth