- NATURAL 20
- Posts
- New Claude Goes Marathon Mode With 30-Hour Coding Sprint
New Claude Goes Marathon Mode With 30-Hour Coding Sprint

Anthropic’s surprise upgrade, Claude Sonnet 4.5, just set a new bar for AI agents.
In an internal test the model ran autonomously for 30 hours, producing a fully-functional Slack-style chat app with 11,000 lines of code.
It’s ahead of schedule:
Details:
Benchmark results confirm the jump: Claude 4.5 tops SWE-Bench-Verified for real-world GitHub fixes, smashes OpenAI’s computer-use preview with 61 percent task completion in OS-World, and leads agentic coding, terminal automation, and tool-use leaderboards.
Two technical advances drive the surge. First, a context-management layer compresses dialogue so the agent remembers days of work without exceeding its window. Second, a Chrome extension lets Claude click, type, and submit forms across Google Docs, Sheets, and Gmail, turning it into a hands-on digital assistant.
A research preview dubbed “Imagine with Claude” goes further, live-rendering software and mini-games on demand instead of writing code, foreshadowing on-the-fly app generation. Apollo Research also rates the release Anthropic’s safest yet, citing reduced deceptive behavior.
Enterprises from Netflix to Thomson Reuters report double-digit productivity gains, but Anthropic warns the technology may displace entry-level white-collar roles.
Claude 4.5 shows the agent era isn’t coming…it’s already here.
Reply