it evolved 100+ times with zero human input

You probably haven't heard of MiniMax. Most people outside of AI research circles haven't.
That's about to change.

On March 17th, a Shanghai-based AI company released a model called M2.7 with a tagline that sounds like science fiction: "Early Echoes of Self-Evolution." But what they actually describe is not fiction. It is a documented, specific, measurable process where an AI model helped design and improve the tools used to build the next version of itself.

Let me break down what they did and why it matters.

Here are the links to all the relevant stuff:

MiniMax M2.7: Early Echoes of Self-Evolution https://www.minimax.io/news/minimax-m27-en

Here’s their “OPENROOM,” the AI agent interface built mostly by the AI itself.

Who Is MiniMax

MiniMax is a Chinese AI company founded in 2022 by Yan Junjie, a former executive at SenseTime. They have 236 million users worldwide. In January 2026 they went public on the Hong Kong Stock Exchange, raising $619 million. Their shares doubled on the first day. The founder became a billionaire.

Their investors include Alibaba, Tencent, Hillhouse Capital, and the Abu Dhabi Investment Authority.

They are not a scrappy startup. They are a well-funded lab with a public stock listing and the resources to compete directly with OpenAI, Anthropic, and Google. On several benchmarks, their new model ranks second in the world, behind only Anthropic's best.

Most people in the West have never heard of them.

What Self-Evolution Actually Means

When MiniMax says M2.7 participated in its own evolution, they are not speaking metaphorically. Here is what they did, step by step.

They built a research agent using an early version of M2.7 and let it run their internal AI lab.

The agent became the daily research assistant for their reinforcement learning team. When a researcher had an experimental idea, they would describe it to the agent. The agent would do the literature review, track the experiment specifications, set up the data pipelines, launch the experiments, monitor progress, read the logs, debug the failures, fix the code, submit merge requests, and run smoke tests. Everything a team of engineers used to do, handled end to end.

Result: M2.7 was handling 30 to 50 percent of their RL team's entire workflow.

Then the agent started rewriting its own tools.

This is the part that gets interesting. The internal harness began autonomously collecting feedback on its own performance, building evaluation sets for internal tasks, and iterating on its own architecture, skills, and memory systems. The agent was rewriting the tools it uses to do its job.

Then they ran a controlled experiment with 100 rounds of autonomous optimization.

MiniMax had M2.7 optimize a model's programming performance on an internal scaffold, entirely without human input. The loop was:

Analyze failure trajectories. Plan changes. Modify scaffold code. Run evaluations. Compare results. Keep or revert. Repeat.

Over 100 rounds. Zero humans.
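MiniMax hasn't published the harness code, but the loop they describe is a plain keep-or-revert hill climb: propose a change, evaluate it, keep it only if the score improves. Here's a minimal sketch in Python, with a toy one-number "scaffold" standing in for the real code edits (every name here is hypothetical, not MiniMax's actual implementation):

```python
import random

def self_optimization_loop(scaffold, evaluate, propose_change, rounds=100):
    """Keep-or-revert loop: apply a proposed change, re-evaluate,
    and keep the change only if the score improves."""
    best_score = evaluate(scaffold)
    for _ in range(rounds):
        candidate = propose_change(scaffold)   # plan + modify scaffold
        score = evaluate(candidate)            # run evaluations
        if score > best_score:                 # compare results
            scaffold, best_score = candidate, score  # keep
        # else: revert (the candidate is simply discarded)
    return scaffold, best_score

# Toy demo: the "scaffold" is one tunable number; the real loop
# edited scaffold code, workflow guidelines, and architecture.
random.seed(0)
evaluate = lambda s: -abs(s - 0.7)                 # score peaks at 0.7
propose = lambda s: s + random.uniform(-0.1, 0.1)  # small random edit
best, score = self_optimization_loop(0.0, evaluate, propose)
```

After 100 rounds the toy scaffold climbs toward the optimum even though no single step knows where it is, which is the whole point: cheap local edits plus a reliable evaluation signal are enough to compound.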

The discoveries M2.7 made on its own included finding optimal combinations of sampling parameters like temperature and frequency penalty, developing better workflow guidelines like automatically checking for the same bug pattern in other files after a fix, and adding loop detection and other architectural improvements.

The outcome was a 30 percent performance improvement on their internal evaluation sets.
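The sampling-parameter discovery is, at bottom, a search over a small grid of settings. A sketch of what that could look like, where the `score_fn` is a made-up stand-in for "run the internal eval set with these settings" (MiniMax doesn't describe the actual search method):

```python
import itertools

def grid_search(score_fn, temperatures, penalties):
    """Try every (temperature, frequency_penalty) pair; keep the best."""
    return max(
        itertools.product(temperatures, penalties),
        key=lambda pair: score_fn(*pair),
    )

# Hypothetical scoring surface with an optimum at (0.8, 0.2).
score = lambda t, p: -((t - 0.8) ** 2 + (p - 0.2) ** 2)
best_t, best_p = grid_search(score, [0.2, 0.5, 0.8, 1.0], [0.0, 0.2, 0.5])
```

The interesting part isn't the search itself, which is trivial; it's that the agent decided on its own that these knobs were worth searching.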

Then they put it in a competition.

MiniMax gave M2.7 24 hours to compete in 22 machine learning competitions autonomously. The competitions were run on a single cheap GPU, a roughly $4,000 data center card. The agent used three modules: short-term memory, self-feedback, and self-optimization. After each round, it wrote down what it learned, criticized its own results, and fed that criticism into the next attempt.
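The three modules compose into a simple attempt-critique-retry loop. A sketch under the assumption that short-term memory is just a list of notes handed to each new attempt (the toy `attempt` and `critique` functions are invented for illustration):

```python
def compete(run_attempt, critique, rounds=3):
    """Attempt-critique-retry: each attempt sees the self-criticism
    accumulated from all previous rounds (short-term memory)."""
    memory = []   # notes on what was learned so far
    results = []
    for _ in range(rounds):
        result = run_attempt(memory)   # attempt, informed by past notes
        feedback = critique(result)    # self-feedback on the outcome
        memory.append(feedback)        # self-optimization: feed it forward
        results.append(result)
    return results

# Toy demo: each round's "score" benefits from accumulated feedback.
attempt = lambda memory: 50 + 5 * len(memory)
critique = lambda score: f"scored {score}; try stronger features next round"
scores = compete(attempt, critique)
```

In the toy version the scores climb mechanically; in the real runs, whether they climb depends entirely on whether the model's self-criticism is actually accurate.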

Results across three trials: 9 gold medals, 5 silver, 1 bronze in the best run. Average medal rate of 66.6 percent.

That score ranks third in the world on this benchmark, behind Claude Opus 4.6 at 75.7 percent and GPT-5.4 at 71.2 percent. MiniMax ties Gemini 3.1.

On a single cheap GPU. Running autonomously for 24 hours.

How It Performs on Real Benchmarks

This is not a one-trick model. Across professional tasks, M2.7 is competing at the top of the industry.

On SWE-Pro, which tests real-world software engineering across multiple programming languages, M2.7 scores 56.22 percent. That matches GPT-5.3 Codex and approaches Claude Opus 4.6.

On end-to-end project delivery, where the model builds a full working application from a description, M2.7 scores 55.6 percent. Near Opus 4.6.

On professional office work, M2.7 ranks highest among open-source models on the GDPval-AA leaderboard with an Elo of 1491. Only GPT-5.4 at 1667, Claude Sonnet 4.6 at 1553, and Claude Opus 4.6 at 1606 beat it.

One specific finance example from MiniMax's blog: the model reads a company's annual reports and earnings call transcripts, cross-references multiple research reports, independently designs assumptions, builds a revenue forecast model, then produces a finished PowerPoint presentation and a written research report. Practitioners said the output "can already serve as a first draft and go directly into subsequent workflows."

The Artificial Analysis Intelligence Index score went from 42 on the previous model to 50 on M2.7. Up 8 points in a single release.

What This Is Really About

MiniMax is not just shipping a model. They are describing a new way of building AI companies.

They explicitly say that M2.7 is "significantly accelerating our own evolution into an AI-native organization." The model is part of their org chart. It runs research pipelines, catches production bugs, does literature reviews, builds evaluation datasets, and autonomously improves the tools it uses to do all of this.

This is what "AI-native" actually looks like in practice. Not as a branding term. As an operating model.

And they say this is just the beginning. Their direct quote: "We believe that future AI self-evolution will gradually transition towards full autonomy, coordinating data construction, model training, inference architecture, evaluation, and other stages without human involvement."

They are describing a future where the humans set the goals, and the model handles the rest of the research cycle. That future is apparently closer than most people think.

Why You Should Pay Attention

The AI conversation in the West tends to focus on OpenAI, Anthropic, and Google. Sometimes xAI or Meta.

MiniMax does not get mentioned.

But they went public in January while most Western AI labs are still private. They have hundreds of millions of users. Their model scores second or third in the world across multiple benchmarks. And now they are publishing documented methodology on AI self-improvement that is more concrete than anything the major Western labs have shared publicly on this topic.

The question their research raises is an important one. If a model can run 100+ rounds of autonomous optimization and improve its own performance by 30 percent, then compete in machine learning competitions and rank third in the world, what does M3 look like when M2.7 helps build it?

We do not know yet. But MiniMax says they are going to find out.

Until next time,

Wes “still figuring out how to self-evolve” Roth

My full video about it:
