The Quiet Collapse of AI’s Six-Month Benchmark Cycle

For three years, every frontier model set a new state of the art within months. That cycle is slowing — and what comes next is more interesting.

By James Whitfield · May 27, 2026 · 6 min read

For most of 2022 through 2024, the AI benchmark curve was a reliable story: every major model release produced measurable, substantial improvements over its predecessor within six months. The curve is bending.

The gains that GPT-4.5 and GPT-5 show over GPT-4 on standard benchmarks are real, but they are smaller in percentage terms than the jump from GPT-3.5 to GPT-4. Gemini 2.0’s improvement over the Gemini 1.5 generation on the MATH benchmark is measured in single-digit percentage points.

None of this means progress has stopped. It means progress is concentrating in different places: multimodal capability, context window length, inference efficiency, deployment stability.

What comes next is messier and more interesting than the benchmark cycle: real-world task performance, agentic reliability, domain-specific specialisation. That’s a slower story to tell. It’s also a more important one.

// Author

James Whitfield

James has been taking apart computers since he was nine. He covers the silicon that makes everything else possible, from fab geopolitics to the GPUs sitting in your rig. Based in London.
