AI benchmarks are broken. Here’s what we need instead.

Current AI benchmarks fail to accurately measure real-world AI capabilities and progress. Experts argue for new evaluation methods that better reflect practical performance and ethical considerations. Improving benchmarks is crucial for guiding AI development responsibly and effectively.

Published Tuesday, March 31, 2026 at 2:01 PM

Original article excerpt

One-off tests don’t measure AI’s true impact. We’re better off shifting to more human-centered, context-specific methods.

For decades, artificial intelligence has been evaluated through the question of whether machines outperform humans. From chess to advanced math, from coding to essay writing, the performance of AI models and applications is tested against that of individual humans completing tasks.
