Introducing HealthBench

OpenAI has introduced HealthBench, a benchmark for evaluating AI models on healthcare-related tasks. It is intended to measure how accurately and reliably models perform in medical settings, because better evaluation can lead to safer and more effective healthcare applications.

Published: Monday, May 12, 2025 at 12:30 PM

Story ID: #379

Original article excerpt

HealthBench is a new evaluation benchmark for AI in healthcare which evaluates models in realistic scenarios. Built with input from 250+ physicians, it aims to provide a shared standard for model performance and safety in health.

Improving human health will be one of the defining impacts of AGI. If developed and deployed effectively, large language models have the potential to expand access to health information, support clinicians in delivering high-quality care, and help people advocate for their health and that of their communities.
