HealthBench is a new benchmark that evaluates AI models in realistic healthcare scenarios. Built with input from more than 250 physicians, it aims to provide a shared standard for model performance and safety in health.
Improving human health will be one of the defining impacts of AGI. If developed and deployed effectively, large language models have the potential to expand access to health information, support clinicians in delivering high-quality care, and help people advocate for their health and that of their communities.