Evaluating AI’s ability to perform scientific research tasks

OpenAI evaluated AI systems on scientific research tasks using FrontierScience, a new benchmark of expert-level reasoning across physics, chemistry, and biology. The work highlights AI's potential to assist with complex problem-solving and data analysis in science, which could accelerate scientific discovery and innovation.

Published: Tuesday, December 16, 2025 at 10:00 AM

Original article excerpt

Server-side extracted preview paragraphs from the original source.

OpenAI introduces FrontierScience, a benchmark testing AI reasoning in physics, chemistry, and biology to measure progress toward real scientific research.

We introduce FrontierScience, a new benchmark that evaluates AI capabilities for expert-level scientific reasoning across physics, chemistry, and biology.

Reasoning is at the core of scientific work. Beyond recalling facts, scientists generate hypotheses, test and refine them, and synthesize ideas across fields. As our models become more capable, the central question is how they can reason deeply to contribute to scientific research.
