We’re sharing our proof attempts for First Proof, a math challenge that tests whether AI can produce checkable proofs on domain-specific problems.
We ran an internal model on all 10 problems in First Proof, a research-level math challenge designed to test whether AI systems can produce correct, checkable proof attempts. Unlike short-answer or competition-style math, these problems require building end-to-end arguments in specialized domains, and correctness is hard to establish without expert review. The authors of the First Proof problems are leading experts in their respective fields, and at least a couple of the problems were open for years before the authors found solutions. An academic department with substantial overlap with the subject areas could conceivably solve many of the problems in one week.