A factuality benchmark called SimpleQA that measures the ability of language models to answer short, fact-seeking questions.
An open problem in artificial intelligence is how to train models that produce factually correct responses. Current language models sometimes produce false outputs or answers unsubstantiated by evidence, a problem known as “hallucinations”. Language models that generate more accurate responses with fewer hallucinations are more trustworthy and can be used in a broader range of applications. To measure the factuality of language models, we are open-sourcing a new benchmark called SimpleQA.