Finding GPT-4’s mistakes with GPT-4

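OpenAI trained CriticGPT, a model based on GPT-4, to critique ChatGPT's code output so that human trainers catch more mistakes during reinforcement learning from human feedback (RLHF). Better tools for spotting errors matter for AI safety because outputs from advanced models can be hard for people to evaluate unaided.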

Published Thursday, June 27, 2024 at 12:00 PM

Original article excerpt

CriticGPT, a model based on GPT-4, writes critiques of ChatGPT responses to help human trainers spot mistakes during RLHF

We've trained a model, based on GPT‑4, called CriticGPT to catch errors in ChatGPT's code output. We found that when people get help from CriticGPT to review ChatGPT code, they outperform those without help 60% of the time. We are beginning the work to integrate CriticGPT‑like models into our RLHF labeling pipeline, providing our trainers with explicit AI assistance. This is a step towards being able to evaluate outputs from advanced AI systems that can be difficult for people to rate without better tools.
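
One way to read the "60% of the time" figure, assuming it is a head-to-head preference rate (the excerpt does not spell out the evaluation protocol): for each review task, a judge compares an assisted review with an unassisted one and picks the better, and the win rate is the share of comparisons the assisted side wins. On that reading, 50% would mean no advantage. The Python sketch below shows only that arithmetic; the function name `pairwise_win_rate` and the judgments in `example` are hypothetical placeholders, not OpenAI's evaluation code or data, and ties are not modeled.

```python
# Hypothetical sketch of a pairwise win rate; not OpenAI's evaluation code.
# Each entry names the side a judge preferred in one head-to-head comparison:
# "assisted" (reviewer helped by a critic model) or "unassisted".


def pairwise_win_rate(preferences: list[str], side: str = "assisted") -> float:
    """Return the fraction of comparisons won by `side` (ties not modeled)."""
    if not preferences:
        raise ValueError("need at least one comparison")
    wins = sum(1 for p in preferences if p == side)
    return wins / len(preferences)


# Made-up judgments chosen only to illustrate the arithmetic.
example = ["assisted", "assisted", "unassisted", "assisted", "unassisted"]
rate = pairwise_win_rate(example)
print(f"assisted win rate: {rate:.0%}")  # 3 wins out of 5 comparisons -> 60%
```

A real evaluation would also need a policy for ties and some measure of uncertainty over the number of comparisons.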
