CriticGPT, a model based on GPT-4, writes critiques of ChatGPT responses to help human trainers spot mistakes during RLHF
We've trained a model, based on GPT‑4, called CriticGPT to catch errors in ChatGPT's code output. We found that when people use CriticGPT to help them review ChatGPT code, they outperform reviewers without help 60% of the time. We are beginning work to integrate CriticGPT‑like models into our RLHF labeling pipeline, providing our trainers with explicit AI assistance. This is a step towards being able to evaluate outputs from advanced AI systems that can be difficult for people to rate without better tools.