We’re introducing a new model built on GPT-4o that is more accurate at detecting harmful text and images, enabling developers to build more robust moderation systems.
Today we are introducing a new moderation model, omni-moderation-latest, in the Moderation API. Based on GPT‑4o, the new model supports both text and image inputs and is more accurate than our previous model, especially in non-English languages. Like the previous version, this model uses OpenAI's GPT‑based classifiers to assess whether content should be flagged across categories such as hate, violence, and self-harm, and it adds the ability to detect additional harm categories. It also provides more granular control over moderation decisions by calibrating probability scores to reflect the likelihood that content matches the detected category. The new moderation model is free to use for all developers through the Moderation API.
Since we first launched the Moderation API in 2022, the volume and variety of content that automated moderation systems need to handle have increased, especially as more AI apps have reached massive scale in production. We hope today’s upgrades help more developers benefit from the latest research and investments in our safety systems.
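To make the text-plus-image input and the calibrated probability scores concrete, here is a minimal Python sketch that uses only the standard library. It is not taken from the article. The endpoint URL, the request fields (`model`, `input`, `type`, `image_url`) and the response fields (`results`, `category_scores`) reflect the publicly documented Moderation API as we understand it and should be checked against the current API reference. The helper names, the per-category thresholds and the 0.5 default are hypothetical and stand in for an application's own policy.

```python
import json
import os
import urllib.request

# Assumed endpoint and model name; verify against the current API reference.
MODERATION_URL = "https://api.openai.com/v1/moderations"
MODEL = "omni-moderation-latest"


def build_request_body(text, image_url=None):
    """Build a request body with one text part and an optional image part."""
    parts = [{"type": "text", "text": text}]
    if image_url is not None:
        parts.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": MODEL, "input": parts}


def categories_over_threshold(category_scores, thresholds, default=0.5):
    """Return, sorted by name, the categories whose score meets its threshold.

    The thresholds are hypothetical application policy, not values from the
    article; categories without an explicit entry fall back to `default`.
    """
    return sorted(
        name
        for name, score in category_scores.items()
        if score >= thresholds.get(name, default)
    )


def moderate(text, image_url=None):
    """Send one moderation request and return the first result object.

    Requires network access and an OPENAI_API_KEY environment variable.
    """
    request = urllib.request.Request(
        MODERATION_URL,
        data=json.dumps(build_request_body(text, image_url)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"][0]
```

With a key configured, `moderate("some user text")` returns one result object, and passing its `category_scores` to `categories_over_threshold` with stricter or looser thresholds per category is one way an application might apply the more granular control described above.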