Introducing gpt-oss-safeguard

OpenAI has released a research preview of gpt-oss-safeguard, a pair of open-weight reasoning models (120b and 20b) for safety classification. The models let developers apply and iterate on their own custom safety policies. Both are fine-tuned versions of OpenAI's gpt-oss open models and are available under the same permissive Apache 2.0 license.


Published: Wednesday, October 29, 2025 at 1:00 AM


Original article excerpt

Server-side extracted preview paragraphs from the original source.

OpenAI introduces gpt-oss-safeguard—open-weight reasoning models for safety classification that let developers apply and iterate on custom policies.

New open safety reasoning models (120b and 20b) that support custom safety policies.

Today, we’re releasing a research preview of gpt-oss-safeguard, our open-weight reasoning models for safety classification tasks, available in two sizes: gpt-oss-safeguard-120b and gpt-oss-safeguard-20b. These models are fine-tuned versions of our gpt-oss open models and available under the same permissive Apache 2.0 license, allowing anyone to use, modify, and deploy them freely. Both models can be downloaded today from Hugging Face.
