OpenAI and Anthropic share findings from a joint safety evaluation

OpenAI and Anthropic have collaborated on a joint safety evaluation in which each lab tested the other's publicly released models. They shared their findings publicly to improve understanding of AI risks and mitigation strategies, an approach intended to advance safer AI development practices across the industry.

Published Wednesday, August 27, 2025 at 12:00 PM


OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other's models for misalignment, instruction following, hallucinations, jailbreaking, and more. The results highlight progress, remaining challenges, and the value of cross-lab collaboration.

This summer, OpenAI and Anthropic collaborated on a first-of-its-kind joint evaluation: we each ran our internal safety and misalignment evaluations on the other's publicly released models and are now sharing the results publicly. We believe this approach supports accountable and transparent evaluation, helping to ensure that each lab's models continue to be tested against new and challenging scenarios. We've since launched GPT-5, which shows substantial improvements in areas like sycophancy, hallucination, and misuse resistance, demonstrating the benefits of reasoning-based safety techniques. The goal of this external evaluation is to help surface gaps that might otherwise be missed, deepen our understanding of potential misalignment, and demonstrate how labs can collaborate on issues of safety and alignment.
