OpenAI helps build shared standards for advanced AI, supporting evaluation frameworks, safety practices, and global cooperation through the Appia Foundation.
Increasingly capable models can strengthen cyber defense, accelerate scientific discovery, and expand access to expertise. But they can also create safety and security risks if their capabilities are misunderstood, their safeguards are inadequate, or governments lack the information they need to respond. To realize the benefits safely and confidently, societies will need institutions with the technical and governance capacity to evaluate, secure, and govern increasingly capable systems.
That is one reason OpenAI helped found the Appia Foundation, hosted by the Linux Foundation. Appia will develop open, modular specifications intended to translate international standards and established frameworks into practical assessment criteria across the AI value chain. Its work can help build a critical missing trust layer: a way for third parties to check conformity with standards and produce clearer, more reusable evidence when models, infrastructure, and applications are developed by different organizations. In doing this work, Appia will help create a shared technical language that allows national and international institutions to trust each other’s work.
We see this effort as an important next step in a broader body of work to strengthen the institutions, standards, and assessment practices needed for advanced AI systems.
Our recent blueprint for democratic governance of frontier AI offers a roadmap for that work. It calls for a durable U.S. framework, a strengthened Center for AI Standards and Innovation (CAISI), and a broader resilience strategy across government. It also recognizes that frontier risks are international in scope. Nations should work together to develop compatible safety frameworks, trusted channels for sharing risk findings, and coordinated responses to incidents.
National capacity and international cooperation should reinforce one another. Strong institutions such as CAISI can develop technical expertise, evaluate frontier systems, and support an independent assessment ecosystem. A network of capable national institutions can then establish shared methods, recognize trusted evidence, and give governments the common technical understanding needed to act together.
Standards are central to that effort, and they must be grounded in credible evaluation practice and technical rigor. In our shared playbook for trustworthy third-party evaluations, we set out what frontier assessments increasingly need to disclose: the system tested, its tool access and evaluation harness, the methods used to elicit capabilities, the resources available, and the checks performed to validate the results. We have also put these principles into practice through testing partnerships with U.S. CAISI and UK AISI, whose work on frontier capability assessments and biological-misuse safeguards led to concrete improvements in our systems. This work lays the foundation for practices that can be standardized so that performance can be checked in a comparable way.
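To make the disclosure items above concrete, here is a minimal sketch of how they might be captured as a structured record and checked for completeness. This is an illustration only, not a format defined by the playbook or by Appia: the class name `EvaluationDisclosure`, its field names, the helper `missing_items`, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class EvaluationDisclosure:
    """Hypothetical record of what an assessment discloses (names are assumptions)."""
    system_tested: str = ""                                        # the system under test
    tool_access: List[str] = field(default_factory=list)           # tools the system could use
    evaluation_harness: str = ""                                   # harness used to run the tests
    elicitation_methods: List[str] = field(default_factory=list)   # how capabilities were elicited
    resources_available: str = ""                                  # resources available to evaluators
    validation_checks: List[str] = field(default_factory=list)     # checks used to validate results


def missing_items(disclosure: EvaluationDisclosure) -> List[str]:
    """Return the names of disclosure items left empty, in declaration order."""
    return [f.name for f in fields(disclosure) if not getattr(disclosure, f.name)]


# Example with placeholder values; three items are intentionally left blank.
report = EvaluationDisclosure(
    system_tested="example-model-v1",
    evaluation_harness="example-harness",
    elicitation_methods=["few-shot prompting"],
)
print(missing_items(report))
# prints ['tool_access', 'resources_available', 'validation_checks']
```

A record like this is one way a reviewer could see at a glance which disclosures an assessment report still lacks. A real specification would likely define richer types and required evidence for each item.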
