AI safety via debate

OpenAI proposes an AI safety technique in which AI agents debate one another and a human judges who wins. The aim is to keep AI systems in line with human preferences, even on tasks more cognitively advanced than humans can perform themselves.


Published: Thursday, May 3, 2018 at 9:00 AM




Original article excerpt


We’re proposing an AI safety technique which trains agents to debate topics with one another, using a human to judge who wins.

We believe that this or a similar approach could eventually help us train AI systems to perform far more cognitively advanced tasks than humans are capable of, while remaining in line with human preferences. We’re going to outline this method together with preliminary proof-of-concept experiments and are also releasing a web interface so people can experiment with the technique.
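The debate setup described in the excerpt can be pictured with a toy sketch. This is only an illustration under stated assumptions, not OpenAI's implementation: the names `run_debate`, `alice`, `bob`, and `toy_judge` are hypothetical, the debaters are hard-coded functions rather than trained agents, the judge is a simple stand-in for the human, and the +1/-1 reward scheme is an assumed way of encoding "who wins".

```python
from typing import Callable, Dict, List, Tuple

# One transcript entry: (debater name, statement).
Turn = Tuple[str, str]
# Hypothetical types: a debater sees the question and the transcript so far.
Debater = Callable[[str, List[Turn]], str]
# The judge (a human in the proposal) reads everything and names a winner.
Judge = Callable[[str, List[Turn]], str]


def run_debate(question: str, debaters: Dict[str, Debater],
               judge: Judge, rounds: int = 2):
    """Alternate statements between debaters, then ask the judge for a winner."""
    transcript: List[Turn] = []
    for _ in range(rounds):
        for name, agent in debaters.items():
            transcript.append((name, agent(question, transcript)))
    winner = judge(question, transcript)
    # Assumed reward scheme: +1 for the winner, -1 for everyone else.
    rewards = {name: (1 if name == winner else -1) for name in debaters}
    return transcript, rewards


# Toy debaters with fixed positions (real agents would be trained models).
def alice(question: str, transcript: List[Turn]) -> str:
    return "The answer is 4, because 2 + 2 = 4."


def bob(question: str, transcript: List[Turn]) -> str:
    return "The answer is 5."


def toy_judge(question: str, transcript: List[Turn]) -> str:
    # Stand-in for the human judge: pick the most recent speaker who said "4".
    for name, statement in reversed(transcript):
        if "4" in statement:
            return name
    return transcript[-1][0]


transcript, rewards = run_debate(
    "What is 2 + 2?", {"alice": alice, "bob": bob}, toy_judge
)
for name, statement in transcript:
    print(f"{name}: {statement}")
print(rewards)
```

In a training setting, the rewards returned here would be the signal used to update each agent; with two debaters the assumed scheme is zero-sum, so one agent can only gain by convincing the judge at the other's expense.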
