We’re proposing an AI safety technique which trains agents to debate topics with one another, using a human to judge who wins.
We believe that this or a similar approach could eventually help us train AI systems to perform far more cognitively advanced tasks than humans are capable of, while remaining in line with human preferences. We outline the method here together with preliminary proof-of-concept experiments, and we are also releasing a web interface so people can experiment with the technique.