Faulty reward functions in the wild

OpenAI discusses issues with faulty reward functions in AI systems. These problems can cause unintended behaviors and reduce system reliability. Understanding and fixing reward functions is crucial for safer and more effective AI development.

ArchiveMajor

Signal trust

Single sourceEarly signal

PublishedWednesday, December 21, 2016 at 9:00 AMDec 21, 09:00 AM

FreshnessArchive

Story ID#908

Back to feed Original report

Original article excerpt

Server-side extracted preview paragraphs from the original source.

Reinforcement learning algorithms can break in surprising, counterintuitive ways. In this post we’ll explore one failure mode, which is where you misspecify your reward function.

At OpenAI, we’ve recently started using Universe⁠(opens in a new window), our software for measuring and training AI agents, to conduct new RL experiments. Sometimes these experiments illustrate some of the issues with RL as currently practiced. In the following example we’ll highlight what happens when a misspecified reward function encourages an RL agent to subvert its environment by prioritizing the acquisition of reward signals above other measures of success.

Opening the briefing

Faulty reward functions in the wild

Original article excerpt