Reinforcement learning with prediction-based rewards

OpenAI introduced Random Network Distillation (RND), a reinforcement learning method that uses prediction-based rewards to encourage agents to explore their environments through curiosity. According to OpenAI, it is the first method to exceed average human performance on Montezuma's Revenge.


Published: Wednesday, October 31, 2018 at 8:00 AM


Original article excerpt


We’ve developed Random Network Distillation (RND), a prediction-based method for encouraging reinforcement learning agents to explore their environments through curiosity, which for the first time exceeds average human performance on Montezuma’s Revenge.
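The excerpt does not spell out how the prediction-based reward is computed. A minimal sketch, assuming the usual reading of Random Network Distillation: a fixed, randomly initialised target network maps each observation to a feature vector, a predictor is trained to match that output on the observations the agent visits, and the predictor's error serves as the exploration bonus. The class name RNDBonus, the network sizes, the learning rate, and the affine predictor are illustrative assumptions made to keep the example short, not details of OpenAI's implementation.

```python
import numpy as np


class RNDBonus:
    """Toy curiosity bonus: error of a predictor against a fixed random network."""

    def __init__(self, obs_dim, feat_dim=16, hidden=32, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        # Target network: random weights that are never updated.
        self.w1 = rng.normal(size=(obs_dim, hidden))
        self.w2 = rng.normal(size=(hidden, feat_dim))
        # Predictor: a trainable affine map (assumed simplification;
        # a real agent would use a neural network here).
        self.p = np.zeros((obs_dim + 1, feat_dim))
        self.lr = lr

    def _target(self, obs):
        return np.tanh(obs @ self.w1) @ self.w2

    def _with_bias(self, obs):
        # Append a constant 1 so the predictor can learn an offset.
        return np.hstack([obs, np.ones((obs.shape[0], 1))])

    def intrinsic_reward(self, obs):
        # Per-observation mean squared prediction error.
        err = self._with_bias(obs) @ self.p - self._target(obs)
        return np.mean(err ** 2, axis=1)

    def update(self, obs):
        # One gradient-descent step on the squared error over the batch.
        x = self._with_bias(obs)
        err = x @ self.p - self._target(obs)
        self.p -= self.lr * (x.T @ err) / obs.shape[0]


rng = np.random.default_rng(1)
# "Familiar" observations cluster around one point, "novel" ones around another.
familiar = np.array([1.0, 1.0, 1.0, 1.0]) + 0.05 * rng.normal(size=(64, 4))
novel = np.array([1.0, -1.0, 1.0, -1.0]) + 0.05 * rng.normal(size=(64, 4))

bonus = RNDBonus(obs_dim=4)
before = bonus.intrinsic_reward(familiar).mean()
for _ in range(500):
    bonus.update(familiar)  # the agent keeps revisiting the same states
after = bonus.intrinsic_reward(familiar).mean()
unseen = bonus.intrinsic_reward(novel).mean()

print(f"familiar, before training: {before:.3f}")
print(f"familiar, after training:  {after:.3f}")
print(f"novel, after training:     {unseen:.3f}")
```

In this toy setup the bonus for repeatedly visited observations shrinks as the predictor improves, while observations from an unvisited region keep a larger bonus, which is the signal that pushes an agent toward unexplored states.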

