Evolved Policy Gradients

OpenAI introduced Evolved Policy Gradients, a new method combining evolutionary strategies with policy gradient reinforcement learning. This approach improves training efficiency and performance in complex environments. It matters because it advances AI's ability to learn effective behaviors autonomously.

ArchiveLaunch

Signal trust

Single sourceEarly signal

PublishedWednesday, April 18, 2018 at 9:00 AMApr 18, 09:00 AM

FreshnessArchive

Story ID#845

Back to feed Original report

Original article excerpt

Server-side extracted preview paragraphs from the original source.

We’re releasing an experimental metalearning approach called Evolved Policy Gradients, a method that evolves the loss function of learning agents, which can enable fast training on novel tasks. Agents trained with EPG can succeed at basic tasks at test time that were outside their training regime, like learning to navigate to an object on a different side of the room from where it was placed during training.

Opening the briefing

Evolved Policy Gradients

Original article excerpt