The first run of our Retro Contest—exploring the development of algorithms that can generalize from previous experience—is now complete.
Though many approaches were tried, top results all came from tuning or extending existing algorithms such as PPO and Rainbow. There’s a long way to go: top performance was 4,692 after training while the theoretical max is 10,000. These results provide validation that our Sonic benchmark is a good problem for the community to double down on: the winning solutions are general machine learning approaches rather than competition-specific hacks, suggesting that one can’t cheat through this problem.