Contingency-Aware Exploration in Reinforcement Learning


Jongwook Choi* (University of Michigan)
Yijie Guo* (University of Michigan)
Marcin Moczulski* (Google Brain)
Junhyuk Oh (DeepMind)
Neal Wu (Google Brain)
Mohammad Norouzi (Google Brain)
Honglak Lee (Google Brain, University of Michigan)

* Equal contribution
In ICLR 2019

Overview

We investigate whether learning contingency-awareness, i.e., knowledge of which aspects of an environment the agent can control, leads to more efficient exploration in reinforcement learning. We develop an attentive dynamics model (ADM) that discovers the controllable elements of the observations and is trained in a self-supervised fashion on the agent's own experience.
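To make this concrete, here is a minimal sketch of such an attentive (inverse-)dynamics model in PyTorch. The class and helper names (AttentiveDynamicsModel, contingent_region) and all architectural details are our illustrative assumptions, not the paper's exact implementation: every spatial cell of a convolutional feature map proposes a prediction of the action taken between two consecutive frames, a softmax attention over cells mixes those proposals, and the most-attended cell serves as an estimate of the contingent region.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveDynamicsModel(nn.Module):
    """Sketch of an attentive inverse-dynamics model.

    Each spatial cell of a conv feature map proposes logits for the
    action a_t taken between frames s_t and s_{t+1}; a per-cell softmax
    attention mixes these proposals. Cells that best explain the action
    receive high attention, so the attention map localizes the
    controllable (contingent) part of the observation.
    """

    def __init__(self, in_channels=2, num_actions=18, hidden=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.action_head = nn.Conv2d(hidden, num_actions, kernel_size=1)
        self.attn_head = nn.Conv2d(hidden, 1, kernel_size=1)

    def forward(self, frame_t, frame_tp1):
        x = torch.cat([frame_t, frame_tp1], dim=1)  # stack consecutive frames
        h = self.features(x)                        # (B, hidden, H, W)
        logits = self.action_head(h)                # (B, A, H, W) per-cell logits
        alpha = F.softmax(self.attn_head(h).flatten(2), dim=-1)  # (B, 1, H*W)
        probs = F.softmax(logits.flatten(2), dim=1)  # per-cell action distributions
        action_probs = (probs * alpha).sum(-1)       # attention-weighted mixture, (B, A)
        return action_probs, alpha.view(-1, *h.shape[-2:])

def contingent_region(alpha):
    """Most-attended cell, used as the estimated location of the agent."""
    flat = alpha.flatten(1).argmax(-1)
    w = alpha.shape[-1]
    return torch.stack([torch.div(flat, w, rounding_mode="floor"), flat % w], dim=-1)
```

Training reduces to maximizing the log-probability of the action the agent actually took, so the model learns from the policy's own trajectories without any extra labels, which is what makes it self-supervised.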

We combine an actor-critic algorithm with count-based exploration, using the discovered contingent regions as the state abstraction over which visitation counts are maintained. This achieves strong results on sparse-reward Atari games: for example, we report a state-of-the-art score of more than 11,000 points on Montezuma's Revenge without using expert demonstrations, explicit high-level information (e.g., RAM states), or resets to arbitrary states.
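The exploration side can be summarized in a few lines. The sketch below is an illustration under assumptions (the grid size, bonus scale, and use of a room identifier as extra context are ours), not the paper's exact recipe: the discretized location of the contingent region serves as an abstract state ψ(s), and its visit count N(ψ(s)) yields the familiar count-based bonus r⁺ = 1/√N(ψ(s)) added to the environment reward.

```python
from collections import defaultdict

class ContingencyCountBonus:
    """Count-based exploration bonus over discovered contingent regions.

    The (row, col) of the contingent region, discretized onto a coarse
    grid and optionally paired with extra context (e.g., the current
    room), forms the abstract state psi(s); its visit count N gives the
    bonus r+ = scale / sqrt(N). The hyperparameters are illustrative.
    """

    def __init__(self, cell_size=4, scale=1.0):
        self.cell_size = cell_size
        self.scale = scale
        self.counts = defaultdict(int)

    def bonus(self, region, context=None):
        row, col = region
        psi = (row // self.cell_size, col // self.cell_size, context)
        self.counts[psi] += 1
        return self.scale / self.counts[psi] ** 0.5

# Sketch of its use inside the actor-critic training loop:
#   _, alpha = adm(obs_t, obs_tp1)
#   region = contingent_region(alpha)[0].tolist()
#   reward += bonus_fn.bonus(region, context=current_room)
```

Because the bonus shrinks as 1/√N, locations the agent can already reach reliably stop paying off, which steadily pushes the policy toward parts of the state space it has rarely visited.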

Demo Video

As supplementary material, we show examples of the agent playing eight hard-exploration Atari games, together with the contingent regions discovered by ADM.


Montezuma's Revenge
Seaquest
Frostbite
Private Eye
Q*bert
Venture
Freeway
Hero