Contingency-Aware Exploration in Reinforcement Learning

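Jongwook Choi*

University of Michigan

Yijie Guo*

University of Michigan

Marcin Moczulski*

Google Brain

Junhyuk Oh

DeepMind

Neal Wu

Google Brain

Mohammad Norouzi

Google Brain

Honglak Lee

Google Brain
University of Michigan

* Equal Contributions

In ICLR 2019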

Overview

We investigate whether learning contingency-awareness and controllable aspects of an environment can lead to more efficient exploration in reinforcement learning. We develop an attentive dynamics model (ADM) that discovers controllable elements of the observations. The ADM can be trained in a self-supervised fashion using the agent's own experience.

We combine an actor-critic algorithm with count-based exploration over the discovered contingent regions and achieve strong results on sparse-reward Atari games. For example, we report a state-of-the-art score of more than 11,000 points on Montezuma's Revenge without using expert demonstrations, explicit high-level information (e.g., RAM states), or resets to arbitrary states.
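As a rough illustration of the count-based component, the sketch below keeps visit counts over discretized locations of a contingent region and returns a bonus that shrinks as a location is revisited. It is a minimal sketch, not the authors' implementation: the class name `LocationCountBonus`, the grid `cell_size`, the scale `beta`, and the 1/sqrt(N) form of the bonus are all assumptions made for illustration, and the (x, y) location is assumed to be supplied by an upstream model such as the ADM.

```python
import math
from collections import defaultdict


class LocationCountBonus:
    """Hypothetical count-based exploration bonus over discretized locations."""

    def __init__(self, cell_size=8, beta=1.0):
        self.cell_size = cell_size  # assumed grid resolution, in pixels
        self.beta = beta            # assumed bonus scale
        self.counts = defaultdict(int)

    def key(self, x, y):
        # Map a pixel location to a coarse grid cell so nearby positions share a count.
        return (int(x) // self.cell_size, int(y) // self.cell_size)

    def bonus(self, x, y):
        # Increment the visit count for this cell, then return beta / sqrt(count).
        k = self.key(x, y)
        self.counts[k] += 1
        return self.beta / math.sqrt(self.counts[k])


explorer = LocationCountBonus(cell_size=8, beta=1.0)
first = explorer.bonus(10, 20)    # first visit to cell (1, 2)
second = explorer.bonus(12, 22)   # same cell, so the bonus is smaller
novel = explorer.bonus(100, 20)   # a different cell, visited for the first time
```

In an actor-critic setup, a bonus of this kind would typically be added to the environment reward before returns are computed. How the two terms are weighted is left open here.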

Demo Video

As supplementary material, we show examples of the agent playing eight hard-exploration Atari games, along with the discovered contingent regions.

Montezuma's Revenge

Seaquest

Frostbite

PrivateEye

Qbert

Venture

Freeway

Hero
