An agent that dreams in discrete categories, then beats Atari
DreamerV2 learns a world model inside its head—complete with imagined trajectories—and uses those daydreams to train a policy that outperforms top model-free agents on raw pixels.

What it does
DreamerV2 is a reinforcement-learning agent that learns an internal world model directly from high-dimensional images, then trains its policy and value networks entirely on imagined rollouts in that latent space. The implementation here is the official TensorFlow 2 release, packaged on PyPI and runnable on a single GPU.
The interesting bit
The world model compresses observations into compact states made of a deterministic path plus sampled categorical variables, essentially forcing the agent to dream in discrete symbols rather than continuous mush. It is trained end-to-end with straight-through gradients: the forward pass uses the discrete sample, while the backward pass treats that sample as if it were the underlying class probabilities, so gradients can flow through the otherwise non-differentiable sampling step.
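To make the straight-through idea concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function names, the three-class example, and the upstream gradient values are all made up for illustration. The forward pass draws a one-hot sample from a categorical distribution, and the backward pass pretends the sample was the softmax probabilities and applies the softmax Jacobian.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_one_hot(probs, rng):
    """Forward pass: draw a class index and return it as a one-hot vector."""
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [1.0 if i == index else 0.0 for i in range(len(probs))]


def straight_through_backward(probs, upstream):
    """Backward pass: treat the sample as if it were `probs`.

    upstream[i] is dLoss/dSample[i]. The return value is the straight-through
    estimate of dLoss/dLogits, obtained from the softmax Jacobian:
    grad[j] = probs[j] * (upstream[j] - sum_i upstream[i] * probs[i]).
    """
    weighted = sum(g * p for g, p in zip(upstream, probs))
    return [p * (g - weighted) for g, p in zip(upstream, probs)]


rng = random.Random(0)               # fixed seed so the draw is repeatable
logits = [2.0, 0.5, -1.0]            # hypothetical logits for one categorical latent
probs = softmax(logits)
sample = sample_one_hot(probs, rng)  # what the rest of the model sees
upstream = [1.0, 0.0, -1.0]          # hypothetical gradient from a downstream loss
grad_logits = straight_through_backward(probs, upstream)

print("sample:", sample)
print("grad wrt logits:", [round(g, 4) for g in grad_logits])
```

In an autodiff framework the same effect is usually written in one line, along the lines of sample + probs - stop_gradient(probs), which keeps the one-hot value in the forward pass and routes the gradient through probs.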
Key highlights
- First world-model agent to reach human-level performance on the full Atari benchmark (55 games), with training curves included
- Outperforms Rainbow and IQN using the same experience and compute, according to the paper
- pip install dreamerv2 gives you a clean API; the code auto-detects discrete vs. continuous action spaces
- Docker support with GPU passthrough for dependency-free runs
- Built-in debug config disables tf.function and shrinks batch size for line-by-line debugging
Caveats
- Pinned to TensorFlow 2.6.0, which is already aging; mixed precision can spit out “infinite gradient norms” that are documented but still alarming
- The README notes numerical instabilities are possible with mixed precision, with a fallback to --precision 32
Verdict
Worth a look if you’re doing model-based RL research or need a strong baseline on vision-based control tasks. Skip it if you want something lightweight or are already committed to the JAX or PyTorch ecosystem.