Reinforcement learning that daydreams its way to better policies
An agent that imagines futures in compressed feature space, then backpropagates through its own dreams to learn long-horizon control.

What it does
Dreamer is a reinforcement learning agent that learns a world model to predict future states in a compact latent space. Instead of planning in raw pixels or other high-dimensional observations, it imagines trajectories in this compressed feature space, then learns a policy and a value function from those imagined sequences. The implementation here is a clean TensorFlow 2 rewrite of the original research code.
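Hafner et al. (2019) train the value function on λ-returns: an exponentially weighted average of n-step returns along each imagined trajectory, where each n-step return sums n predicted rewards and then bootstraps from the learned value estimate. The pure-Python sketch below computes that target in two ways, as the explicit weighted sum and as the equivalent backward recursion. It is not code from this repository. The function names and example numbers are hypothetical, and the plain lists of rewards and values stand in for the world model's reward and value predictions.

```python
# Hypothetical sketch of lambda-return value targets over one imagined rollout.
# rewards[i]: predicted reward at imagined state i   (i = 0 .. H-1)
# values[i]:  predicted value of imagined state i    (i = 0 .. H)


def n_step_return(rewards, values, tau, n, gamma):
    """Sum up to n discounted imagined rewards from tau, then bootstrap from the value."""
    horizon = len(rewards)
    h = min(tau + n, horizon)  # never look past the end of the imagined rollout
    ret = sum(gamma ** (i - tau) * rewards[i] for i in range(tau, h))
    return ret + gamma ** (h - tau) * values[h]


def lambda_return_explicit(rewards, values, tau, gamma, lam):
    """Exponentially weighted average of the n-step returns starting at state tau."""
    m = len(rewards) - tau  # imagined steps remaining from tau
    total = sum(
        (1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, tau, n, gamma)
        for n in range(1, m)
    )
    # The longest return takes the remaining weight so the weights sum to 1.
    return total + lam ** (m - 1) * n_step_return(rewards, values, tau, m, gamma)


def lambda_return_recursive(rewards, values, gamma, lam):
    """Same targets for every tau, computed in one backward pass."""
    horizon = len(rewards)
    targets = [0.0] * horizon
    next_target = values[horizon]  # bootstrap at the end of the rollout
    for tau in reversed(range(horizon)):
        next_target = rewards[tau] + gamma * (
            (1 - lam) * values[tau + 1] + lam * next_target
        )
        targets[tau] = next_target
    return targets


if __name__ == "__main__":
    rewards = [1.0, 0.5, 0.0, 2.0]
    values = [0.3, 0.2, 0.1, 0.4, 1.0]
    print(lambda_return_recursive(rewards, values, gamma=0.99, lam=0.95))
    print([lambda_return_explicit(rewards, values, t, 0.99, 0.95) for t in range(4)])
```

Setting `lam` to 0 reduces the target to a one-step TD target, r + γ·v(next state). Setting it to 1 gives the full discounted imagined return with a single bootstrap at the horizon. The recursive form yields every target in a single backward pass, which makes it the more practical of the two.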
The interesting bit
The clever part is backpropagating value gradients through multi-step imagined predictions: in effect, the agent assigns credit across futures that never actually happened. This lets it learn long-horizon behaviors without the sample inefficiency of model-free methods, which can only learn from transitions they have actually experienced.
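To make "backpropagating through imagined predictions" concrete, here is a deliberately tiny, hypothetical stand-in: a scalar linear latent model z' = a·z + b·u, a linear policy u = k·z, and reward -z'². Everything in it (the model, the reward, and the names `imagine` and `return_and_grad`) is an assumption for illustration. Dreamer uses learned neural networks, a learned value function, and automatic differentiation instead of a hand-written backward pass. With no value function here, the objective is simply the discounted sum of imagined rewards. The sketch rolls the model forward for a fixed horizon, applies the chain rule backwards through every imagined step to get the gradient of the imagined return with respect to the policy parameter k, and then uses that gradient for gradient ascent.

```python
# Hypothetical toy "world model": scalar linear latent dynamics, linear policy,
# quadratic reward. None of these names come from the Dreamer repository.


def imagine(z0, k, a=0.9, b=0.5, gamma=0.99, horizon=15):
    """Roll the latent state forward under policy u = k * z; return states and return."""
    zs = [z0]
    for _ in range(horizon):
        u = k * zs[-1]                 # action from the linear policy
        zs.append(a * zs[-1] + b * u)  # imagined next latent state
    # Reward -z^2 is paid on each imagined next state, discounted by gamma.
    ret = sum(-(gamma ** t) * zs[t + 1] ** 2 for t in range(horizon))
    return zs, ret


def return_and_grad(z0, k, a=0.9, b=0.5, gamma=0.99, horizon=15):
    """Reverse-mode pass: dReturn/dk by backpropagating through the imagined rollout."""
    zs, ret = imagine(z0, k, a, b, gamma, horizon)
    c = a + b * k  # dz_{t+1}/dz_t under this policy
    grad_k = 0.0
    adj = 0.0      # dReturn/dz_{t+2}, carried backwards through time
    for t in reversed(range(horizon)):
        # Total sensitivity of the return to z_{t+1}: its own reward term
        # plus everything it influences later through the dynamics.
        adj = -2.0 * gamma ** t * zs[t + 1] + c * adj
        # z_{t+1} also depends on k directly, through u_t = k * z_t.
        grad_k += adj * b * zs[t]
    return ret, grad_k


if __name__ == "__main__":
    k = 0.0
    for _ in range(50):
        _, g = return_and_grad(1.0, k)
        k += 0.01 * g  # gradient ascent on the imagined return
    _, ret = imagine(1.0, k)
    print(f"policy gain k = {k:.3f}, imagined return = {ret:.4f}")
```

The hand-derived gradient can be checked against a finite difference of `imagine`'s return. Because the toy dynamics are known and differentiable, the policy improves without a single real environment step, which is the property the paragraph above describes.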
Key highlights
- TensorFlow 2 implementation, positioned as “fast and simple” by the author
- Targets DeepMind Control Suite tasks (e.g., dmc_walker_walk)
- Generates training visualizations and GIFs via TensorBoard
- Includes plotting utilities for analyzing learning curves
- Original paper by Hafner et al. (2019); this is a reimplementation, not the official Google Research codebase
Caveats
- Author notes DreamerV2 as the successor, with broader environment support (Atari + DMControl)
- Pinned to the older TensorFlow 2.2.0 and other specific dependency versions; may need updates to run on a modern stack
- README is minimal: no benchmark numbers, training-time estimates, or hardware requirements
Verdict
Worth a look if you’re studying world models or need a readable TensorFlow 2 reference implementation of Dreamer. Skip it if you want production-ready code or Atari support; for Atari, the author explicitly points to DreamerV2.