danijar/dreamerv2

An agent that dreams in discrete categories, then beats Atari

DreamerV2 learns a world model inside its head—complete with imagined trajectories—and uses those daydreams to train a policy that outperforms top model-free agents on raw pixels.

1k stars · Python · Agents · ML Frameworks

What it does

DreamerV2 is a reinforcement-learning agent that learns an internal world model directly from high-dimensional images, then trains its policy and value networks entirely on imagined rollouts in that latent space. The implementation here is the official TensorFlow 2 release, packaged on PyPI and runnable on a single GPU.
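
To make "trained entirely on imagined rollouts" concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than the repository's code: `toy_dynamics` and `toy_policy` are hypothetical stand-ins for the learned latent transition model and the actor, and the latent state is a single float. The only point is that the rollout loop never touches a real environment.

```python
from typing import Callable, List, Tuple

State = float   # hypothetical 1-D latent state, for illustration only
Action = int


def toy_dynamics(state: State, action: Action) -> Tuple[State, float]:
    """Stand-in for a learned latent transition and reward predictor."""
    next_state = 0.9 * state + (1.0 if action == 1 else -1.0)
    reward = -abs(next_state)  # predicted reward: stay close to zero
    return next_state, reward


def toy_policy(state: State) -> Action:
    """Stand-in for the actor: push the latent state toward zero."""
    return 0 if state > 0 else 1


def imagine(
    start: State,
    policy: Callable[[State], Action],
    dynamics: Callable[[State, Action], Tuple[State, float]],
    horizon: int,
) -> Tuple[List[State], List[float]]:
    """Roll the model forward for `horizon` steps without a real environment."""
    states, rewards = [start], []
    for _ in range(horizon):
        action = policy(states[-1])
        next_state, reward = dynamics(states[-1], action)
        states.append(next_state)
        rewards.append(reward)
    return states, rewards


def discounted_return(rewards: List[float], discount: float) -> float:
    """Sum of rewards, each weighted by discount**t, computed back to front."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


if __name__ == "__main__":
    states, rewards = imagine(2.0, toy_policy, toy_dynamics, horizon=5)
    print("imagined states:", states)
    print("imagined return:", discounted_return(rewards, discount=0.99))
```

The real agent also trains a value network on these rollouts, as described above; this sketch scores the trajectory with a plain discounted sum of predicted rewards only to keep it short.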

The interesting bit

The world model compresses each observation into a compact state made of a deterministic path plus sampled categorical variables, so the agent effectively dreams in discrete symbols rather than continuous mush. It is trained end-to-end with straight-through gradients: the forward pass uses the discrete sample, while the backward pass treats that sample as if it were the class probabilities, which lets gradients flow through the sampling operation.
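
The straight-through idea can be sketched without any ML framework. The snippet below is a hand-written illustration, not the repository's implementation: the function names are made up for this example, and the backward pass is spelled out manually. In an autodiff library the same effect is usually obtained by writing the sample as `sample + probs - stop_gradient(probs)`.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_one_hot(probs: List[float], rng: random.Random) -> List[float]:
    """Forward pass: draw one class and return it as a one-hot vector."""
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [1.0 if i == index else 0.0 for i in range(len(probs))]


def straight_through_backward(
    probs: List[float], grad_sample: List[float]
) -> List[float]:
    """Backward pass: pretend the sample was `probs`.

    The upstream gradient w.r.t. the sample is passed to the probabilities
    unchanged, then pushed through the softmax Jacobian to reach the logits:
    d(loss)/d(logit_j) = p_j * (g_j - sum_i p_i * g_i).
    """
    dot = sum(p * g for p, g in zip(probs, grad_sample))
    return [p * (g - dot) for p, g in zip(probs, grad_sample)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [1.0, 0.5, -0.5]
    probs = softmax(logits)
    sample = sample_one_hot(probs, rng)  # discrete in the forward pass
    grad_logits = straight_through_backward(probs, [1.0, 0.0, 0.0])
    print("one-hot sample:", sample)
    print("gradient reaching the logits:", grad_logits)
```

The forward value stays exactly one-hot, so downstream layers only ever see discrete codes, while the logits still receive a (biased but usable) gradient signal.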

Key highlights

  • First world-model agent to reach human-level performance on the full Atari benchmark (55 games), with training curves included
  • Outperforms Rainbow and IQN using the same experience and compute, according to the paper
  • pip install dreamerv2 gives you a clean API; the code auto-detects discrete vs. continuous action spaces
  • Docker support with GPU passthrough for dependency-free runs
  • Built-in debug config disables tf.function and shrinks batch size for line-by-line debugging

Caveats

  • Pinned to TensorFlow 2.6.0, an aging release; under mixed precision the logs can report “infinite gradient norms”, which the README documents as expected but which still look alarming
  • The README also notes that mixed precision can cause numerical instabilities; the fallback is to run with --precision 32
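
For intuition on why infinite gradient norms can be harmless, a likely source is dynamic loss scaling, the usual companion to float16 training. The toy below is a simplified sketch under stated assumptions, not TensorFlow's implementation: it treats any magnitude above the float16 maximum of 65504 as infinite, skips the update when that happens, and halves the scale. The function names are hypothetical, and real implementations also grow the scale again after a run of finite steps.

```python
import math
from typing import List, Tuple

FLOAT16_MAX = 65504.0  # largest finite float16 value


def to_float16_range(value: float) -> float:
    """Simplified overflow model: magnitudes beyond float16's range become inf."""
    if abs(value) > FLOAT16_MAX:
        return math.copysign(math.inf, value)
    return value


def scaled_grad_norm(grads: List[float], loss_scale: float) -> float:
    """L2 norm of the gradients after multiplying by the loss scale."""
    scaled = [to_float16_range(g * loss_scale) for g in grads]
    return math.sqrt(sum(s * s for s in scaled))


def loss_scale_step(grads: List[float], loss_scale: float) -> Tuple[bool, float]:
    """Return (update_applied, next_loss_scale) for one training step."""
    if math.isinf(scaled_grad_norm(grads, loss_scale)):
        return False, loss_scale / 2.0  # skip this update, back off the scale
    return True, loss_scale


if __name__ == "__main__":
    grads = [3.0, 4.0]  # true gradient norm is 5.0
    scale = 2.0 ** 15
    for step in range(4):
        applied, scale = loss_scale_step(grads, scale)
        print(f"step {step}: applied={applied}, next scale={scale}")
```

In this toy the first two steps overflow and are skipped, after which the scale settles at a value where the norm is finite. If the warnings never settle in a real run, that is the point at which switching to --precision 32 is worth trying.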

Verdict

Worth a look if you’re doing model-based RL research or need a strong baseline on vision-based control tasks. Skip it if you want something lightweight or are already committed to the JAX or PyTorch ecosystem.
