Teaching cars to drive in their dreams
A clean PyTorch reimplementation of the famous 2018 paper where an agent learns inside its own compressed world model before touching reality.

What it does
This repo rebuilds Ha & Schmidhuber’s “World Models” paper in PyTorch: an agent that learns to drive by first training a neural network to hallucinate plausible futures, then optimizing a simple controller inside that dream. The actual environment (OpenAI’s CarRacing) only shows up for final validation.
The interesting bit
The three-part split is the trick: a VAE compresses pixels into a latent space, an MDN-RNN predicts how that latent space evolves, and a tiny linear controller learns to steer using only latents and RNN hidden states. The controller is trained with CMA-ES, an evolutionary optimizer rather than backprop. It needs only a score for each rollout, so no gradients ever have to flow through the messy, non-differentiable environment.
Key highlights
- Three cleanly separated training scripts: trainvae.py, trainmdrnn.py, traincontroller.py
- Uses a “brownian” random policy for data generation, which the authors claim beats naive white-noise sampling for rollout quality
- Controller training automatically distributes across all visible GPUs via CUDA_VISIBLE_DEVICES
- Resume-friendly: all scripts reload existing models from logdir unless you pass --noreload
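The difference between a “brownian” policy and white noise is easy to see in a few lines. The sketch below is an illustration, not the repo's code: the function names and the `dt` value are assumptions. The action bounds are CarRacing's steering, gas, and brake ranges. White noise redraws every action independently, while the brownian variant takes a small clipped Gaussian step from the previous action, so consecutive actions stay correlated.

```python
import math
import random

# CarRacing action bounds: steering in [-1, 1], gas and brake in [0, 1].
LOW = [-1.0, 0.0, 0.0]
HIGH = [1.0, 1.0, 1.0]


def white_noise_actions(seq_len, rng):
    """Each action is drawn independently and uniformly within the bounds."""
    return [[rng.uniform(lo, hi) for lo, hi in zip(LOW, HIGH)]
            for _ in range(seq_len)]


def brownian_actions(seq_len, rng, dt=1.0 / 50):
    """Random walk: each action is the previous one plus sqrt(dt)-scaled noise, clipped."""
    action = [rng.uniform(lo, hi) for lo, hi in zip(LOW, HIGH)]
    actions = [action]
    for _ in range(seq_len - 1):
        action = [min(hi, max(lo, a + math.sqrt(dt) * rng.gauss(0, 1)))
                  for a, lo, hi in zip(action, LOW, HIGH)]
        actions.append(action)
    return actions


def mean_steering_jump(actions):
    """Average absolute change in steering between consecutive steps."""
    jumps = [abs(b[0] - a[0]) for a, b in zip(actions, actions[1:])]
    return sum(jumps) / len(jumps)
```

Comparing `mean_steering_jump` on the two sequences shows the brownian one changing much more smoothly. That is one plausible reason it would produce more varied rollouts than steering that jitters around its average.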
Caveats
- Headless servers need an xvfb-run wrapper for controller training; omitting it fails silently (logs end up in logdir/tmp)
- GPU memory usage during controller training is described as “heavy” with no specific numbers given
- Requires pre-generating a dataset of random rollouts before any training can begin
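Since the headless failure is silent, one cheap mitigation is to check for an X display before launching anything. The guard below is a hypothetical helper, not part of the repo, and it assumes that a missing DISPLAY environment variable is a good enough signal that no display (real, or virtual via xvfb-run) is available.

```python
import os


def has_display(environ=None):
    """Return True if an X display appears to be configured."""
    env = os.environ if environ is None else environ
    return bool(env.get("DISPLAY"))


def require_display(environ=None):
    """Fail loudly instead of letting a headless run die quietly."""
    if not has_display(environ):
        raise RuntimeError(
            "No X display found; on a headless server, "
            "launch the script under xvfb-run."
        )
```

Calling `require_display()` at the top of a launch script turns the silent failure into an immediate, readable error.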
Verdict
Worth a look if you’re studying model-based RL or need a readable, hackable baseline for World Models. Skip it if you want a batteries-included, one-command training pipeline—this is deliberately modular and hands-on.