ctallec/world-models

Teaching cars to drive in their dreams

A clean PyTorch reimplementation of the famous 2018 paper where an agent learns inside its own compressed world model before touching reality.

699 stars · Python · Agents · ML Frameworks
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

This repo rebuilds Ha & Schmidhuber’s “World Models” paper in PyTorch: an agent that learns to drive by first training a neural network to hallucinate plausible futures, then optimizing a simple controller inside that dream. The actual environment (OpenAI’s CarRacing) only shows up for final validation.

The interesting bit

The three-part split is the trick: a VAE compresses pixels into a latent vector, an MDN-RNN predicts how that latent evolves, and a tiny linear controller learns to steer using only the latent and the RNN hidden state. The controller is trained with CMA-ES, an evolutionary optimizer rather than backprop: it only needs a score for each candidate, so no gradients have to flow through the rollouts.
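
To make the controller step concrete, here is a minimal sketch in plain Python. It is not the repo's code. The sizes (32 latents, 256 hidden units, 3 actions) are assumptions taken from the original paper's CarRacing setup, and `act`, `evolve`, and `toy_fitness` are hypothetical names. The search loop is a simple (1+λ) evolution strategy standing in for CMA-ES (it has no covariance adaptation), and the objective is a made-up target action rather than a driving score.

```python
import random

LATENT_SIZE = 32    # assumed size of the VAE latent z
HIDDEN_SIZE = 256   # assumed size of the MDN-RNN hidden state h
ACTION_SIZE = 3     # CarRacing: steering, gas, brake
INPUT_SIZE = LATENT_SIZE + HIDDEN_SIZE
N_PARAMS = ACTION_SIZE * INPUT_SIZE + ACTION_SIZE  # weights plus biases


def act(params, z, h):
    """Linear controller: action = W @ [z, h] + b, with W and b packed in a flat list."""
    x = list(z) + list(h)
    bias_start = ACTION_SIZE * INPUT_SIZE
    action = []
    for i in range(ACTION_SIZE):
        row = params[i * INPUT_SIZE:(i + 1) * INPUT_SIZE]
        action.append(sum(w * v for w, v in zip(row, x)) + params[bias_start + i])
    return action


def evolve(fitness, params, sigma=0.1, pop_size=16, generations=20, rng=None):
    """Gradient-free (1+lambda) search: keep the parent unless the best child scores higher."""
    rng = rng or random.Random(0)
    parent, parent_fit = list(params), fitness(params)
    for _ in range(generations):
        # Mutate every parameter of the parent with Gaussian noise.
        children = [
            [p + sigma * rng.gauss(0.0, 1.0) for p in parent]
            for _ in range(pop_size)
        ]
        scored = [(fitness(child), child) for child in children]
        top_fit, top = max(scored, key=lambda pair: pair[0])
        if top_fit > parent_fit:
            parent, parent_fit = top, top_fit
    return parent, parent_fit


# Toy usage: fixed fake inputs and a made-up target action.
rng = random.Random(0)
z = [rng.uniform(-1, 1) for _ in range(LATENT_SIZE)]
h = [rng.uniform(-1, 1) for _ in range(HIDDEN_SIZE)]
target = [0.5, 1.0, 0.0]


def toy_fitness(params):
    """Higher is better: negative squared distance between the action and the target."""
    a = act(params, z, h)
    return -sum((ai - ti) ** 2 for ai, ti in zip(a, target))


start = [0.0] * N_PARAMS
best, best_fit = evolve(toy_fitness, start, rng=rng)
```

The point of the sketch is the interface: the search only ever calls `fitness(params)`, so the scoring function can be anything that returns a number, differentiable or not. With a controller this small (under a thousand parameters at the assumed sizes), population-based search stays cheap.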

Key highlights

  • Three cleanly separated training scripts: trainvae.py, trainmdrnn.py, traincontroller.py
  • Uses a “brownian” random policy for data generation, which the authors claim beats naive white-noise sampling for rollout quality
  • Controller training spreads its workers across every GPU listed in CUDA_VISIBLE_DEVICES, with no extra configuration
  • Resume-friendly: all scripts reload existing models from logdir unless you pass --noreload
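
The difference between a "brownian" policy and white-noise sampling can be sketched as follows. This is an illustration, not the repo's generator: the function names, the action bounds (steering in [-1, 1], gas and brake in [0, 1]), and the step size `dt = 1/50` are all assumptions. White noise draws every action independently, while the Brownian variant nudges the previous action by a small Gaussian step and clips it to the bounds.

```python
import math
import random

# Assumed CarRacing action bounds: steering in [-1, 1], gas and brake in [0, 1].
LOW = [-1.0, 0.0, 0.0]
HIGH = [1.0, 1.0, 1.0]


def white_noise_actions(seq_len, rng):
    """Draw an independent uniform action at every step."""
    return [[rng.uniform(lo, hi) for lo, hi in zip(LOW, HIGH)] for _ in range(seq_len)]


def brownian_actions(seq_len, dt, rng):
    """Random walk: previous action plus sqrt(dt)-scaled Gaussian noise, clipped to bounds."""
    actions = [[rng.uniform(lo, hi) for lo, hi in zip(LOW, HIGH)]]
    for _ in range(seq_len - 1):
        step = [
            min(hi, max(lo, a + math.sqrt(dt) * rng.gauss(0.0, 1.0)))
            for a, lo, hi in zip(actions[-1], LOW, HIGH)
        ]
        actions.append(step)
    return actions


def mean_abs_step(actions, dim=0):
    """Average absolute change between consecutive actions along one dimension."""
    diffs = [abs(b[dim] - a[dim]) for a, b in zip(actions, actions[1:])]
    return sum(diffs) / len(diffs)


rng = random.Random(0)
white = white_noise_actions(1000, rng)
brown = brownian_actions(1000, dt=1 / 50, rng=rng)
```

Comparing `mean_abs_step(white)` with `mean_abs_step(brown)` shows the effect: the Brownian steering signal changes far less from one step to the next. A plausible reading of why that helps is that the car commits to a direction for a while instead of jittering in place, so the rollouts cover more of the track.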

Caveats

  • Headless servers need an xvfb-run wrapper for controller training; without it the script fails silently (logs still end up in logdir/tmp)
  • GPU memory usage during controller training is described as “heavy” with no specific numbers given
  • Requires pre-generating a dataset of random rollouts before any training can begin

Verdict

Worth a look if you’re studying model-based RL or need a readable, hackable baseline for World Models. Skip it if you want a batteries-included, one-command training pipeline—this is deliberately modular and hands-on.
