Contrastive learning meets pixels: RL that learns what matters
CURL trains reinforcement agents from raw images without drowning in millions of environment steps.

What it does
CURL is a reinforcement learning method that combines contrastive unsupervised learning with Soft Actor-Critic (SAC) to train agents directly from high-dimensional image observations. It learns useful visual representations by asking “which augmented views of the same image belong together?” and then uses those representations for control. The repo implements the DeepMind Control Suite experiments; Atari results live in a separate codebase.
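To make "augmented views of the same image" concrete, here is a minimal sketch that assumes the augmentation is a random crop. The helper name random_crop, the 100-to-84 pixel sizes, and the 9-channel frame stack are illustrative assumptions, and NumPy stands in for the repo's actual tensor library so the snippet stays self-contained.

```python
import numpy as np

def random_crop(frames, out_size, rng):
    """Crop each image in a (B, C, H, W) batch at an independent random offset."""
    b, _, h, w = frames.shape
    tops = rng.integers(0, h - out_size + 1, size=b)
    lefts = rng.integers(0, w - out_size + 1, size=b)
    return np.stack([
        frames[i, :, t:t + out_size, l:l + out_size]
        for i, (t, l) in enumerate(zip(tops, lefts))
    ])

rng = np.random.default_rng(0)
# Fake batch: 4 observations, each 3 stacked RGB frames (9 channels), 100x100 pixels.
obs = rng.integers(0, 256, size=(4, 9, 100, 100), dtype=np.uint8)

anchor = random_crop(obs, 84, rng)    # first view of each observation
positive = random_crop(obs, 84, rng)  # second, differently cropped view
```

Row i of anchor and row i of positive come from the same observation, so they form the pair that "belongs together"; every other row in the batch acts as a negative.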
The interesting bit
The trick is decoupling representation learning from policy learning. While the RL head stumbles around exploring, a contrastive encoder (inspired by CPC) learns a compact latent space where semantically similar observations cluster together. This matters because raw pixels are noisy and high-dimensional; CURL essentially builds its own compression layer tailored to what the task actually needs.
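The contrastive objective can be sketched as an InfoNCE-style cross-entropy over a similarity matrix. This is a rough illustration, not the repo's code: it assumes a bilinear similarity q·W·k between encoded views, uses random vectors in place of encoder outputs, and the names info_nce_loss, z, and W are hypothetical.

```python
import numpy as np

def info_nce_loss(queries, keys, W):
    """Cross-entropy over bilinear similarities; row i's positive is column i."""
    logits = queries @ W @ keys.T                         # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # pick out the positives

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))   # stand-in for encoded anchor views
W = np.eye(16)                 # stand-in for a learned bilinear weight

# Keys that nearly match their queries should give a low loss...
matched = info_nce_loss(z, z + 0.01 * rng.normal(size=z.shape), W)
# ...while keys paired with the wrong query should give a high one.
shuffled = info_nce_loss(z, np.roll(z, 1, axis=0), W)
```

Minimizing this loss pulls the two views of one observation together in latent space and pushes views of other observations in the batch apart.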
Key highlights
- Built on top of SAC+AE by Denis Yarats, with a contrastive encoder bolted on
- Trains on cartpole swingup from pixels to near-optimal (~845 pts) in about an hour on a GPU
- Supports standard DeepMind control tasks; hyperparameters exposed via CLI flags in train.py
- TensorBoard logging and optional model/video saving
- GPU-accelerated rendering via EGL for headless training
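For the EGL point, dm_control picks its rendering backend from the MUJOCO_GL environment variable, which has to be set before the library is imported. A small sketch, where the helper name use_headless_rendering is hypothetical and EGL is assumed to be installed on the machine:

```python
import os

def use_headless_rendering(backend="egl"):
    """Select the MuJoCo rendering backend; call before importing dm_control."""
    os.environ["MUJOCO_GL"] = backend
    return os.environ["MUJOCO_GL"]

selected = use_headless_rendering()
```

Exporting the same variable in the shell before launching train.py should have the same effect.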
Caveats
- Atari experiments are not in this repo; you’ll need the separate curl_rainbow codebase
- The README is sparse on architecture details; you’ll need to read the paper or dig into train.py for encoder specifics
- Only conda environment provided; no pip requirements.txt
Verdict
Worth a look if you’re doing pixel-based RL and burning samples faster than you can collect them. Skip if you need Atari results, discrete action spaces, or a plug-and-play library — this is research code that expects you to get your hands dirty.