A 1,166-star gravestone to early TensorFlow RL
This repo's own author killed it in favor of OpenAI Baselines. Here's what remains.

What it does
An early (circa 2015-2016) Deep Q-Learning implementation in TensorFlow, built around a simple modular framework: controllers pick actions, simulators run environments, and a simulate() function glues them together. Includes a “Karpathy game” demo where a neural net learns to chase green dots, avoid red ones, and flee orange penalties.
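The controller/simulator split is easy to picture as a tiny loop. The sketch below is not the repo's code. The method names `observe`, `perform_action`, `collect_reward`, `action`, and `store` are assumptions chosen for illustration, and `ChaseDot` is a hypothetical one-dimensional stand-in for the Karpathy game. It leaves out any rendering or timing logic.

```python
import random
from typing import Protocol


class Simulator(Protocol):
    def observe(self) -> int: ...
    def perform_action(self, action: int) -> None: ...
    def collect_reward(self) -> float: ...


class Controller(Protocol):
    def action(self, observation: int) -> int: ...
    def store(self, obs: int, action: int, reward: float, new_obs: int) -> None: ...


class ChaseDot:
    """Hypothetical toy simulator: an agent on a line of cells chases a target cell."""

    def __init__(self, size: int = 10, seed: int = 0) -> None:
        self.size = size
        self.rng = random.Random(seed)
        self.agent = 0
        self.target = self.rng.randrange(size)
        self._reward = 0.0

    def observe(self) -> int:
        # Observation is the signed distance from the agent to the target.
        return self.target - self.agent

    def perform_action(self, action: int) -> None:
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        step = -1 if action == 0 else 1
        self.agent = min(max(self.agent + step, 0), self.size - 1)
        if self.agent == self.target:
            self._reward = 1.0  # caught the dot; respawn it elsewhere
            self.target = self.rng.randrange(self.size)
        else:
            self._reward = -0.01  # small per-step cost

    def collect_reward(self) -> float:
        # Hand out the pending reward once, then reset it.
        r, self._reward = self._reward, 0.0
        return r


class GreedyController:
    """Hand-written controller: always step toward the target."""

    def action(self, observation: int) -> int:
        return 1 if observation > 0 else 0

    def store(self, obs: int, action: int, reward: float, new_obs: int) -> None:
        pass  # a learning controller would log the transition here


def simulate(sim: Simulator, controller: Controller, steps: int) -> float:
    """Glue loop: observe, act, collect reward, report the transition."""
    total = 0.0
    for _ in range(steps):
        obs = sim.observe()
        act = controller.action(obs)
        sim.perform_action(act)
        reward = sim.collect_reward()
        controller.store(obs, act, reward, sim.observe())
        total += reward
    return total
```

Because `simulate` only talks to the two small interfaces, swapping `GreedyController` for a learning controller (or a human one) needs no change to the loop.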
The interesting bit
The author, later at OpenAI, left this up as a historical artifact with a blunt “now obsolete” banner and a link to Baselines. The Redis-backed human controller is a charmingly over-engineered touch: you can practically SSH into your own RL agent.
Key highlights
- Modular tf_rl design: swap controllers (DeepQ, human, your own) or simulators via clean interfaces
- store() + training_step() pattern: explicit transition logging with per-step training (the docs warn that training_step() “should not take too long”)
- Built-in GIF generation pipeline via Inkscape frames
- Human controller requires local Redis server for real-time input
- 1,166 stars despite the author actively telling people to leave
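The store() + training_step() rhythm amounts to experience replay: log every transition, then run one small, bounded update per step, which is why each training step is cheap. Below is a minimal sketch under stated assumptions. The class name `TabularQController`, its constructor parameters, and `run_two_state_demo` are hypothetical, and a lookup table replaces the neural network so the example runs without TensorFlow. The update uses the standard one-step Q-learning target r + gamma * max_a' Q(s', a') on a continuing task with no terminal states.

```python
import random
from collections import defaultdict, deque


class TabularQController:
    """Hypothetical stand-in for a DeepQ controller: same store()/training_step()
    rhythm, but a lookup table replaces the neural network."""

    def __init__(self, n_actions, discount=0.9, lr=0.1, epsilon=0.1,
                 buffer_size=1000, batch_size=32, seed=0):
        self.n_actions = n_actions
        self.discount = discount
        self.lr = lr
        self.epsilon = epsilon
        self.batch_size = batch_size
        self.replay = deque(maxlen=buffer_size)  # experience replay buffer
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.rng = random.Random(seed)

    def action(self, observation):
        # Epsilon-greedy: mostly exploit the table, occasionally explore.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[observation]
        return max(range(self.n_actions), key=values.__getitem__)

    def store(self, observation, action, reward, new_observation):
        # Only log the transition; learning happens in training_step().
        self.replay.append((observation, action, reward, new_observation))

    def training_step(self):
        # One fixed-size minibatch per call keeps each step's cost bounded.
        if len(self.replay) < self.batch_size:
            return
        batch = self.rng.sample(list(self.replay), self.batch_size)
        for obs, act, reward, new_obs in batch:
            target = reward + self.discount * max(self.q[new_obs])
            self.q[obs][act] += self.lr * (target - self.q[obs][act])


def run_two_state_demo(steps=3000):
    """Hypothetical toy task: two states alternate every step;
    action 1 pays reward 1, action 0 pays nothing."""
    controller = TabularQController(n_actions=2)
    state = 0
    for _ in range(steps):
        act = controller.action(state)
        reward = float(act)
        new_state = 1 - state
        controller.store(state, act, reward, new_state)
        controller.training_step()  # train a little after every stored step
        state = new_state
    return controller
```

After training, the table should prefer action 1 in both states. Sampling from the buffer rather than learning only from the latest transition is what decouples data collection from the update step.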
Caveats
- Explicitly abandoned; author redirects to OpenAI Baselines
- Dependencies are pinned to ancient versions (future==0.15.2, euclid==0.1)
- No topics, no recent commits, no visible community activity
Verdict
Worth a quick scroll if you’re writing a history of Deep Q-Learning implementations or want to see how early TensorFlow RL code was structured. Skip it entirely if you actually need to train an agent today: Baselines, Stable-Baselines3, or CleanRL will save you hours of dependency archaeology.