Reinforcement learning's greatest hits, glued together with Keras
A tidy reference implementation of five major deep RL algorithms, pinned to a very specific Keras version from 2018.

What it does
This repo packages A2C, A3C, DDPG, DDQN, and Dueling DDQN into a single main.py CLI. Each algorithm lives in its own folder, with separate actor and critic networks for the actor-critic methods, and you select among them with --type (Dueling DDQN is DDQN plus the --dueling flag). It targets classic control tasks (CartPole, LunarLander) and Atari Breakout, with TensorBoard logging and a load_and_run.py script for replaying trained models.
The interesting bit
The value is in the side-by-side structure, not novelty. You can flip from asynchronous multi-threaded policy gradients to dueling double Q-networks with prioritized replay by changing --type and adding a couple of flags. The README also documents the actual math tricks (N-step returns, parameter-space noise for exploration, SumTree sampling for PER), which makes it useful for cross-referencing against the original papers.
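For readers who have not met N-step returns before, here is a minimal sketch of the usual formulation: walk a rollout backwards and fold each reward into a discounted running total, starting from a bootstrap value (typically the critic's estimate for the state after the last step, or 0 if the episode ended). The function name and signature are made up for illustration and are not taken from the repo.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout segment (hypothetical helper).

    rewards: list of rewards r_0 ... r_{n-1} collected in order.
    bootstrap_value: value estimate for the state after the last reward
                     (use 0.0 when the episode terminated).
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each step reuses the return of the step after it:
    # R_t = r_t + gamma * R_{t+1}
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    print(n_step_returns([1.0, 1.0, 1.0], 0.0, gamma=0.5))  # [1.75, 1.5, 1.0]
```

In an actor-critic setup, these returns would typically serve as the critic's regression targets, and the returns minus the critic's predictions as the advantages for the actor.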
Key highlights
- Five algorithms, one entry point: python3 main.py --type {A2C,A3C,DDQN,DDPG} --env {your_env}
- DDPG uses parameter-space noise instead of the usual action-space Gaussian exploration
- DDQN supports both Prioritized Experience Replay (--with_PER) and dueling architectures (--dueling) as modular add-ons
- Includes Atari preprocessing wrappers and frame stacking via --consecutive_frames
- Trained models auto-save; TensorBoard logs per-environment folders for live monitoring
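To give a feel for what SumTree sampling buys you in Prioritized Experience Replay, here is a rough sketch of the data structure. Leaves hold priorities, every internal node holds the sum of its children, and drawing a sample means walking down from the root with a random number between 0 and the total. The class, method names, and the stratified sample_batch helper are illustrative assumptions, not the repo's actual code.

```python
import random


class SumTree:
    """Binary tree in a flat list: leaves are priorities, parents are sums."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes, then leaves
        self.data = [None] * capacity           # stored transitions
        self.write = 0                          # next slot to overwrite

    def add(self, priority, item):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = item
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity  # ring buffer

    def update(self, leaf, priority):
        # Push the change in priority up to the root.
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def total(self):
        return self.tree[0]

    def get(self, s):
        """Find the leaf whose cumulative priority range contains s."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):
            left = 2 * idx + 1
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]


def sample_batch(tree, batch_size):
    """Stratified sampling: one draw from each equal slice of the total."""
    segment = tree.total() / batch_size
    return [
        tree.get(random.uniform(segment * i, segment * (i + 1)))
        for i in range(batch_size)
    ]
```

Both updating a priority and drawing a sample cost O(log n), which is the reason to use a tree rather than rescanning the whole replay buffer on every draw.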
Caveats
- Hard-locked to Keras 2.1.6 (released April 2018); modern TF/Keras will likely break things
- A3C is CPU-threaded, not GPU-distributed—fine for CartPole, painful for serious Atari training
- Plotting requires Plotly with a free API key, which feels like an unnecessary friction point in 2024
Verdict
Good if you want to trace how classic deep RL papers map to Keras code, or need a teaching scaffold. Skip it if you need production-grade RL—look at Stable-Baselines3 or CleanRL instead.