germain-hug/Deep-RL-Keras

Reinforcement learning's greatest hits, glued together with Keras

A tidy reference implementation of five major deep RL algorithms, pinned to a very specific Keras version from 2018.

550 stars · Python · Agents · ML Frameworks

What it does

This repo packages A2C, A3C, DDPG, DDQN, and Dueling DDQN into a single main.py CLI. Each algorithm lives in its own folder with separate actor and critic networks, and you toggle between them with --type. It targets classic control tasks (CartPole, LunarLander) and Atari Breakout, with TensorBoard logging and a load_and_run.py script for replaying trained models.

The interesting bit

The value is in the side-by-side structure, not novelty. You can flip from asynchronous multi-threaded policy gradients to dueling double Q-networks with prioritized replay by changing --type and adding a couple of flags. The README also documents the actual math tricks (N-step returns, parameter-space noise for exploration, SumTree sampling for PER), which makes it useful for cross-referencing against the original papers.
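
To make the N-step return idea concrete, here is a minimal sketch in plain Python. It is not taken from the repo: the function name `n_step_returns`, its parameters, and the default discount are assumptions chosen for illustration. It computes, for each step of a short rollout, the discounted sum of the remaining rewards plus a bootstrapped value estimate for the state after the last step.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout segment (illustrative sketch).

    rewards         -- rewards r_0 .. r_{N-1} collected during the rollout
    bootstrap_value -- critic's estimate V(s_N) for the state after the
                       last step (use 0.0 if the episode terminated)
    gamma           -- discount factor (0.99 is an assumed default)
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one after it:
    # R_t = r_t + gamma * R_{t+1}, with R_N = V(s_N).
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns
```

In an actor-critic setup these returns typically serve as the critic's regression targets, and the difference between each return and the critic's current estimate gives the advantage used to weight the policy gradient.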

Key highlights

  • Five algorithms, one entry point: python3 main.py --type {A2C,A3C,DDQN,DDPG} --env {your_env} (Dueling DDQN is DDQN with --dueling)
  • DDPG uses parameter-space noise instead of the usual action-space Gaussian exploration
  • DDQN supports both Prioritized Experience Replay (--with_PER) and dueling architectures (--dueling) as modular add-ons
  • Includes Atari preprocessing wrappers and frame stacking via --consecutive_frames
  • Trained models auto-save; TensorBoard logs per-environment folders for live monitoring
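
The SumTree behind Prioritized Experience Replay is easier to follow in code. The sketch below is a generic version of the data structure, not the repo's implementation: the class name `SumTree`, its methods, and the helper `sample_batch` are all hypothetical. Leaves store per-transition priorities, internal nodes store the sum of their children, and sampling walks down from the root so that each transition is drawn with probability proportional to its priority.

```python
import random


class SumTree:
    """Flat-array binary tree: leaves hold priorities, parents hold sums.

    Illustrative sketch only; names and layout are assumptions.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes, then leaves
        self.data = [None] * capacity           # stored transitions
        self.write = 0                          # next slot to overwrite

    def total(self):
        """Sum of all priorities (stored at the root)."""
        return self.tree[0]

    def add(self, priority, item):
        """Store an item, overwriting the oldest one once full."""
        leaf = self.write + self.capacity - 1
        self.data[self.write] = item
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        """Set a leaf's priority and propagate the change to the root."""
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def get(self, s):
        """Find the leaf whose cumulative priority range contains s."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):  # stop once idx is a leaf
            left = 2 * idx + 1
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]


def sample_batch(tree, batch_size):
    """Stratified sampling: one draw from each equal slice of the total."""
    segment = tree.total() / batch_size
    return [
        tree.get(random.uniform(segment * i, segment * (i + 1)))
        for i in range(batch_size)
    ]
```

Both `update` and `get` touch one node per tree level, so re-prioritizing a transition after a training step and drawing a sample each cost O(log n) rather than a scan over the whole buffer.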

Caveats

  • Hard-locked to Keras 2.1.6 (released May 2018); modern TF/Keras will likely break things
  • A3C is CPU-threaded, not GPU-distributed—fine for CartPole, painful for serious Atari training
  • Plotting requires Plotly with a free API key, which feels like an unnecessary friction point in 2024

Verdict

Good if you want to trace how classic deep RL papers map to Keras code, or need a teaching scaffold. Skip it if you need production-grade RL—look at Stable-Baselines3 or CleanRL instead.
