alessiodm/drl-zh
An educational Jupyter Notebook course teaching deep reinforcement learning by building algorithms from scratch.

Velocity (7d): +2.6 ★/day · Trend: steady
The course starts with Markov decision processes (MDPs) and tabular RL, then progresses to DQN, REINFORCE, actor-critic methods, DDPG, TD3, SAC, and PPO. Advanced notebooks cover RLHF for language models with PPO, DPO, and GRPO, as well as Decision Transformers and world models such as Dreamer. Students write code in guided TODO sections, and complete solutions are provided.
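This sketch illustrates the kind of tabular method the course begins with. It runs value iteration on a made-up two-state MDP and is not taken from the repository. The MDP (`P`, `R`, `gamma`) and the `value_iteration` helper are hypothetical, and the code assumes NumPy. The course's own notebooks may structure this exercise differently.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP for illustration only (not from the course).
# P[s, a, s'] is the transition probability; R[s, a] is the expected reward.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays in 0, action 1 moves to 1
    [[1.0, 0.0], [0.0, 1.0]],  # state 1: action 0 moves to 0, action 1 stays in 1
])
R = np.array([
    [0.0, 1.0],  # rewards for actions 0 and 1 in state 0
    [0.0, 2.0],  # rewards for actions 0 and 1 in state 1
])
gamma = 0.9  # discount factor


def value_iteration(P, R, gamma, tol=1e-8, max_iters=10_000):
    """Repeatedly apply the Bellman optimality backup until V stops changing."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iters):
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[s, a, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final action values.
    return V, Q.argmax(axis=1)


V_star, pi_star = value_iteration(P, R, gamma)
print(V_star, pi_star)  # V is approximately [19, 20]; the policy is [1, 1]
```

The optimal policy moves to state 1 and stays there. Collecting a reward of 2 forever is worth 2 / (1 - 0.9) = 20 in state 1, and 1 + 0.9 · 20 = 19 in state 0.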