
alessiodm/drl-zh

An educational Jupyter Notebook course teaching deep reinforcement learning by building algorithms from scratch.

2.3k stars · Jupyter Notebook · LearningAgents
Velocity · 7d: +2.6 ★ / day
Trend: steady

The course starts with MDPs and tabular RL, then progresses to DQN, REINFORCE, actor-critic methods, DDPG, TD3, SAC, and PPO. Advanced notebooks cover RLHF with PPO, DPO, and GRPO for language models, Decision Transformers, and world models such as Dreamer. Students write code in guided TODO sections, and complete solutions are provided.
