FareedKhan-dev/train-deepseek-r1
A Jupyter notebook and guide that walks step by step through implementing DeepSeek R1's training process with GRPO (Group Relative Policy Optimization) reinforcement learning.

Velocity (7d): +1.6 ★/day · Trend: steady
The repository provides a hands-on implementation of DeepSeek R1’s training methodology, covering reinforcement learning fundamentals, the GRPO algorithm, reward functions for accuracy and format validation, and policy model setup. It includes explanatory markdown documents with hand-drawn diagrams to help non-technical audiences understand LLM training concepts.
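The listing does not show the repository's own code. The following is only a minimal sketch of how accuracy and format rewards can feed a GRPO-style group-relative advantage. It rests on several assumptions: completions use R1-style `<think>...</think>` and `<answer>...</answer>` tags, accuracy is an exact string match against a reference answer, and the two rewards are simply summed. The function names `format_reward`, `accuracy_reward`, and `group_relative_advantages` are hypothetical and are not taken from the repository.

```python
import re
import statistics

# Assumed R1-style template: reasoning inside <think>...</think>, final answer inside <answer>...</answer>.
FORMAT_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion: str) -> float:
    """Hypothetical: 1.0 if the completion follows the assumed think/answer template, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """Hypothetical: 1.0 if the extracted answer exactly matches the reference after trimming whitespace."""
    match = ANSWER_PATTERN.search(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each reward by its own group's mean and (population) std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: a group of completions sampled for one prompt, scored with both rewards.
completions = [
    "<think>2 + 2 is 4</think> <answer>4</answer>",
    "<think>guessing</think> <answer>5</answer>",
    "The answer is 4",
]
rewards = [accuracy_reward(c, "4") + format_reward(c) for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)  # [2.0, 1.0, 0.0]
print(advantages)
```

Normalizing within each group lets GRPO drop the separate value (critic) model. A completion is pushed up or down only relative to its siblings for the same prompt, so in this example the advantages sum to roughly zero.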