
lsdefine/simple_GRPO

A minimal GRPO implementation for training LLMs with reinforcement learning to achieve r1-style reasoning capabilities.

1.7k stars · Python · ML Frameworks · Language Models
Velocity (7d): +3.5 ★ / day · Trend: steady

This repository provides a simple implementation of Group Relative Policy Optimization (GRPO) for training large language models to exhibit r1-style reasoning behavior. It supports vLLM-accelerated inference, splitting the reference model onto separate GPUs, and memory-efficient training on a single A800 GPU. The codebase is intended for education and for experimenting with RL training pipelines for LLMs.
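
For readers new to GRPO, the core idea can be sketched in a few lines: several completions are sampled for the same prompt, each is scored, and every reward is normalized against the mean and standard deviation of its own group, so no separate value model is needed. The snippet below is a minimal illustration and is not taken from the repository; the function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples in its group.

    `rewards` holds the scalar rewards of all completions sampled for
    one prompt. `eps` (an assumed constant) avoids division by zero
    when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt
# (1.0 = correct answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group mean get a positive advantage and are reinforced; those below it get a negative one. When all completions in a group receive the same reward, every advantage is zero and that group contributes no learning signal.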
