chi2liu/ABC-GRPO

ABC-GRPO is a reinforcement learning algorithm variant that introduces four independent clipping boundaries to improve stability and generalization when training LLMs like Qwen3 with GRPO.

441 stars · Python · ML Frameworks · Language Models
Star history: velocity (7d) +2.9 ★/day · trend: steady

The project implements Adaptive-Boundary-Clipping GRPO (ABC-GRPO), an asymmetric refinement of the standard GRPO reinforcement learning algorithm for LLM training. Standard GRPO clips the policy ratio with two boundaries that take effect only conditionally, depending on the sign of the advantage. ABC-GRPO replaces them with four independent parameters (ε₁, ε₂, ε₃, ε₄) that provide unconditional bounds across all quadrants of the advantage space. The method maintains higher entropy during training to prevent premature convergence, and the project reports that it outperforms standard GRPO in evaluations on mathematical reasoning tasks with Qwen3 models.
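To make the difference between conditional and unconditional clipping concrete, here is a minimal per-token sketch in plain Python. It is not taken from the repository. The names `Bounds`, `grpo_surrogate`, and `four_bound_surrogate` are hypothetical, and so is the assignment of ε₁ through ε₄ to specific quadrants, which the description above does not spell out. The only thing the sketch assumes about the method is what the description states: the policy ratio is bounded on both sides for either sign of the advantage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Four independent clip widths (hypothetical quadrant assignment)."""
    eps1: float  # advantage >= 0: upper bound on the ratio, 1 + eps1
    eps2: float  # advantage <  0: lower bound on the ratio, 1 - eps2
    eps3: float  # advantage >= 0: lower bound on the ratio, 1 - eps3
    eps4: float  # advantage <  0: upper bound on the ratio, 1 + eps4


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def grpo_surrogate(ratio: float, adv: float, eps: float) -> float:
    """Standard PPO/GRPO-style clipped surrogate for one token.

    The min() means clipping only takes effect when the ratio moves in the
    direction the advantage rewards; the other two quadrants stay unbounded.
    """
    return min(ratio * adv, clip(ratio, 1.0 - eps, 1.0 + eps) * adv)


def four_bound_surrogate(ratio: float, adv: float, b: Bounds) -> float:
    """Illustrative surrogate that bounds the ratio in every quadrant.

    Assumption: each advantage sign gets its own [lower, upper] interval,
    and the ratio is always clamped to it before weighting by the advantage.
    """
    if adv >= 0:
        lo, hi = 1.0 - b.eps3, 1.0 + b.eps1
    else:
        lo, hi = 1.0 - b.eps2, 1.0 + b.eps4
    return clip(ratio, lo, hi) * adv
```

With a ratio of 3.0 and an advantage of -1.0, `grpo_surrogate` returns the unclipped product, because that quadrant has no bound in the standard objective. `four_bound_surrogate` instead clamps the ratio at 1 + ε₄ before multiplying. The specific ε values used for training are a tuning choice that this sketch does not prescribe.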
