
Tencent-Hunyuan/MixGRPO

Tencent Hunyuan's research on improving Group Relative Policy Optimization for flow-matching diffusion models via mixed ODE-SDE sampling.

MixGRPO
Velocity (7d): +3.6 ★/day · Trend: steady

MixGRPO is a reinforcement learning training method that aims to make GRPO (Group Relative Policy Optimization) more efficient for flow-based generative models. It introduces a mixed ODE-SDE sampling strategy, combining deterministic ordinary differential equation (ODE) sampling with stochastic differential equation (SDE) sampling, to better balance exploration and exploitation during training. The approach targets flow-matching diffusion models used for generative tasks, with the goal of more efficient policy optimization over sampled trajectories and their reward signals.
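To make the idea of mixing ODE and SDE sampling concrete, here is a minimal toy sketch in plain Python. It is not taken from the MixGRPO repository: the function `mixed_ode_sde_sample`, the `sde_steps` argument, the scalar state, and the toy velocity field are all hypothetical names and choices made for illustration. The sketch also simplifies the stochastic step to an Euler update plus Gaussian noise, and omits any drift correction a real SDE sampler might need to match the ODE's marginals.

```python
import math
import random


def mixed_ode_sde_sample(x, velocity, n_steps, sde_steps, sigma, rng):
    """Integrate dx = v(x, t) dt from t=0 to t=1 with Euler steps.

    Steps whose index is in `sde_steps` also receive Gaussian noise
    (stochastic, exploratory); all other steps are deterministic.
    Hypothetical illustration, not the repository's sampler.
    """
    dt = 1.0 / n_steps
    noisy = []
    for i in range(n_steps):
        t = i * dt
        # Deterministic Euler update shared by both kinds of step.
        x = x + velocity(x, t) * dt
        if i in sde_steps:
            # Simplified stochastic step: add noise scaled by sqrt(dt).
            x = x + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            noisy.append(i)
    return x, noisy


def toy_velocity(x, t):
    # Assumed toy field that pulls the state toward zero.
    return -x


rng = random.Random(0)
# No stochastic steps: two runs from the same start give the same result.
ode_a, _ = mixed_ode_sde_sample(1.0, toy_velocity, 10, set(), 0.5, rng)
ode_b, _ = mixed_ode_sde_sample(1.0, toy_velocity, 10, set(), 0.5, rng)
# Noise injected only on steps 2, 3 and 4.
mixed, noisy_steps = mixed_ode_sde_sample(1.0, toy_velocity, 10, {2, 3, 4}, 0.5, rng)
```

In this sketch, only the steps listed in `sde_steps` introduce randomness, so they are the only places where different samples in a group can diverge. The remaining steps are reproducible, which is one plausible reading of how a mixed scheme could trade exploration against cost.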
