Tencent-Hunyuan/MixGRPO
Tencent Hunyuan's research on improving Group Relative Policy Optimization for flow-matching diffusion models via mixed ODE-SDE sampling.

MixGRPO is a reinforcement learning training method that aims to make GRPO more efficient for flow-based generative models. It introduces a mixed ODE-SDE sampling strategy, which combines ordinary differential equation (ODE) sampling with stochastic differential equation (SDE) sampling, to better balance exploration and exploitation during training. The approach targets flow-matching diffusion models used for generative tasks and aims for more efficient policy optimization by jointly optimizing sampling trajectories and reward signals.
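
To make the idea of mixed sampling concrete, here is a minimal toy sketch that mixes deterministic Euler (ODE-style) steps with stochastic Euler-Maruyama (SDE-style) steps inside a single trajectory. It is not the repository's implementation: the one-dimensional state, the `toy_velocity` field, the `mixed_sample` function, the `sde_steps` argument, and the constant noise scale `sigma` are all hypothetical names and choices made for illustration, and the drift correction that a real flow-matching SDE would need is omitted.

```python
import math
import random


def toy_velocity(x: float, t: float) -> float:
    """Hypothetical velocity field that pulls the sample toward 1.0."""
    return 1.0 - x


def mixed_sample(x0, num_steps, sde_steps, sigma=0.5, seed=0):
    """Integrate from t=0 to t=1, adding noise only on the steps in sde_steps.

    Steps outside sde_steps are plain Euler (ODE) updates; steps inside
    sde_steps add an Euler-Maruyama noise term on top of the same drift.
    Returns the final state and a per-step list of "was stochastic" flags.
    """
    rng = random.Random(seed)
    dt = 1.0 / num_steps
    x = x0
    stochastic_flags = []
    for i in range(num_steps):
        t = i * dt
        # Deterministic drift update, shared by ODE and SDE steps.
        x = x + toy_velocity(x, t) * dt
        is_sde = i in sde_steps
        if is_sde:
            # Euler-Maruyama noise term: sigma * sqrt(dt) * N(0, 1).
            x = x + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        stochastic_flags.append(is_sde)
    return x, stochastic_flags


# Fully deterministic trajectory versus one with three stochastic steps.
x_ode, _ = mixed_sample(0.0, num_steps=10, sde_steps=set())
x_mix, flags = mixed_sample(0.0, num_steps=10, sde_steps={2, 3, 4})
print(x_ode, x_mix, flags)
```

In this sketch, only the steps listed in `sde_steps` inject randomness, so they are the only steps where different samples in a group can diverge from one another. The remaining steps are deterministic given their input. Which steps are made stochastic, and how that choice changes during training, is left as an open parameter here.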