Tencent-Hunyuan/SRPO
SRPO is a reinforcement learning method for fine-tuning diffusion models so that their outputs align with fine-grained human preferences.

Velocity (7d): +4.7 ★/day · Trend: steady
SRPO introduces a sampling strategy for diffusion fine-tuning that improves optimization stability and computational efficiency when aligning the full diffusion trajectory with human preference signals. Its direct alignment approach restores highly noisy images during training, with the aim of improving image generation quality. The project provides code and trained models for the Flux diffusion model.
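To make "restoring a highly noisy image" concrete, here is a toy sketch, not the project's code. It assumes a linear noising rule, x_t = (1 - sigma) * x0 + sigma * noise, and assumes the injected noise is known, so the clean signal can be recovered in closed form. The function names `add_noise` and `restore`, the noise level 0.95, and the 8-element vector standing in for an image are all hypothetical choices made for illustration.

```python
import random


def add_noise(x0, noise, sigma):
    """Blend a clean signal with noise (assumed linear interpolation rule)."""
    return [(1.0 - sigma) * x + sigma * n for x, n in zip(x0, noise)]


def restore(xt, noise, sigma):
    """Invert the blend using the known noise; undefined at sigma == 1."""
    if not 0.0 <= sigma < 1.0:
        raise ValueError("sigma must be in [0, 1)")
    return [(x - sigma * n) / (1.0 - sigma) for x, n in zip(xt, noise)]


rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]   # stand-in for an image
noise = [rng.gauss(0.0, 1.0) for _ in range(8)]   # noise fixed in advance

sigma = 0.95                                      # a "highly noisy" level
xt = add_noise(x0, noise, sigma)
x0_hat = restore(xt, noise, sigma)

# Largest per-element reconstruction error; only floating-point rounding remains.
max_err = max(abs(a - b) for a, b in zip(x0, x0_hat))
```

In this toy setting the restoration is exact up to rounding because the noise is known. How SRPO itself combines restoration with the model's predictions and the preference signal is described in the project's own materials, not here.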