
Tencent-Hunyuan/SRPO

SRPO is a reinforcement learning method for aligning diffusion models with fine-grained human preference during training.

Velocity (7d): +4.7 ★/day · Trend: steady

SRPO introduces a sampling strategy for diffusion fine-tuning that improves optimization stability and computational efficiency when aligning the full diffusion trajectory with human preference signals. It uses a direct alignment approach that restores highly noisy images during training, with the aim of improving image generation quality. The project provides code and trained models for the Flux diffusion model.
