Tencent-Hunyuan/SRPO

Ten-minute FLUX tuning that skips the reward-hacking phase

SRPO fine-tunes diffusion models directly on human preference signals using analytical gradients, replacing rollout-heavy RL with sub-ten-minute updates that resist reward hacking.

★ 1.3k stars | Python | Image · Video · Audio | LLMOps · Eval

What it does

SRPO is a training method for aligning text-to-image diffusion models, specifically FLUX.1-dev and Qwen-Image, with fine-grained human preferences. Instead of relying on rollout-based reinforcement learning, it computes analytical gradients from a single generated image and backpropagates the reward signal directly through the full diffusion trajectory, including the highly noisy early timesteps. The result is a fine-tuned checkpoint with improved perceptual quality, produced without the usual RL infrastructure.
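
To make the contrast with rollout-based RL concrete, here is a minimal toy sketch of updating a generator with an analytical reward gradient from a single sample. Everything in it is an assumption for illustration: the one-parameter `generate` function, the quadratic `reward`, and the learning rate are hypothetical stand-ins, not SRPO's model, reward model, or training code.

```python
# Toy illustration only: a one-parameter "generator" and a differentiable
# reward, both hypothetical stand-ins rather than SRPO's actual components.

def generate(theta: float, noise: float) -> float:
    """Map a fixed noise sample to an output; stands in for a diffusion model."""
    return theta * noise

def reward(x: float, target: float = 1.0) -> float:
    """Differentiable reward that peaks when the output equals the target."""
    return -(x - target) ** 2

def reward_grad(x: float, target: float = 1.0) -> float:
    """Closed-form derivative of the reward with respect to the output."""
    return -2.0 * (x - target)

def train(theta: float, noise: float, steps: int = 100, lr: float = 0.5) -> float:
    """Gradient ascent on the reward using one generated sample per step."""
    for _ in range(steps):
        x = generate(theta, noise)
        # Chain rule: dR/dtheta = dR/dx * dx/dtheta, with dx/dtheta = noise.
        theta += lr * reward_grad(x) * noise
    return theta

theta_final = train(theta=0.0, noise=0.5)
final_reward = reward(generate(theta_final, 0.5))
```

Because the gradient is available in closed form, each step in this toy needs only one generated sample; a policy-gradient method would instead estimate a similar update direction from many sampled rollouts.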

The interesting bit

The method sidesteps the color oversaturation and layout corruption typical of reward hacking by regularizing directly with negative rewards, which removes the need for KL-divergence penalties or a separate reward system. It also supports dynamically controllable text conditions during online RL, allowing on-the-fly style adjustments within the reward model's scope. The authors claim this is a first for online RL.
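
One way to picture regularizing with negative rewards is a relative score: rate the same image under a positively worded and a negatively worded version of the prompt, then optimize the difference. The sketch below assumes that reading. The two-number feature vectors, the `toy_score` dot product, and the control-word embeddings are all hypothetical; a real setup would use a learned reward model such as HPS-v2.1.

```python
# Toy illustration only: features and text embeddings are made-up 2-vectors
# laid out as [content quality, color saturation].

def toy_score(image_feat, text_feat):
    """Hypothetical reward model: dot product of image and text features."""
    return sum(i * t for i, t in zip(image_feat, text_feat))

def relative_reward(image_feat, positive_text, negative_text):
    """Score under the positive condition minus score under the negative one."""
    return toy_score(image_feat, positive_text) - toy_score(image_feat, negative_text)

# Both conditions share the same bias toward saturation (second entry)
# but disagree on content quality (first entry).
positive_text = [1.0, 0.5]
negative_text = [-1.0, 0.5]

plain_image = [0.5, 0.0]
oversaturated_image = [0.5, 4.0]

# How much each reward changes when only the saturation is increased.
raw_gain = (toy_score(oversaturated_image, positive_text)
            - toy_score(plain_image, positive_text))
relative_gain = (relative_reward(oversaturated_image, positive_text, negative_text)
                 - relative_reward(plain_image, positive_text, negative_text))
```

In this toy, raising saturation increases the raw score but leaves the relative reward unchanged, because the bias shared by both conditions cancels. Changing the positive and negative wording changes what the difference rewards, which is the intuition behind steering style through text conditions.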

Key highlights

Trains FLUX.1-dev to measurable improvement in under 10 minutes using a single-image rollout and analytical gradients.
Can switch to offline mode with fewer than 1,500 real images, removing the need for live generation during training entirely.
Hit #1 on the Artificial Analysis open-source text-to-image leaderboard as of October 2025.
Avoids reward hacking (such as oversaturation) by design, using negative reward regularization instead of KL constraints.
Supports both FLUX.1-dev and Qwen-Image, with a ComfyUI workflow and inference code included.

Caveats

PickScore support is currently suboptimal because the control words are tuned for HPS-v2.1, so switching reward models may yield worse results.
Direct backpropagation through the diffusion trajectory demands significant GPU memory; the authors recommend enabling VAE gradient checkpointing to avoid running out of VRAM.
The supported model list is still narrow: FLUX.1-dev and Qwen-Image are ready, but extensions to Qwen-Image-Edit and FLUX 2 remain on the roadmap.
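
The memory caveat above comes down to how many intermediate activations are kept for the backward pass. As a rough, framework-free illustration of gradient checkpointing, the sketch below differentiates a toy chain of `tanh` layers twice: once storing every activation, and once keeping only segment boundaries and recomputing the rest. The layer function, weights, and segment size are assumptions for illustration and are unrelated to SRPO's VAE or its training scripts.

```python
import math

def step(x, w):
    """One toy layer: tanh(w * x)."""
    return math.tanh(w * x)

def step_grad(x, w):
    """Derivative of tanh(w * x) with respect to its input x."""
    return w * (1.0 - math.tanh(w * x) ** 2)

def grad_full(x0, weights):
    """Store every activation, then multiply local derivatives backward."""
    acts = [x0]
    for w in weights:
        acts.append(step(acts[-1], w))
    g = 1.0
    for i in reversed(range(len(weights))):
        g *= step_grad(acts[i], weights[i])
    return g, len(acts)

def grad_checkpointed(x0, weights, segment):
    """Keep only segment-boundary activations; recompute inside each segment."""
    n = len(weights)
    checkpoints = {0: x0}
    x = x0
    for i, w in enumerate(weights):
        x = step(x, w)
        if (i + 1) % segment == 0:
            checkpoints[i + 1] = x
    g = 1.0
    peak = len(checkpoints)
    for start in reversed(range(0, n, segment)):
        end = min(start + segment, n)
        # Recompute the inputs of layers start..end-1 from the checkpoint.
        local = [checkpoints[start]]
        for i in range(start, end - 1):
            local.append(step(local[-1], weights[i]))
        peak = max(peak, len(checkpoints) + len(local))
        for i in reversed(range(start, end)):
            g *= step_grad(local[i - start], weights[i])
    return g, peak

weights = [0.9, 1.1, 0.8, 1.2] * 4  # 16 toy layers
g_full, stored_full = grad_full(0.4, weights)
g_ckpt, peak_ckpt = grad_checkpointed(0.4, weights, segment=4)
```

Both functions return the same derivative, but the checkpointed version holds fewer activations at its peak and pays for that with a second forward pass over each segment. That trade of compute for memory is the general idea behind the authors' recommendation.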

Verdict

Researchers and practitioners who need to fine-tune large diffusion models for perceptual quality without maintaining a complex RL infrastructure should look here. Anyone expecting a plug-and-play tool for arbitrary base models or consumer GPUs should note that this is still very much a research codebase.

Frequently asked

What is Tencent-Hunyuan/SRPO? SRPO fine-tunes diffusion models directly on human preference signals using analytical gradients, replacing rollout-heavy RL with sub-ten-minute updates that resist reward hacking.
Is SRPO open source? Yes. Tencent-Hunyuan/SRPO is an open-source project tracked on heatdrop.
What language is SRPO written in? Tencent-Hunyuan/SRPO is primarily written in Python.
How popular is SRPO? Tencent-Hunyuan/SRPO has 1.3k stars on GitHub.
Where can I find SRPO? Tencent-Hunyuan/SRPO is on GitHub at https://github.com/Tencent-Hunyuan/SRPO.