Is dLLM-RL open source?

Yes — Gen-Verse/dLLM-RL is open source, released under the Apache-2.0 license.

What language is dLLM-RL written in?

Gen-Verse/dLLM-RL is primarily written in Python.

How popular is dLLM-RL?

Gen-Verse/dLLM-RL has 512 stars on GitHub.

Where can I find dLLM-RL?

Gen-Verse/dLLM-RL is on GitHub at https://github.com/Gen-Verse/dLLM-RL.


Gen-Verse/dLLM-RL

Diffusion LLMs get a full-stack post-training gym

TraceRL provides a unified RL and SFT stack for discrete diffusion language models, aiming to close the reasoning gap with autoregressive transformers.

★ 512 stars · Python · ML Frameworks · Language Models · Agents


What it does

dLLM-RL is a training and inference framework for discrete diffusion language models, the kind that generate text by iteratively unmasking tokens rather than predicting left to right. It bundles supervised fine-tuning, trajectory-aware RL (TraceRL), coupled RL, random masking RL, and RLHF under one roof, with support for math, coding, and multimodal tasks. The authors use it to train the TraDo model family, which they bill as the first long-CoT diffusion language models.
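To make "iteratively unmasking tokens" concrete, here is a minimal sketch of a confidence-ordered unmasking loop. It is not taken from dLLM-RL: `toy_predict` is a hypothetical stand-in for a real diffusion language model, and `iterative_unmask` and `tokens_per_step` are illustrative names. The only point is the control flow, in which every position starts masked and each step commits the most confident proposals in any order.

```python
import random

MASK = "<mask>"
VOCAB = ["a", "b", "c", "d"]


def toy_predict(tokens, position, rng):
    """Hypothetical stand-in for a diffusion LM.

    Returns a (token, confidence) proposal for one masked position.
    A real model would condition on `tokens`; this toy ignores them.
    """
    return rng.choice(VOCAB), rng.random()


def iterative_unmask(length, tokens_per_step, seed=0):
    """Fill a fully masked sequence a few positions at a time."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    trajectory = []  # positions committed at each denoising step
    while MASK in tokens:
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        proposals = {i: toy_predict(tokens, i, rng) for i in masked}
        # Commit the most confident proposals; the rest stay masked.
        chosen = sorted(masked, key=lambda i: proposals[i][1], reverse=True)
        chosen = chosen[:tokens_per_step]
        for i in chosen:
            tokens[i] = proposals[i][0]
        trajectory.append(sorted(chosen))
    return tokens, trajectory


if __name__ == "__main__":
    final_tokens, steps = iterative_unmask(length=8, tokens_per_step=3)
    print(final_tokens)
    print(steps)
```

Unlike left-to-right decoding, the order in which positions are revealed is part of the sample, which is why the list of per-step positions is returned alongside the final tokens.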

The interesting bit

TraceRL optimizes the entire denoising trajectory instead of just the final output, and pairs it with a diffusion-specific value model to cut variance and improve stability during optimization. The TraDo models (4B and 8B) are offered as evidence that this approach can challenge autoregressive baselines on standard reasoning benchmarks.
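The general idea of crediting every denoising step and subtracting a learned baseline can be sketched with a plain REINFORCE-with-baseline loss. This is a generic textbook formulation under stated assumptions, not TraceRL's actual objective: the function name, its arguments, and the choice of "final reward minus per-step value" as the advantage are all illustrative.

```python
def trajectory_policy_loss(step_logprobs, step_values, final_reward):
    """REINFORCE-with-baseline loss averaged over denoising steps.

    step_logprobs[t]: summed log-probability of the tokens unmasked at step t
    step_values[t]:   baseline estimate of the final reward before step t
                      (assumed to come from some value model)
    final_reward:     scalar reward for the completed sequence
    """
    if len(step_logprobs) != len(step_values):
        raise ValueError("need one value estimate per denoising step")
    if not step_logprobs:
        raise ValueError("trajectory must contain at least one step")
    # Subtracting the baseline leaves the expected gradient unchanged
    # but reduces its variance.
    advantages = [final_reward - v for v in step_values]
    weighted = sum(a * lp for a, lp in zip(advantages, step_logprobs))
    loss = -weighted / len(step_logprobs)
    return loss, advantages


if __name__ == "__main__":
    loss, adv = trajectory_policy_loss(
        step_logprobs=[-1.0, -2.0],
        step_values=[0.5, 0.75],
        final_reward=1.0,
    )
    print(loss, adv)
```

Every step of the trajectory contributes a term to the loss here, in contrast to an objective that scores only the log-probability of the final sequence.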

Key highlights

Supports nearly all open-source discrete diffusion LMs, including LLaDA, Dream, SDAR, Diffu-Coder, and MMaDA.
TraceRL offers an optional diffusion-based value model for variance reduction and training stability.
Ships with inference accelerations such as KV-cache optimizations and block-attention sampling.
Includes TraDo-8B-Thinking, billed by the authors as the first long-CoT diffusion language model, trained with a mix of TraceRL and long-CoT SFT.
Multi-node training and evaluation are built in, not bolted on.

Verdict

Researchers already experimenting with diffusion language models should treat this as the most complete post-training toolkit available. If you are committed to the autoregressive stack and have no plans to unmask tokens, there is nothing here for you.
