Gen-Verse/dLLM-RL
TraceRL is a reinforcement learning framework for training and post-training discrete diffusion large language models.

The repository provides an official implementation for training diffusion-based LLMs with reinforcement learning. It supports a range of discrete diffusion language models, including TraDo, SDAR, Dream, LLaDA, MMaDA, LLaDA-V, and Diffu-Coder. Post-training options include supervised fine-tuning (SFT), RL with optional value models and process rewards, and RLHF, covering mathematical reasoning, code generation, and multimodal tasks.