kuleshov-group/bd3lms
A family of block discrete denoising diffusion language models that interpolate between autoregressive and diffusion approaches for text generation.

BD3-LMs decompose a token sequence into blocks and generate text by running discrete diffusion within each block, conditioned on the blocks before it. Tuning the block size trades generation quality against sampling efficiency. The work proposes an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules that minimize this variance. The models achieve state-of-the-art likelihoods among diffusion language models while supporting generation of arbitrary-length sequences.
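
The block-by-block generation described above can be illustrated with a toy sampler. This is only a sketch of the control flow under stated assumptions, not the repository's implementation: `toy_denoiser`, `sample_blockwise`, the `MASK` id, and the even unmasking schedule are all hypothetical, and the "model" simply draws random tokens where a trained network would predict them from the context.

```python
import random

MASK = -1  # hypothetical id for a masked (fully noised) token


def toy_denoiser(context, block, vocab_size, rng):
    """Stand-in for a trained model (assumption): proposes a token for each
    masked position. A real model would condition on `context` and `block`."""
    return [rng.randrange(vocab_size) if t == MASK else t for t in block]


def sample_blockwise(num_blocks, block_size, vocab_size=100, steps=4, seed=0):
    """Generate blocks left to right; denoise each block over a few steps."""
    rng = random.Random(seed)
    sequence = []  # previously generated blocks act as the context
    for _ in range(num_blocks):
        block = [MASK] * block_size  # every new block starts fully masked
        for step in range(steps):
            masked = [i for i, t in enumerate(block) if t == MASK]
            if not masked:
                break
            proposal = toy_denoiser(sequence, block, vocab_size, rng)
            # Reveal an even share of the masked positions per remaining step
            # (ceiling division), so the final step reveals everything left.
            remaining_steps = steps - step
            k = -(-len(masked) // remaining_steps)
            for i in rng.sample(masked, k):
                block[i] = proposal[i]
        sequence.extend(block)
    return sequence


tokens = sample_blockwise(num_blocks=3, block_size=4)
print(len(tokens))  # 12 tokens: 3 blocks of 4
```

In this sketch, `block_size=1` reduces to token-by-token generation, while a single block spanning the whole sequence behaves like diffusion over the full sequence. Because blocks are appended one after another, the output length is set by `num_blocks` rather than fixed in advance.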