NVlabs/Fast-dLLM
Training-free inference acceleration for diffusion-based Large Language Models using KV cache and parallel decoding.

Velocity (7d): +2.7 ★/day · Trend: steady
Fast-dLLM is a family of techniques for accelerating inference in diffusion-based LLMs (dLLMs) and their vision-language counterparts. The acceleration is training-free: it relies on KV cache reuse and parallel decoding rather than on fine-tuning. The approach applies to text-only dLLMs, vision-language models (dVLMs), and vision-language-action models for autonomous driving.
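
To make the idea of parallel decoding concrete, here is a minimal sketch in plain Python. It is not the repository's API. It assumes one possible selection rule that the description above does not spell out: at each denoising step, every masked position whose top token probability meets a confidence threshold is filled at once, and if none qualifies, only the single most confident position is filled. The names `parallel_unmask_step`, `MASK`, and `threshold` are hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Sequence

MASK = None  # hypothetical marker for a position that is still masked


def parallel_unmask_step(
    tokens: List[Optional[int]],
    probs: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> List[Optional[int]]:
    """Fill every masked position whose top probability meets `threshold`.

    `probs[i]` is an assumed per-position distribution over token ids.
    If no masked position qualifies, only the most confident one is filled,
    so each step still makes progress.
    """
    masked = [i for i, t in enumerate(tokens) if t is MASK]
    if not masked:
        return list(tokens)

    # Most likely token id, and its probability, for each masked position.
    best = {i: max(range(len(probs[i])), key=lambda v: probs[i][v]) for i in masked}
    conf = {i: probs[i][best[i]] for i in masked}

    # Positions confident enough to be decoded together in this step.
    chosen = [i for i in masked if conf[i] >= threshold]
    if not chosen:
        chosen = [max(masked, key=lambda i: conf[i])]

    out = list(tokens)
    for i in chosen:
        out[i] = best[i]
    return out
```

Calling this function repeatedly, with fresh probabilities from the model each time, would fill the sequence in fewer steps than decoding one token per step whenever several positions are confident at once. In a real system the probabilities would come from the diffusion model's forward pass, which is where KV cache reuse would reduce the cost of each step.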