hao-ai-lab/Consistency_LLM
Consistency Large Language Models (CLLMs) use Jacobi decoding to generate multiple tokens in parallel, dramatically reducing LLM inference latency.

Velocity (7d): +0.4 ★/day · Trend: steady
CLLMs are trained to map any randomly initialized n-token sequence to the same result as auto-regressive decoding in as few steps as possible. This enables efficient parallel decoding rather than the sequential token-by-token generation used in standard LLMs. The models demonstrate substantial generation speedups across various tasks while maintaining output quality.
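To make the idea of Jacobi decoding concrete, here is a minimal sketch in Python. It is not the repository's code: `next_token` is a hypothetical, deterministic stand-in for a model's greedy next-token choice, and `VOCAB_SIZE`, `autoregressive_decode`, and `jacobi_decode` are names invented for illustration. The sketch starts from a random n-token guess, refreshes every position from the previous guess, and stops at a fixed point, which coincides with the greedy auto-regressive output.

```python
import random

VOCAB_SIZE = 50  # hypothetical toy vocabulary size


def next_token(prefix):
    """Toy stand-in for an LLM's greedy next-token choice (an assumption,
    not a real model): a deterministic function of the whole prefix."""
    h = 0
    for t in prefix:
        h = (h * 31 + t + 7) % 1_000_003
    return h % VOCAB_SIZE


def autoregressive_decode(prompt, n):
    """Standard sequential decoding: one token per step, n steps in total."""
    out = []
    for _ in range(n):
        out.append(next_token(prompt + out))
    return out


def jacobi_decode(prompt, n, seed=0):
    """Jacobi decoding: start from a random n-token guess and update all
    positions from the previous guess until nothing changes."""
    rng = random.Random(seed)
    guess = [rng.randrange(VOCAB_SIZE) for _ in range(n)]
    steps = 0
    while True:
        # With a real model these n updates would come from a single
        # forward pass; here they are computed one by one for clarity.
        new = [next_token(prompt + guess[:i]) for i in range(n)]
        steps += 1
        if new == guess:  # fixed point reached
            return guess, steps
        guess = new


prompt = [3, 14, 15]
ar_tokens = autoregressive_decode(prompt, 8)
jacobi_tokens, jacobi_steps = jacobi_decode(prompt, 8)
print(ar_tokens == jacobi_tokens, jacobi_steps)
```

With this toy function the loop usually needs about n iterations, because a random guess is rarely right by accident; each iteration is only guaranteed to fix one more leading token. The training objective described above aims to make a model reach the fixed point in far fewer iterations, which is where the speedup would come from.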