← all repositories

hao-ai-lab/LookaheadDecoding

Lookahead Decoding is an ICML 2024 paper introducing a parallel decoding algorithm that accelerates LLM inference without requiring draft models or data stores.

LookaheadDecoding
Velocity · 7d
+1.4
★ / day
Trend
steady
star history

The project implements a parallel decoding algorithm for language model inference that breaks sequential dependency by using lookahead verification. Based on Jacobi iteration methods, it decodes multiple tokens simultaneously by predicting and verifying n-grams in parallel. The approach achieves wall-clock speedups on LLaMA-2-Chat 7B generation and supports integration with FlashAttention and sampling techniques.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.