SafeAILab/EAGLE

Speculative decoding that reads the model's mind in hidden layers

EAGLE speeds up LLM inference by extrapolating hidden-layer feature vectors rather than guessing tokens, while provably preserving the target model's output distribution.

★ 2.5k stars | Python | Inference · Serving | Language Models | LLMOps · Eval

What it does

EAGLE is a family of speculative decoding methods (EAGLE-1, EAGLE-2, and EAGLE-3) that accelerate autoregressive LLM generation by training a small draft model to predict future hidden-state features instead of raw tokens. The target model verifies each draft in the usual speculative-decoding loop, and the authors prove that the accepted output follows the same distribution as vanilla decoding. The method is designed to plug into existing serving stacks rather than replace them.
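To make the lossless claim concrete, here is a minimal sketch of the textbook single-token speculative sampling rule, not EAGLE's own code (EAGLE drafts whole trees of candidates). The names `p` (target distribution), `q` (draft distribution), and all function names are assumptions chosen for illustration.

```python
import random
from typing import List, Tuple


def residual_distribution(p: List[float], q: List[float]) -> List[float]:
    """Normalized max(0, p - q), used to resample after a rejected draft."""
    raw = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(raw)
    if total == 0.0:  # p and q are identical, so a draft is never rejected
        return list(p)
    return [r / total for r in raw]


def speculative_step(p: List[float], q: List[float],
                     rng: random.Random) -> Tuple[int, bool]:
    """Draw one draft token from q, then accept or resample against p."""
    draft = rng.choices(range(len(q)), weights=q)[0]
    if rng.random() < min(1.0, p[draft] / q[draft]):
        return draft, True  # draft accepted
    residual = residual_distribution(p, q)
    return rng.choices(range(len(p)), weights=residual)[0], False


def output_distribution(p: List[float], q: List[float]) -> List[float]:
    """Exact distribution of the token returned by speculative_step."""
    accepted = [qi * min(1.0, pi / qi) if qi > 0 else 0.0
                for pi, qi in zip(p, q)]
    rejected_mass = 1.0 - sum(accepted)
    residual = residual_distribution(p, q)
    return [a + rejected_mass * r for a, r in zip(accepted, residual)]


target = [0.6, 0.3, 0.1]  # hypothetical target-model probabilities
draft = [0.2, 0.5, 0.3]   # hypothetical draft-model probabilities
print(output_distribution(target, draft))
```

Whatever the draft distribution is, the accepted-or-resampled token is distributed as the target: a poor draft lowers the acceptance rate and therefore the speedup, not the output quality.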

The interesting bit

Rather than drafting tokens from the vocabulary, EAGLE-1 extrapolates the second-top-layer contextual feature vectors. EAGLE-3 goes further by fusing low-, mid-, and high-level semantic features during training, which removes the top-layer constraint. Drafting thus moves from discrete token prediction to a continuous representation space, and this appears to be the source of the reported gains: the project claims EAGLE-3 generates 5.6 times faster than vanilla decoding on Vicuna-13B, ahead of earlier token-based speculative methods.
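The idea of drafting in feature space can be shown with a toy: predict the next feature vector from the current feature plus the embedding of the token just sampled, then reuse the target model's output head to turn that feature into a token. This is a sketch under heavy assumptions. A single hypothetical linear map (`w_draft`) stands in for the trained draft network, and every weight and dimension below is made up.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def draft_next_feature(feature: Vector, token_embedding: Vector,
                       w_draft: Matrix) -> Vector:
    """Extrapolate the next feature from the current feature and the
    embedding of the already-sampled token (concatenated as one input)."""
    return matvec(w_draft, feature + token_embedding)


def feature_to_token(feature: Vector, lm_head: Matrix) -> int:
    """Map a feature to a token id with the (frozen) output head."""
    logits = matvec(lm_head, feature)
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical toy setup: hidden size 2, vocabulary size 3.
feature = [1.0, 0.0]
token_embedding = [0.0, 1.0]
w_draft = [[0.5, 0.0, 0.0, 1.0],
           [0.0, 0.5, 1.0, 0.0]]
lm_head = [[0.0, 1.0],
           [1.0, 0.0],
           [-1.0, 1.0]]

next_feature = draft_next_feature(feature, token_embedding, w_draft)
print(next_feature, feature_to_token(next_feature, lm_head))
```

The drafted token still goes through verification by the target model, so an inaccurate feature prediction costs speed rather than correctness.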

Key highlights

Claims up to 5.6 times faster generation than vanilla decoding on 13B models (EAGLE-3), and 4 times for EAGLE-2, per the project’s own benchmarks.
Certified by a third-party Spec-Bench evaluation as the fastest speculative method so far (EAGLE-1 era).
Adopted by major serving frameworks including vLLM, TensorRT-LLM, SGLang, AMD ROCm, and AWS NeuronX.
Training is pitched as accessible: the authors say it takes 1–2 days on eight RTX 3090 GPUs.
Composable with quantization, FlashAttention, DeepSpeed, and Mamba.
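For intuition about where such speedups come from, a back-of-envelope model from the general speculative-decoding literature helps. It assumes a linear chain of `gamma` drafted tokens, each accepted independently with probability `alpha`. Both parameters are illustrative assumptions, and this is not the project's benchmark methodology (EAGLE-2 and EAGLE-3 draft trees, not chains).

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model forward pass.

    Equals 1 + alpha + ... + alpha**gamma, i.e. the closed form
    (1 - alpha**(gamma + 1)) / (1 - alpha) when alpha < 1.
    """
    return sum(alpha ** k for k in range(gamma + 1))


for alpha in (0.5, 0.8, 0.9):
    print(alpha, round(expected_tokens_per_pass(alpha, gamma=4), 4))
```

The wall-clock speedup is lower than this token count because drafting itself takes time, which is why a cheap but accurate draft model matters.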

Caveats

The maintainers explicitly warn that only official EAGLE-3 checkpoints are recognized; unofficial community weights (including many Qwen3 and LLaMA-4 variants) may vary in performance.
The default branch implements EAGLE-3/EAGLE-2; anyone needing the original EAGLE-1 must switch to the v1 branch.
Precision-sensitive: the README notes that Qwen2 targets require bf16 rather than fp16 to avoid numerical overflow.
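The bf16 versus fp16 caveat comes down to numeric range: fp16 tops out at 65504, while bf16 keeps float32's exponent range at lower precision. A small standard-library sketch illustrates this. The value 70000.0 is a made-up activation, not a figure from the README, and the helper names are hypothetical.

```python
import struct


def fits_fp16(x: float) -> bool:
    """True if x can be stored as a finite IEEE 754 half-precision value.

    The struct module raises on overflow; tensor libraries generally
    produce inf instead.
    """
    try:
        struct.pack("<e", x)
        return True
    except (OverflowError, struct.error):
        return False


def to_bf16(x: float) -> float:
    """Emulate bf16 by keeping only the top 16 bits of the float32
    encoding (truncation rather than round-to-nearest)."""
    top_two_bytes = struct.pack(">f", x)[:2]
    return struct.unpack(">f", top_two_bytes + b"\x00\x00")[0]


activation = 70000.0  # hypothetical value just above the fp16 maximum
print(fits_fp16(65504.0))     # largest finite fp16 value
print(fits_fp16(activation))  # exceeds the fp16 range
print(to_bf16(activation))    # representable in bf16, with coarse precision
```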

Verdict

Worth a look if you run LLM serving infrastructure and want a speculative decoder with wide framework adoption. Less useful if you need a drop-in, weight-agnostic speedup without training or verifying checkpoint provenance.

Frequently asked

What is SafeAILab/EAGLE? EAGLE speeds up LLM inference by extrapolating hidden-layer feature vectors rather than guessing tokens, while provably preserving the target model's output distribution.
Is EAGLE open source? Yes. SafeAILab/EAGLE is an open-source project tracked on heatdrop.
What language is EAGLE written in? SafeAILab/EAGLE is primarily written in Python.
How popular is EAGLE? SafeAILab/EAGLE has 2.5k stars on GitHub.
Where can I find EAGLE? SafeAILab/EAGLE is on GitHub at https://github.com/SafeAILab/EAGLE.