
lucienhuangfu/eLLM

A Rust-based LLM inference framework that runs large language models on Intel Xeon CPUs with AMX acceleration.

420 stars · Rust · Inference · Serving
Star velocity (7d): +1.2 ★/day · Trend: steady

eLLM is an inference runtime built specifically for CPU servers (Intel Xeon, AMD EPYC), so it needs no GPU or NPU. It achieves low latency by computing attention head by head, and it runs the full prefill phase end to end. The framework is designed to be compatible with the vLLM API, which allows integration with existing ecosystems, and it targets numerical results consistent with GPU inference.
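To make "computing attention head by head" concrete, here is a minimal sketch in plain Rust of single-token scaled dot-product attention that processes one head at a time over a key/value cache. It is an illustration only and is not taken from eLLM: the function name `attention_head_by_head`, the flat token-major `f32` layout, and the scalar loops are all assumptions, and a real CPU runtime would presumably use tiled, vectorized (for example AMX) kernels instead.

```rust
/// Hypothetical sketch (not eLLM's API): attention for one query token,
/// computed one head at a time.
///
/// Assumed layouts:
///   q    : [n_heads * head_dim]
///   k, v : [seq_len * n_heads * head_dim], stored token by token
fn attention_head_by_head(
    q: &[f32],
    k: &[f32],
    v: &[f32],
    n_heads: usize,
    head_dim: usize,
) -> Vec<f32> {
    let width = n_heads * head_dim;
    assert_eq!(q.len(), width);
    assert_eq!(k.len(), v.len());
    assert!(width > 0 && !k.is_empty() && k.len() % width == 0);

    let seq_len = k.len() / width;
    let scale = 1.0 / (head_dim as f32).sqrt();
    let mut out = vec![0.0f32; width];
    // Scratch buffer for one head's scores, reused across heads.
    let mut scores = vec![0.0f32; seq_len];

    for h in 0..n_heads {
        let off = h * head_dim;
        let q_h = &q[off..off + head_dim];

        // 1. Scaled dot product of this head's query with every cached key.
        let mut max_score = f32::NEG_INFINITY;
        for t in 0..seq_len {
            let start = t * width + off;
            let k_t = &k[start..start + head_dim];
            let dot: f32 = q_h.iter().zip(k_t).map(|(a, b)| a * b).sum();
            scores[t] = dot * scale;
            max_score = max_score.max(scores[t]);
        }

        // 2. Numerically stable softmax over this head's scores.
        let mut denom = 0.0f32;
        for s in scores.iter_mut() {
            *s = (*s - max_score).exp();
            denom += *s;
        }

        // 3. Weighted sum of cached values, written to this head's output slice.
        let out_h = &mut out[off..off + head_dim];
        for t in 0..seq_len {
            let w = scores[t] / denom;
            let start = t * width + off;
            let v_t = &v[start..start + head_dim];
            for (o, x) in out_h.iter_mut().zip(v_t) {
                *o += w * x;
            }
        }
    }
    out
}
```

In this sketch each head touches only its own slice of the query, the cache, and the output, so the per-head working set stays small. Whether eLLM structures its kernels this way is not stated in the description above.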
