lucienhuangfu/eLLM
A Rust-based LLM inference framework that runs large language models on Intel Xeon CPUs with AMX acceleration.

Velocity (7d): +1.2 ★/day · Trend: steady
eLLM is an inference runtime built specifically for CPU servers (Intel Xeon, AMD EPYC), so it needs no GPU or NPU. It targets low latency by computing attention head by head and running the full prefill phase end to end. The framework is designed to be compatible with the vLLM API, so it can plug into existing ecosystems, and it aims for numerical results consistent with GPU inference.
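To make "attention head by head" concrete, here is a minimal sketch in plain Rust of single-token attention that processes one head at a time. It is an illustration under stated assumptions, not eLLM's actual code: the function names `attend_one_head` and `attend_head_by_head`, the `[n_heads][seq_len][head_dim]` row-major layout, and the use of scalar `f32` loops (no AMX, no threading) are all hypothetical choices made for readability.

```rust
/// Scaled dot-product attention for ONE head and ONE query position.
/// `q` has length `head_dim`; `keys` and `values` are `[seq_len][head_dim]`,
/// flattened row-major. (Hypothetical helper, not part of eLLM's API.)
fn attend_one_head(q: &[f32], keys: &[f32], values: &[f32], head_dim: usize) -> Vec<f32> {
    let seq_len = keys.len() / head_dim;
    let scale = 1.0 / (head_dim as f32).sqrt();

    // Score the query against every cached key of this head.
    let mut scores: Vec<f32> = (0..seq_len)
        .map(|t| {
            let k = &keys[t * head_dim..(t + 1) * head_dim];
            q.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() * scale
        })
        .collect();

    // Numerically stable softmax: subtract the maximum before exponentiating.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let mut denom = 0.0f32;
    for s in scores.iter_mut() {
        *s = (*s - max).exp();
        denom += *s;
    }

    // Weighted sum of this head's value vectors.
    let mut out = vec![0.0f32; head_dim];
    for (t, w) in scores.iter().enumerate() {
        let v = &values[t * head_dim..(t + 1) * head_dim];
        for (o, x) in out.iter_mut().zip(v) {
            *o += (w / denom) * x;
        }
    }
    out
}

/// Walks the heads one at a time. Assumed layout (an assumption of this
/// sketch): `keys` and `values` are `[n_heads][seq_len][head_dim]`, so each
/// head's slab is contiguous; `q` is `[n_heads][head_dim]`.
fn attend_head_by_head(
    q: &[f32],
    keys: &[f32],
    values: &[f32],
    n_heads: usize,
    head_dim: usize,
) -> Vec<f32> {
    let per_head = keys.len() / n_heads;
    let mut out = Vec::with_capacity(n_heads * head_dim);
    for h in 0..n_heads {
        let qh = &q[h * head_dim..(h + 1) * head_dim];
        let kh = &keys[h * per_head..(h + 1) * per_head];
        let vh = &values[h * per_head..(h + 1) * per_head];
        out.extend(attend_one_head(qh, kh, vh, head_dim));
    }
    out
}
```

Because each head only touches its own contiguous slice of keys and values, the per-head working set stays small, which is one plausible reason a CPU runtime would organize the computation this way. A real implementation would presumably replace the scalar inner loops with vectorized or AMX tile kernels.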