
intel/xFasterTransformer

An optimized inference solution for running large language models on Intel X86/Xeon platforms.

Velocity (7d): +0.4 ★/day · Trend: steady

xFasterTransformer is a CPU counterpart to the GPU-oriented FasterTransformer, built for LLM inference on Intel Xeon processors. It supports distributed inference across multiple sockets and nodes, which makes it possible to run larger models, and it offers both C++ and Python APIs ranging from high-level to low-level interfaces. Supported model architectures include Qwen, DeepSeek-R1, ChatGLM, and LLaMA, and a vLLM integration provides OpenAI-compatible serving.
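
Since the serving path is described as OpenAI-compatible, a client would presumably talk to it the way it talks to any OpenAI-style completions endpoint. The sketch below uses only the Python standard library and does not call xFasterTransformer itself. The helper names `build_completion_request` and `send` are hypothetical, as are the server address and model name passed in. The `/v1/completions` path and the `model`, `prompt`, and `max_tokens` fields follow the general OpenAI completions convention, not anything stated on this page.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=64):
    """Build (but do not send) a POST request for an OpenAI-style completions endpoint.

    base_url and model are placeholders supplied by the caller; nothing here is
    specific to xFasterTransformer.
    """
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, one would call `send(build_completion_request(base_url, model, prompt))` using the address and model name of that deployment, and read the generated text from the `choices` list in the response. Keeping request construction separate from sending makes the payload easy to inspect before any network traffic occurs.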
