intel/ipex-llm
Intel LLM acceleration library for GPU, NPU, and CPU, enabling optimized inference and finetuning of 70+ LLMs.

IPEX-LLM provides hardware-accelerated LLM inference and finetuning on Intel XPU hardware, including integrated GPUs, discrete GPUs (Arc, Flex, Max), and NPUs. It supports low-bit quantization (FP8, FP6, FP4, INT4) alongside other LLM optimizations. The library integrates with popular ecosystem tools, including llama.cpp, Ollama, vLLM, HuggingFace transformers, LangChain, LlamaIndex, DeepSpeed, and Axolotl, and has been verified on more than 70 models such as LLaMA, Mistral, DeepSeek, Qwen, and Phi.
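
To give a rough sense of why the low-bit formats listed above matter, the sketch below estimates weight storage at each bit-width. It is a back-of-the-envelope calculation only, not IPEX-LLM code: the function name `weight_memory_gib`, the FP16 baseline, and the 7B parameter count are assumptions made for illustration, and the estimate ignores quantization scales, zero-points, activations, and the KV cache.

```python
# Bits per weight for each format named above, plus FP16 as an assumed baseline.
BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "FP6": 6, "FP4": 4, "INT4": 4}


def weight_memory_gib(num_params: int, fmt: str) -> float:
    """Approximate weight storage in GiB.

    Ignores per-group scales, zero-points, and padding, so real
    quantized checkpoints will be somewhat larger than this figure.
    """
    bits = BITS_PER_WEIGHT[fmt]
    return num_params * bits / 8 / 2**30


if __name__ == "__main__":
    num_params = 7_000_000_000  # hypothetical 7B-parameter model
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt}: {weight_memory_gib(num_params, fmt):.2f} GiB")
```

Under these assumptions, a 4-bit format needs a quarter of the weight memory of FP16, which is often the difference between a model fitting on an integrated or entry-level discrete GPU and not fitting at all.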