
intel/intel-extension-for-transformers

Intel's official toolkit for accelerating and optimizing LLM inference with compression techniques and chatbot building on Intel hardware.

Velocity (7d): +1.7 ★ / day
Trend: steady

This is Intel’s official extension for Transformers. It provides compression techniques such as INT4 weight-only quantization to run LLMs efficiently on Intel platforms, including Xeon CPUs, Intel GPUs, and Gaudi3 (Habana) accelerators. It offers NeuralChat for building chatbots within minutes, supports RAG-based retrieval applications, and includes optimizations such as speculative decoding and StreamingLLM. Supported models include Qwen2, Llama 3, and GPT-J, and the toolkit targets faster LLM inference.
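To make "INT4 weight-only quantization" concrete, here is a minimal sketch in plain Python. It is not the toolkit's implementation or API. The function names, the symmetric per-group scheme, and the group size are all assumptions chosen for illustration. The idea is that only the weights are stored as 4-bit integers with one floating-point scale per group, and they are expanded back to floats when used.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float], group_size: int = 4) -> Tuple[List[int], List[float]]:
    """Symmetric per-group quantization to signed 4-bit integers in [-8, 7].

    Hypothetical helper for illustration only; not part of any library.
    """
    quantized: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; avoid dividing by zero.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            value = round(w / scale)
            quantized.append(max(-8, min(7, value)))  # clamp to the INT4 range
    return quantized, scales


def dequantize_int4(quantized: List[int], scales: List[float], group_size: int = 4) -> List[float]:
    """Recover approximate float weights from INT4 values and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(quantized)]


if __name__ == "__main__":
    weights = [0.7, -0.35, 0.1, 0.0, 2.1, -1.4, 0.3, 0.9]
    q, s = quantize_int4(weights, group_size=4)
    restored = dequantize_int4(q, s, group_size=4)
    for original, approx in zip(weights, restored):
        print(f"{original:+.3f} -> {approx:+.3f}")
```

Smaller groups give each scale fewer weights to cover, which lowers the rounding error at the cost of storing more scales.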
