bitsandbytes-foundation/bitsandbytes
PyTorch library for k-bit quantization of large language models, providing 8-bit optimizers, LLM.int8() inference, and QLoRA 4-bit training.

bitsandbytes provides memory-efficient quantization for large language models in PyTorch. Its 8-bit optimizers quantize optimizer states block-wise, which reduces training memory while matching the performance of 32-bit optimizers. For inference, LLM.int8() quantizes most features to 8 bits with vector-wise quantization and handles the few outlier features separately with 16-bit matrix multiplication. This halves memory relative to 16-bit weights without performance degradation. QLoRA makes it possible to fine-tune 4-bit quantized models: low-rank adapters (LoRA) are trained on top of a frozen 4-bit base model, greatly reducing the VRAM needed for training.
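To make the vector-wise idea behind LLM.int8() concrete, here is a minimal plain-Python sketch of the arithmetic, not the bitsandbytes API (the library runs fused CUDA kernels). It assumes the outlier threshold of 6.0 used in the LLM.int8() paper. The function names are hypothetical, and ordinary Python floats stand in for the 16-bit outlier path. The 8-bit optimizers and QLoRA's 4-bit data type rely on different quantization schemes and are not shown here.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def absmax_scale(values: Sequence[float], qmax: int = 127) -> float:
    """Scale factor that maps the largest |value| in a vector onto the int8 limit."""
    m = max((abs(v) for v in values), default=0.0)
    return qmax / m if m > 0 else 1.0  # all-zero vector: any scale works


def quantize(values: Sequence[float], scale: float) -> List[int]:
    """Round scaled values to integers in [-127, 127]."""
    return [int(round(v * scale)) for v in values]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain float matmul: a is (n x k), b is (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def int8_vectorwise_matmul(x: Matrix, w: Matrix, threshold: float = 6.0) -> Matrix:
    """Hypothetical helper: approximate x @ w as in LLM.int8().

    x is (tokens x in_features), w is (in_features x out_features).
    """
    in_features, out_features = len(w), len(w[0])
    # A feature dimension is an outlier if any token has |x| >= threshold there.
    outliers = [k for k in range(in_features)
                if any(abs(row[k]) >= threshold for row in x)]
    regular = [k for k in range(in_features) if k not in outliers]
    result = [[0.0] * out_features for _ in x]

    # Outlier dimensions: unquantized matmul (16-bit in LLM.int8(), floats here).
    if outliers:
        hp = matmul([[row[k] for k in outliers] for row in x],
                    [w[k] for k in outliers])
        for i, hp_row in enumerate(hp):
            for j, v in enumerate(hp_row):
                result[i][j] += v

    # Regular dimensions: one absmax scale per row of x, one per column of w.
    if regular:
        xr = [[row[k] for k in regular] for row in x]
        w_cols = list(zip(*[w[k] for k in regular]))
        row_scales = [absmax_scale(row) for row in xr]
        col_scales = [absmax_scale(col) for col in w_cols]
        xq = [quantize(row, s) for row, s in zip(xr, row_scales)]
        wq = [quantize(col, s) for col, s in zip(w_cols, col_scales)]
        for i, xrow in enumerate(xq):
            for j, wcol in enumerate(wq):
                acc = sum(a * b for a, b in zip(xrow, wcol))  # integer dot product
                # Dequantize with the outer product of row and column scales.
                result[i][j] += acc / (row_scales[i] * col_scales[j])
    return result


x = [[0.5, -1.2, 8.0], [0.3, 0.9, -7.5]]  # third feature holds outliers
w = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]
approx = int8_vectorwise_matmul(x, w)
exact = matmul(x, w)
```

The outlier split matters because each absmax scale is shared across a whole row of `x`. A single large activation would inflate that row's scale and squeeze its remaining values into a few int8 levels. Routing those feature dimensions through the high-precision path keeps the remaining scales tight.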