bitsandbytes-foundation/bitsandbytes

Shrink LLMs to 4-bit and train them anyway

It squeezes massive PyTorch language models into a fraction of their usual memory using 8-bit and 4-bit quantization, enabling inference and fine-tuning on consumer hardware.

★ 8.3k stars | Python | Inference · Serving | Language Models | ML Frameworks

What it does

bitsandbytes is a PyTorch quantization library that provides drop-in layers, Linear8bitLt and Linear4bit, to replace standard linear layers. It also ships 8-bit optimizers that use block-wise quantization to preserve the training stability of their 32-bit counterparts while storing optimizer state in far less memory. The goal is straightforward: make large language model inference and training possible on hardware with limited VRAM.
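
To make the block-wise idea concrete, here is a minimal pure-Python sketch, not the library's implementation. It assumes a simple linear absmax mapping to int8 and a tiny block size; the helper names `quantize_blockwise` and `dequantize_blockwise` and the sample values are hypothetical, and the library's actual quantization map and block size may differ. It shows why one scale per block keeps a single large value from wiping out the precision of everything else.

```python
def quantize_blockwise(values, block_size=4):
    """Quantize floats to int8 codes with one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # avoid dividing by zero
        scales.append(absmax)
        codes.extend(round(v / absmax * 127) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=4):
    """Map int8 codes back to floats using each block's scale."""
    return [c * scales[i // block_size] / 127 for i, c in enumerate(codes)]


# Hypothetical optimizer state: small values plus one large outlier.
state = [0.01, -0.02, 0.03, 0.015, 0.02, -0.01, 8.0, 0.025]

# Two blocks of four values, each with its own scale.
codes, scales = quantize_blockwise(state, block_size=4)
restored = dequantize_blockwise(codes, scales, block_size=4)

# A single scale for the whole tensor, for comparison.
g_codes, g_scales = quantize_blockwise(state, block_size=len(state))
g_restored = dequantize_blockwise(g_codes, g_scales, block_size=len(state))

# Worst-case error on the first block, which does not contain the outlier.
block_err = max(abs(a - b) for a, b in zip(state[:4], restored[:4]))
global_err = max(abs(a - b) for a, b in zip(state[:4], g_restored[:4]))
print(f"block-wise error: {block_err:.5f}, single-scale error: {global_err:.5f}")
```

With a single scale, the outlier forces every small value in this example to round to zero; with per-block scales, the damage stays inside the block that contains the outlier.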

The interesting bit

The library doesn’t just blindly chop precision. LLM.int8() uses vector-wise quantization for the bulk of the values but routes numerical outliers through 16-bit matrix multiplication, which is how it claims to avoid performance degradation while roughly halving memory relative to 16-bit weights. QLoRA pushes this further by freezing a 4-bit quantized base model and injecting small, trainable LoRA adapters, letting you fine-tune models you couldn’t otherwise fit on a single GPU.
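
The outlier routing described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's kernels: Python floats stand in for 16-bit math, the threshold of 6.0 is an assumed value, and the function names and sample matrices are hypothetical. Feature dimensions whose activations exceed the threshold go through the float path; everything else goes through an int8 product with one scale per row of the input and one per column of the weights.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def int8_matmul(x, w):
    """Vector-wise int8 product: one scale per row of x and per column of w."""
    row_scale = [max(abs(v) for v in row) or 1.0 for row in x]
    col_scale = [max(abs(v) for v in col) or 1.0 for col in zip(*w)]
    xq = [[round(v / s * 127) for v in row] for row, s in zip(x, row_scale)]
    wq = [[round(v / col_scale[j] * 127) for j, v in enumerate(row)] for row in w]
    acc = matmul(xq, wq)  # integer accumulation
    return [[acc[i][j] * row_scale[i] * col_scale[j] / (127 * 127)
             for j in range(len(col_scale))] for i in range(len(row_scale))]


def mixed_precision_matmul(x, w, threshold=6.0):
    """Route outlier feature dimensions through float math, the rest through int8."""
    n_features = len(w)
    outliers = [k for k in range(n_features)
                if any(abs(row[k]) >= threshold for row in x)]
    regular = [k for k in range(n_features) if k not in outliers]
    out = [[0.0] * len(w[0]) for _ in x]
    for idx, product in ((regular, int8_matmul), (outliers, matmul)):
        if not idx:
            continue  # nothing to compute on this path
        x_part = [[row[k] for k in idx] for row in x]
        w_part = [w[k] for k in idx]
        part = product(x_part, w_part)
        out = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(out, part)]
    return out, outliers


def max_abs_error(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Hypothetical activations: feature dimension 2 carries the outliers.
x = [[0.5, -0.3, 12.0, 0.1],
     [0.2, 0.4, -9.0, -0.6]]
w = [[0.1, -0.2],
     [0.3, 0.05],
     [0.02, -0.01],
     [-0.15, 0.25]]

exact = matmul(x, w)
naive = int8_matmul(x, w)
mixed, outliers = mixed_precision_matmul(x, w)

naive_err = max_abs_error(exact, naive)
mixed_err = max_abs_error(exact, mixed)
print(f"outlier dimensions: {outliers}")
print(f"int8-only error: {naive_err:.5f}, mixed-precision error: {mixed_err:.5f}")
```

In this toy case the int8-only product loses precision because the outliers dominate each row's scale, while the mixed path keeps the remaining dimensions on a much finer grid.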

Key highlights

8-bit inference via LLM.int8() with outlier-aware mixed-precision matmul.
4-bit training via QLoRA by quantizing the base model and tuning low-rank adapters.
8-bit optimizers that cut optimizer-state memory at the cost of some extra quantize/dequantize compute, without destabilizing training.
Broad hardware support: NVIDIA, AMD, Intel XPU/Gaudi, plus x86 and ARM CPUs on Linux, Windows, and macOS.
MIT licensed and integrated into Hugging Face Transformers, Diffusers, and PEFT.
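
For a rough sense of scale behind these highlights, the arithmetic below estimates weight storage at different bit widths and the size of a LoRA adapter relative to the layer it wraps. It is a back-of-the-envelope sketch: the 7B parameter count, the 4096-wide layer, and the rank of 16 are hypothetical, and the estimate ignores quantization constants, activations, optimizer state, and the KV cache.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


def lora_trainable_params(d_in, d_out, rank):
    """Parameters in one LoRA adapter: a (d_in x rank) and a (rank x d_out) matrix."""
    return rank * (d_in + d_out)


n_params = 7_000_000_000  # hypothetical 7B-parameter model
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(n_params, bits):5.1f} GiB")

# One hypothetical 4096 x 4096 linear layer with a rank-16 adapter.
full = 4096 * 4096
adapter = lora_trainable_params(4096, 4096, rank=16)
print(f"adapter / full layer parameters: {adapter / full:.2%}")
```

The adapter in this example holds under one percent of the layer's parameters, which is why training only the adapters on top of a frozen 4-bit base model is so much cheaper than full fine-tuning.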

Caveats

The support matrix is uneven. macOS Metal and some ARM CPU targets only get slow implementations (marked 🐢 in the README's support table) for core features, and 8-bit optimizers are missing entirely on Apple Silicon and Linux aarch64.
Intel Gaudi support is partial: QLoRA is only partially supported and 8-bit optimizers are unsupported.
The README notes that the development branch support table may differ from the latest stable release, so check your specific version.

Verdict

If you are fine-tuning or serving LLMs on consumer or mid-tier GPUs and constantly hitting OOM errors, this is essential infrastructure. If you have unlimited access to top-tier H100 clusters and don’t care about memory pressure, you probably don’t need to think about it.

Frequently asked

What is bitsandbytes-foundation/bitsandbytes?

It squeezes massive PyTorch language models into a fraction of their usual memory using 8-bit and 4-bit quantization, enabling inference and fine-tuning on consumer hardware.

Is bitsandbytes open source?

Yes, bitsandbytes-foundation/bitsandbytes is open source, released under the MIT license.

What language is bitsandbytes written in?

bitsandbytes-foundation/bitsandbytes is primarily written in Python.

How popular is bitsandbytes?

bitsandbytes-foundation/bitsandbytes has 8.3k stars on GitHub.

Where can I find bitsandbytes?

bitsandbytes-foundation/bitsandbytes is on GitHub at https://github.com/bitsandbytes-foundation/bitsandbytes.