PrismML-Eng/Bonsai-demo

LLMs compressed to 1 bit, somehow still coherent

A demo repo for running extreme-quantized language models locally without needing a research cluster.

★ 1.9k stars · Shell · Inference · Serving · Language Models

Velocity (7d): +154 ★ / day · Trend: ↗ accelerating

What it does

Bonsai-demo is a shell-scripted launcher for PrismML’s Bonsai model family: 1-bit and 1.58-bit (ternary) LLMs in 1.7B, 4B, and 8B sizes. It fetches pre-built llama.cpp binaries or builds MLX from source, grabs the quantized weights from HuggingFace, and runs inference on Mac Metal, Linux/Windows CUDA/Vulkan/ROCm, or plain CPU. The setup.sh script handles dependency installation, environment setup, and model download in one go.
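
As a rough illustration of the platform split described above, here is a hypothetical Python sketch of how a launcher might pick a backend. The function names, the probe tools (`nvidia-smi`, `rocminfo`, `vulkaninfo`), and the priority order are assumptions made for illustration. None of this is taken from setup.sh, which may decide differently.

```python
import platform
import shutil


def pick_backend(system: str, machine: str, tools: set) -> str:
    """Hypothetical selection logic; not taken from the repo's setup.sh."""
    if system == "Darwin":
        return "metal"   # Mac: Metal
    if "nvidia-smi" in tools:
        return "cuda"    # NVIDIA driver tools found
    if "rocminfo" in tools:
        return "rocm"    # AMD ROCm tools found
    if "vulkaninfo" in tools:
        return "vulkan"  # generic GPU path
    return "cpu"         # plain CPU fallback


def detect_tools(candidates=("nvidia-smi", "rocminfo", "vulkaninfo")) -> set:
    """Return the subset of probe tools that are present on PATH."""
    return {name for name in candidates if shutil.which(name)}


if __name__ == "__main__":
    print(pick_backend(platform.system(), platform.machine(), detect_tools()))
```

Keeping the decision in a pure function (with tool detection passed in) makes the priority order easy to test without the hardware present.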

The interesting bit

Support for the 1-bit weights is already upstream in llama.cpp (CPU, Metal, CUDA, Vulkan, optimized x86). The 1.58-bit ternary models use a Q2_0 format: each weight is stored in 2 bits rather than the roughly 1.58 bits (log2 of 3) a three-valued weight strictly needs, which keeps the kernels hardware-friendly and trades some size for speed. The repo tracks upstream merge status in public tables, which is unusually transparent for a research-to-production handoff.
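
To make the size-for-speed trade concrete, here is a minimal sketch of 2-bit-aligned ternary storage. It assumes the usual ternary value set {-1, 0, +1}. The code table and byte layout are invented for illustration and are not the actual Q2_0 layout, which this page does not describe. Real formats also carry per-block scale factors, omitted here.

```python
import math

# Illustrative 2-bit code per ternary weight. NOT the real Q2_0 layout.
CODE = {-1: 0b00, 0: 0b01, 1: 0b10}
DECODE = {code: weight for weight, code in CODE.items()}


def pack_ternary(weights):
    """Pack ternary weights four per byte, lowest bit pair first."""
    packed = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= CODE[w] << (2 * j)
        packed.append(byte)
    return bytes(packed)


def unpack_ternary(packed, count):
    """Recover the first `count` weights from packed bytes."""
    out = []
    for byte in packed:
        for j in range(4):
            if len(out) == count:
                return out
            out.append(DECODE[(byte >> (2 * j)) & 0b11])
    return out


weights = [1, 0, -1, 1, 0, 0, -1, -1]
packed = pack_ternary(weights)
assert unpack_ternary(packed, len(weights)) == weights

ideal_bits = math.log2(3)            # information content of one ternary weight
overhead = 2 / ideal_bits - 1        # extra space from rounding up to 2 bits
print(f"{ideal_bits:.3f} bits ideal, {overhead:.0%} overhead at 2 bits")
```

Each weight lands on a fixed 2-bit boundary, so decoding is a shift and a mask. A denser encoding would save about a fifth of the space but would need more work per weight to unpack.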

Key highlights

Runs 8B models at ~2.5 GB for 8K context, ~10.5 GB for full 65K context
Supports both llama.cpp and MLX backends on Apple Silicon
Pre-built binaries available; source build optional
Community benchmark submissions accepted in community-benchmarks/
Whitepapers and HuggingFace collections linked for both model families
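
The two footprint figures quoted for the 8B model can be turned into a rough estimate for other context lengths. The sketch below is back-of-the-envelope arithmetic, not a measurement. It assumes the footprint grows linearly with context length (as a KV cache would), that "8K" and "65K" mean 8,192 and 65,536 tokens, and that the sizes are decimal gigabytes. The helper names are hypothetical.

```python
GB = 1e9  # assumption: the quoted sizes are decimal gigabytes

# The two data points quoted for the 8B model.
# Assumption: "8K" and "65K" mean 8,192 and 65,536 tokens.
SMALL = (8 * 1024, 2.5 * GB)
FULL = (64 * 1024, 10.5 * GB)


def fit_linear(p1, p2):
    """Return (bytes_per_token, fixed_bytes) for the line through two points."""
    (ctx1, mem1), (ctx2, mem2) = p1, p2
    per_token = (mem2 - mem1) / (ctx2 - ctx1)
    fixed = mem1 - per_token * ctx1
    return per_token, fixed


per_token, fixed = fit_linear(SMALL, FULL)


def estimate_bytes(context_tokens):
    """Estimated footprint at a given context length under the linear fit."""
    return fixed + per_token * context_tokens


print(f"per token: {per_token / 1e3:.1f} kB")
print(f"fixed:     {fixed / GB:.2f} GB")
print(f"32K ctx:   {estimate_bytes(32 * 1024) / GB:.2f} GB")
```

Under these assumptions the fit gives about 140 kB per token of context and about 1.36 GB of context-independent footprint, so a 32K context would land near 5.9 GB.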

Caveats

Ternary model kernels (Metal, CUDA, CPU) are currently only in the PrismML llama.cpp fork; upstream PRs pending
ROCm, Vulkan, and optimized x86 support for ternary models are TBD
MLX ternary support requires stock MLX 2-bit, not the fork
Downloading models requires a HuggingFace token while repos remain private

Verdict

Worth a look if you’re experimenting with extreme quantization or need tiny models for edge deployment. Skip it if you want mature, fully-upstream ternary support or aren’t comfortable running shell scripts that build C++ projects from source.

Frequently asked

What is PrismML-Eng/Bonsai-demo? A demo repo for running extreme-quantized language models locally without needing a research cluster.
Is Bonsai-demo open source? Yes. PrismML-Eng/Bonsai-demo is open source, released under the Apache-2.0 license.
What language is Bonsai-demo written in? PrismML-Eng/Bonsai-demo is primarily written in Shell.
How popular is Bonsai-demo? PrismML-Eng/Bonsai-demo has 1.9k stars on GitHub and is currently accelerating.
Where can I find Bonsai-demo? PrismML-Eng/Bonsai-demo is on GitHub at https://github.com/PrismML-Eng/Bonsai-demo.