PrismML-Eng/Bonsai-demo
Demo repository for running Bonsai 1-bit and Ternary-Bonsai 1.58-bit language models locally across CPU, Metal, CUDA, Vulkan, and ROCm backends.

Velocity (7d): +11 ★ / day · Trend: steady
This repository provides instructions and scripts for running Bonsai quantized language models locally on macOS (Metal), on Linux or Windows (CUDA, Vulkan, ROCm), or on CPU alone. It includes pre-built llama.cpp binaries and MLX (Apple Silicon) forks that support the Q1_0 and Q1_58 quantization formats. Model weights are distributed through Hugging Face collections, and companion web demos and Google Colab notebooks are available for quick experimentation.
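For readers wondering where the "1-bit" and "1.58-bit" labels come from, the sketch below computes the information content of a weight restricted to two or three values; log2(3) is about 1.585, which is the usual origin of the "1.58-bit" name for ternary weights. This is an idealized illustration, not the repository's code: the function names and the 1-billion-parameter count are hypothetical, and the actual Q1_0 and Q1_58 on-disk layouts may add scales or packing overhead that is not modeled here.

```python
import math


def bits_per_weight(levels: int) -> float:
    """Information content, in bits, of one weight that takes `levels` distinct values."""
    return math.log2(levels)


def approx_weight_bytes(num_params: int, levels: int) -> float:
    """Idealized storage for the weights alone.

    Ignores scales, metadata, and packing overhead, so real files will differ.
    """
    return num_params * bits_per_weight(levels) / 8


if __name__ == "__main__":
    # Hypothetical parameter count, chosen only for illustration.
    n = 1_000_000_000
    for name, levels in (("binary", 2), ("ternary", 3)):
        bits = bits_per_weight(levels)
        megabytes = approx_weight_bytes(n, levels) / 1e6
        print(f"{name}: {bits:.3f} bits/weight, about {megabytes:.0f} MB of weights")
```

The estimate is a lower bound on weight storage under these assumptions; it says nothing about activation memory or about the runtime cost of any particular backend.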