hybridgroup/yzma

Run local LLMs from Go without CGo or external servers

It lets Go applications call llama.cpp directly for local, hardware-accelerated inference without CGo or a sidecar server.

★ 525 stars · Go · Inference · Serving · Language Models

What it does

yzma is a Go library that loads llama.cpp shared libraries and exposes their API for local text and vision-language model inference. It handles model loading, tokenization, decoding, and sampling, letting a standard Go binary perform inference using whatever GPU or CPU acceleration the host provides. The library also includes a small CLI helper for downloading compatible GGUF models and prebuilt llama.cpp binaries.
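To make the sampling step in that pipeline concrete, here is a toy sketch of greedy sampling, which picks the highest-scoring token from a slice of logits. It is an illustration only: yzma delegates sampling to llama.cpp, and greedyPick is a hypothetical name that is not part of yzma's API.

```go
package main

import "fmt"

// greedyPick is a hypothetical helper, not a yzma function. It returns the
// index of the largest logit, which is the token ID a greedy sampler would
// choose. It returns -1 for an empty slice.
func greedyPick(logits []float32) int {
	best := -1
	for i, v := range logits {
		if best == -1 || v > logits[best] {
			best = i
		}
	}
	return best
}

func main() {
	// Made-up scores for a four-token vocabulary.
	logits := []float32{0.1, 2.5, -1.0, 0.7}
	fmt.Println(greedyPick(logits)) // prints 1
}
```

Real samplers usually add temperature, top-k, or top-p on top of this; greedy selection is simply the easiest variant to show.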

The interesting bit

The project uses purego and ffi to call into llama.cpp at runtime, which means you can build and cross-compile with ordinary go build and no C toolchain. That also keeps the model inside the same process, so inference avoids the latency and complexity of talking to an external server.
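Because the library is loaded at runtime rather than linked at build time, the program has to locate the right shared library file for the host. The sketch below shows one way that lookup might work using only the standard library. The helper name sharedLibName, the "lib" directory, and the file names (the conventional names for a llama.cpp shared build) are assumptions for illustration, not yzma's actual API or layout.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// sharedLibName is a hypothetical helper, not part of yzma. It maps a GOOS
// value to the file name the llama.cpp shared library is assumed to have on
// that platform.
func sharedLibName(goos string) (string, error) {
	switch goos {
	case "linux":
		return "libllama.so", nil
	case "darwin":
		return "libllama.dylib", nil
	case "windows":
		return "llama.dll", nil
	default:
		return "", fmt.Errorf("no known llama.cpp library name for GOOS=%q", goos)
	}
}

func main() {
	name, err := sharedLibName(runtime.GOOS)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// "lib" is a placeholder directory. A runtime loader would open this
	// path when the program starts, so nothing here needs a C toolchain.
	fmt.Println(filepath.Join("lib", name))
}
```

Since the lookup happens when the program runs, the same source cross-compiles for any GOOS/GOARCH pair; only the shared library shipped next to the binary differs per platform.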

Key highlights

No CGo required: builds with standard Go tooling and cross-compiles using only GOOS/GOARCH.
Hardware acceleration is supported on Linux, macOS, and Windows via CUDA, Metal, Vulkan, ROCm, and others.
Supports vision-language and multimodal models, not just text LLMs.
Claims coverage of over 96% of llama.cpp functionality.
Includes benchmark data for specific hardware (e.g., a Qwen3-VL-2B VLM running on an Apple M4 Max).

Caveats

Tightly coupled to llama.cpp versions: breaking upstream changes require updating yzma, and the README publishes a compatibility table.
Platform support is uneven: macOS builds are arm64-only, Windows builds are amd64-only, and Linux covers both architectures.
A small fraction of llama.cpp features remain unimplemented.
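The platform matrix in the caveats above can be turned into a quick startup check. This is a minimal sketch under stated assumptions: the table encodes only what is listed here (reading "both" for Linux as amd64 and arm64), the names supportedTargets and isSupported are hypothetical, and the project's README remains the authoritative source.

```go
package main

import (
	"fmt"
	"runtime"
)

// supportedTargets mirrors the matrix described in the caveats. It is an
// assumption based on this summary, not data exported by yzma.
var supportedTargets = map[string][]string{
	"darwin":  {"arm64"},
	"windows": {"amd64"},
	"linux":   {"amd64", "arm64"},
}

// isSupported reports whether the given GOOS/GOARCH pair appears in the
// table above. Unknown operating systems yield false.
func isSupported(goos, goarch string) bool {
	for _, arch := range supportedTargets[goos] {
		if arch == goarch {
			return true
		}
	}
	return false
}

func main() {
	ok := isSupported(runtime.GOOS, runtime.GOARCH)
	fmt.Printf("%s/%s supported: %v\n", runtime.GOOS, runtime.GOARCH, ok)
}
```

A check like this lets an application fail early with a clear message instead of hitting a missing-library error at load time.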

Verdict

Worth a look if you want to ship a single-binary Go application that runs models locally without containerizing a Python stack or managing a separate inference server. Skip it if you need guaranteed API stability against arbitrary future llama.cpp releases or if your target platform falls outside the supported OS/arch matrix.

Frequently asked

What is hybridgroup/yzma? It lets Go applications call llama.cpp directly for local, hardware-accelerated inference without CGo or a sidecar server.
Is yzma open source? Yes, hybridgroup/yzma is an open-source project tracked on heatdrop.
What language is yzma written in? hybridgroup/yzma is primarily written in Go.
How popular is yzma? hybridgroup/yzma has 525 stars on GitHub.
Where can I find yzma? hybridgroup/yzma is on GitHub at https://github.com/hybridgroup/yzma.