ggml-org/ggml

A C++ tensor library that refuses to malloc at runtime

ggml is the low-level engine behind llama.cpp, built for inference where memory predictability matters more than framework ergonomics.

What it does

ggml is a C/C++ tensor library for machine learning inference. It handles the usual suspects (matrix ops, quantization, automatic differentiation, and two optimizers, ADAM and L-BFGS) and wraps them in a cross-platform implementation with zero third-party dependencies.

The interesting bit

The standout claim is zero memory allocations at runtime. In a world where PyTorch and TensorFlow happily grab GPU memory behind your back, ggml reserves its memory up front and manages that arena itself. That makes it a natural fit for the “run a 7B model on your laptop” crowd, which is exactly where much of its real-world use happens (via llama.cpp and whisper.cpp). The README notes that some active development currently happens in those downstream repos, so this core library can feel a bit like the quiet engine room.
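To make the arena idea concrete, here is a minimal sketch of a bump allocator: one buffer reserved up front, with every later "allocation" reduced to pointer arithmetic inside it. This is an illustration only, not ggml's implementation; the `Arena` class, its method names, and the 16-byte default alignment are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical bump arena (not ggml's code): the only heap allocation
// happens in the constructor; allocate() just advances an offset.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : buffer_(new std::uint8_t[capacity]), capacity_(capacity), offset_(0) {}

    // Returns an aligned pointer inside the buffer, or nullptr when the
    // request does not fit. `alignment` must be a power of two.
    void* allocate(std::size_t size, std::size_t alignment = 16) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.get());
        const std::uintptr_t current = base + offset_;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (current + mask) & ~mask;
        const std::size_t padding = static_cast<std::size_t>(aligned - current);
        const std::size_t remaining = capacity_ - offset_;
        if (padding > remaining || size > remaining - padding) {
            return nullptr;  // out of arena space; state is left unchanged
        }
        offset_ += padding + size;
        return reinterpret_cast<void*>(aligned);
    }

    // Releases everything at once by rewinding the offset.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }
    std::size_t capacity() const { return capacity_; }

private:
    std::unique_ptr<std::uint8_t[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

The trade-off in a design like this is that the caller has to size the buffer in advance and cannot free individual objects. In exchange, memory use is known before the first operation runs, which is the predictability the library is selling.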

Key highlights

  • Integer quantization support (the GGUF format it spawned is now a de facto standard for quantized LLMs)
  • Automatic differentiation and two built-in optimizers
  • Broad hardware support, though specifics are left to the build system and examples
  • No dependencies beyond a C++ toolchain and CMake
  • Ships with a working GPT-2 inference example (117M-parameter model)
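For readers unfamiliar with integer quantization, the first highlight can be sketched as block-wise symmetric 8-bit quantization: split the weights into small blocks, store one scale per block, and keep each value as a signed byte. The sketch below is a simplified illustration under stated assumptions, not ggml's actual data layout or API; the names `QuantBlock`, `quantize_block`, and `dequantize_block`, the block size of 32, and the 32-bit float scale are all choices made for the example.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Assumed block size for this sketch.
constexpr std::size_t kBlockSize = 32;

// Hypothetical block layout: one float scale shared by 32 int8 values.
struct QuantBlock {
    float scale;
    std::array<std::int8_t, kBlockSize> q;
};

// Quantizes kBlockSize floats so that the largest magnitude maps to +/-127.
inline QuantBlock quantize_block(const float* x) {
    float amax = 0.0f;
    for (std::size_t i = 0; i < kBlockSize; ++i) {
        amax = std::max(amax, std::fabs(x[i]));
    }
    QuantBlock block{};
    block.scale = amax / 127.0f;
    // An all-zero block gets scale 0 and all-zero values.
    const float inv = (block.scale != 0.0f) ? 1.0f / block.scale : 0.0f;
    for (std::size_t i = 0; i < kBlockSize; ++i) {
        long r = std::lround(x[i] * inv);
        r = std::max(-127L, std::min(127L, r));  // guard against float rounding
        block.q[i] = static_cast<std::int8_t>(r);
    }
    return block;
}

// Reconstructs approximate floats from a quantized block.
inline void dequantize_block(const QuantBlock& block, float* out) {
    for (std::size_t i = 0; i < kBlockSize; ++i) {
        out[i] = block.scale * static_cast<float>(block.q[i]);
    }
}
```

In this sketch each block stores 4 bytes of scale plus 32 bytes of values, versus 128 bytes for the original 32-bit floats, at the cost of a per-value error of at most about half the block's scale.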

Caveats

  • The README is sparse; much of the ecosystem documentation lives in llama.cpp discussions and external Hugging Face blog posts
  • “Broad hardware support” is claimed but not enumerated—expect to dig into the build scripts for your target platform

Verdict

Worth a look if you’re building custom inference pipelines, embedding ML into resource-constrained environments, or just want to understand how llama.cpp actually works under the hood. Skip it if you need a batteries-included framework with Python ergonomics and extensive tutorials.
