ggml-org/LlamaBarn

A macOS menu bar app that runs local LLMs via an OpenAI-compatible API server.

★1.3k stars Swift Inference · Serving Coding Assistants

View on GitHub ↗

Velocity · 7d

+3.8

★ / day

Trend

→steady

star history

LlamaBarn is a native macOS application that serves as a local inference server for LLMs. It provides a built-in model catalog for easy installation, automatically loads models on demand and unloads them when idle, and exposes a standard OpenAI-compatible API endpoint. The app runs entirely on-device with no cloud dependency, and integrates with chat UIs, code editors, and CLI tools like Claude Code, Cline, and Continue.