Yes — jjang-ai/vmlx is open source, released under the Apache-2.0 license.

What language is vmlx written in?

jjang-ai/vmlx is primarily written in Python.

jjang-ai/vmlx has 777 stars on GitHub.

Where can I find vmlx?

jjang-ai/vmlx is on GitHub at https://github.com/jjang-ai/vmlx.

jjang-ai/vmlx

Apple Silicon inference server with a five-layer cache obsession

vMLX gives Apple Silicon a self-hosted OpenAI-compatible inference stack with a near-obsessive multi-tier caching strategy.

★777 stars Python Inference · Serving Language Models Agents

View on GitHub ↗ Homepage ↗

Not currently ranked — collecting fresh signals.

star history

What it does

vMLX is a self-hosted inference server that runs LLMs, vision-language models, image generators, and audio models on Apple Silicon using Apple’s MLX framework. It exposes OpenAI-, Anthropic-, and Ollama-compatible HTTP endpoints so existing clients and SDKs drop in without changes. A companion desktop app, MLX Studio, wraps the engine in a graphical interface for chat, model management, and image editing.

The interesting bit

The project treats caching as a first-class architectural concern, stacking five layers from an L1 memory-aware prefix cache down to an L2 disk cache that survives restarts, with optional q4/q8 KV quantization squeezed in between. It also distributes pipeline parallelism across multiple Macs over Thunderbolt, Ethernet, or WiFi, and pushes a custom JANG 2-bit quantization that the authors claim outperforms MLX’s own 4-bit weights on at least one large model benchmark.

Key highlights

Serves text, vision, multimodal (including Nemotron-3-Nano-Omni), MoE, and hybrid SSM models through a single HTTP API.
Five-layer cache architecture: L1 paged/prefix memory cache, L2 persistent disk cache, quantized KV storage, and content-addressable deduplication.
Distributed inference across two or more Macs with auto-discovery via Bonjour, Tailscale, or manual IP; the coordinator is elected by capability score.
Native tool-calling and reasoning/thinking parsers auto-detected for major model families (Qwen, Llama, Mistral, DeepSeek, etc.).
Includes local image generation and editing via Flux models, plus Kokoro TTS and Whisper STT through mlx-audio.

Verdict

Developers who want a drop-in local replacement for OpenAI or Anthropic endpoints on Apple Silicon—and care about cache persistence across restarts—should look here. If you are not on macOS or Apple Silicon, this stack is irrelevant.

Frequently asked

What is jjang-ai/vmlx?: vMLX gives Apple Silicon a self-hosted OpenAI-compatible inference stack with a near-obsessive multi-tier caching strategy.
Is vmlx open source?: Yes — jjang-ai/vmlx is open source, released under the Apache-2.0 license.
What language is vmlx written in?: jjang-ai/vmlx is primarily written in Python.
How popular is vmlx?: jjang-ai/vmlx has 777 stars on GitHub.
Where can I find vmlx?: jjang-ai/vmlx is on GitHub at https://github.com/jjang-ai/vmlx.