← all repositories

jjang-ai/vmlx

Self-hosted inference server for LLMs, VLMs, and image generation models running on Apple Silicon hardware using the MLX framework.

vmlx
Velocity · 7d
+5.7
★ / day
Trend
steady
star history

vMLX is an inference server designed specifically for Apple Silicon (M1-M4 chips) running MLX-optimized language models. It provides OpenAI, Anthropic, and Ollama compatible HTTP APIs, enabling self-hosted LLM deployment without third-party API keys. The server implements advanced optimizations including L2 disk-based KV cache persistence, L1 paged memory management for fast time-to-first-token, hybrid SSM scheduling, and continuous batching to maximize throughput on Apple hardware.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.