Is Rapid-MLX open source?

Yes — raullenchai/Rapid-MLX is open source, released under the Apache-2.0 license.

What language is Rapid-MLX written in?

raullenchai/Rapid-MLX is primarily written in Python.

How popular is Rapid-MLX?

raullenchai/Rapid-MLX has 3.3k stars on GitHub and is currently accelerating.

Where can I find Rapid-MLX?

raullenchai/Rapid-MLX is on GitHub at https://github.com/raullenchai/Rapid-MLX.

← all repositories

raullenchai/Rapid-MLX

Apple Silicon LLM server wearing an OpenAI mask, outrunning Ollama

It turns your Mac into an OpenAI-compatible local LLM server that runs models up to 4× faster than Ollama, no API key required.

★3.3k stars Python Inference · Serving Coding Assistants

View on GitHub ↗ Homepage ↗

Velocity · 7d

+10

★ / day

Trend

↗accelerating

star history

What it does

Rapid-MLX is an inference engine and HTTP server for Apple Silicon Macs. It downloads and runs local LLMs—from small 4B parameter models up to 158B mixture-of-experts—using Apple’s MLX framework, then exposes them through an OpenAI-compatible API. Any tool that talks to ChatGPT, including Cursor, Claude Code, Aider, LangChain, and PydanticAI, can point at your localhost instead.

The interesting bit

The project treats compatibility as a first-class feature, not an afterthought. It ships 17 tool parsers to maximize which model-and-agent combinations actually work, and even defines its own “Model-Harness Index” that scores pairings on tool calling, coding ability, and knowledge retention. That obsession with integration testing—3200+ tests across agent harnesses—is unusual for a local inference tool.

Key highlights

Claims 4.2× faster inference than Ollama on Apple Silicon, with cached TTFT as low as 0.08 s
Supports vision models (Gemma 4, Qwen-VL) and audio (TTS/STT) via optional extras
Runs models from 4B to 158B parameters, including DeepSeek V4 Flash and Qwen3.5 122B
Drop-in replacement: change the base URL to localhost:8000/v1 and existing OpenAI clients just work
Prompt caching and reasoning separation; includes a REPL and an OpenAI-compatible HTTP server

Caveats

Vision and audio require separate extra installs (rapid-mlx[vision] and rapid-mlx[audio])
macOS ships Python 3.9, but the tool requires 3.10+, so first-time setup can involve a Python upgrade
The README notes that Brew 5.x sandboxing can block the Homebrew install and requires a manual pre-tap workaround

Verdict

If you live in Cursor or Claude Code and want local inference on a Mac without breaking your workflow, this is built for you. Windows and Linux users, or anyone already happy with Ollama’s speed, can keep scrolling.

Frequently asked

What is raullenchai/Rapid-MLX?: It turns your Mac into an OpenAI-compatible local LLM server that runs models up to 4× faster than Ollama, no API key required.
Is Rapid-MLX open source?: Yes — raullenchai/Rapid-MLX is open source, released under the Apache-2.0 license.
What language is Rapid-MLX written in?: raullenchai/Rapid-MLX is primarily written in Python.
How popular is Rapid-MLX?: raullenchai/Rapid-MLX has 3.3k stars on GitHub and is currently accelerating.
Where can I find Rapid-MLX?: raullenchai/Rapid-MLX is on GitHub at https://github.com/raullenchai/Rapid-MLX.