A voice assistant that actually stays off the internet
RCLI runs STT, LLM, TTS, and vision models entirely on Apple Silicon — no API keys, no cloud, no privacy policy to read.

What it does
RCLI is a macOS terminal app that listens to your voice, understands commands, and talks back, all without leaving your machine. It bundles speech-to-text, a small LLM, text-to-speech, and vision-language models into a single pipeline. You can ask it to open Safari, play jazz on Spotify, summarize a PDF, or describe what’s on your screen. The interactive TUI uses push-to-talk (Space), with hotkeys for camera (V), screen capture (S), and model swapping (M).
The interesting bit
The project ships two inference engines: the open-source llama.cpp for older Macs, and MetalRT, a proprietary Apple-Silicon-only GPU engine that claims up to 550 tok/s decode and sub-200ms end-to-end voice latency. That’s the hook: a company betting that on-device speed can beat round-tripping to the cloud, at least for small models.
Key highlights
- Full voice pipeline: Silero VAD → Zipformer/Whisper STT → Qwen3/LFM2 LLM → Kokoro/Piper TTS, with sentence-level double-buffering so the next sentence renders while the current one plays.
- 40 macOS actions via tool calling (open apps, control media, run Shortcuts, send messages, toggle dark mode, etc.), executed through AppleScript and shell commands.
- Local RAG with hybrid vector + BM25 retrieval over PDF/DOCX/text, ingested via the rcli rag ingest command.
- On-demand VLM for image, camera, and screen analysis (currently llama.cpp engine only; Qwen3 VL 2B, LFM2 VL 1.6B, SmolVLM 500M).
- ~1GB default download; 20+ models manageable through the TUI.
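The sentence-level double-buffering mentioned in the voice pipeline can be sketched as a small producer/consumer loop. This is an illustrative Python sketch only, not RCLI's actual code: `split_sentences`, `synthesize`, `play`, and `speak` are hypothetical names, and the TTS and playback steps are stand-ins that sleep briefly instead of touching audio hardware.

```python
import queue
import re
import threading
import time


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def synthesize(sentence):
    """Hypothetical stand-in for a TTS call; returns fake audio for one sentence."""
    time.sleep(0.01)  # pretend synthesis takes time
    return f"<audio:{sentence}>"


def play(audio, played):
    """Hypothetical stand-in for playback; records what was played, in order."""
    time.sleep(0.01)  # pretend playback takes time
    played.append(audio)


def speak(reply):
    # A queue of size 1 lets one finished sentence wait while another plays,
    # so synthesis of the next sentence overlaps playback of the current one.
    buffer = queue.Queue(maxsize=1)
    played = []

    def producer():
        for sentence in split_sentences(reply):
            buffer.put(synthesize(sentence))  # blocks while the queue is full
        buffer.put(None)  # sentinel: no more sentences

    worker = threading.Thread(target=producer)
    worker.start()
    while (audio := buffer.get()) is not None:
        play(audio, played)
    worker.join()
    return played


if __name__ == "__main__":
    for chunk in speak("Opening Safari. Anything else? Just ask!"):
        print(chunk)
```

The point of the design is that the listener hears the first sentence as soon as it is ready, rather than waiting for the whole reply to be synthesized; the bounded queue keeps the synthesizer from running far ahead of playback.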
Caveats
- MetalRT requires M3 or later; M1/M2 Macs fall back to llama.cpp automatically, with no stated timeline for MetalRT support on older chips.
- VLM support on MetalRT is listed as “coming soon”; for now, vision features work only on the llama.cpp engine.
- Tool calling accuracy degrades with accumulated context on small LLMs; the README suggests pressing X to reset conversation history when it gets flaky.
- MetalRT itself is proprietary and separately licensed; the open-source RCLI wrapper is MIT.
Verdict
Worth a look if you want a Siri replacement that doesn’t phone home, and you’re comfortable with small-model tradeoffs. Skip it if you’re on an Intel Mac, need cloud-scale model reasoning, or expect vision features to run on the faster engine today.