Your phone's new offline AI engine, no cloud required
A cross-platform SDK that squeezes LLMs, speech, and vision models onto consumer hardware without phoning home.

What it does
RunAnywhere wraps llama.cpp and friends into SDKs for iOS, Android, Web, React Native, and Flutter. You download a quantized model, load it locally, and chat, transcribe, synthesize speech, or generate images without network access. The API surface is deliberately boring — initialize(), downloadModel(), chat() — which is exactly the point.
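To make the "boring on purpose" lifecycle concrete, here is a minimal sketch of the initialize, download, chat ordering. `FakeEngine`, its snake_case method names, and the model ID `demo-model-q4` are hypothetical stand-ins invented for illustration; this is not the real RunAnywhere API, whose signatures differ per platform.

```python
class FakeEngine:
    """Hypothetical stand-in for an on-device SDK; NOT the real RunAnywhere API."""

    def __init__(self):
        self.initialized = False
        self.models = set()

    def initialize(self):
        # A real SDK would set up its native inference backends here.
        self.initialized = True

    def download_model(self, model_id):
        if not self.initialized:
            raise RuntimeError("call initialize() first")
        # A real SDK would fetch quantized weights to local storage.
        self.models.add(model_id)

    def chat(self, model_id, prompt):
        if model_id not in self.models:
            raise RuntimeError(f"model {model_id!r} is not downloaded")
        # Canned reply; real inference would run locally on the device.
        return f"[{model_id}] echo: {prompt}"


engine = FakeEngine()
engine.initialize()
engine.download_model("demo-model-q4")  # hypothetical model ID
reply = engine.chat("demo-model-q4", "hello")
```

The only thing the sketch enforces is ordering: nothing downloads before initialization, and nothing chats before the model is on disk.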
The interesting bit
The “Playground” demos are where it gets weird in a good way. There’s an Android accessibility agent that reads your screen and taps buttons using a 4B-parameter model, a Chrome extension that plans and navigates web pages via WebGPU, and a Linux voice assistant that chains wake-word → VAD → Whisper → LLM → Piper TTS in a single C++ binary. These aren’t toy examples; one includes benchmark numbers on a Galaxy S24.
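The voice assistant chain is easier to picture as a list of stages, each feeding the next and each able to stop the run early. The sketch below is a toy under loud assumptions: every stage is a stub operating on strings rather than audio, and `run_pipeline` and the stage functions are hypothetical names, not code from the project's C++ binary.

```python
def run_pipeline(stages, data):
    """Feed data through each stage in order; a stage returning None stops the chain."""
    for stage in stages:
        data = stage(data)
        if data is None:
            return None
    return data


# Stub stages: strings stand in for audio frames, "" stands in for silence.
def wake_word(frames):
    # Pass along only what follows the wake token; None if it never appears.
    if "hey" not in frames:
        return None
    return frames[frames.index("hey") + 1:]


def vad(frames):
    # Voice activity detection stub: drop silent frames, stop if nothing is left.
    voiced = [f for f in frames if f]
    return voiced or None


def stt(frames):
    # Speech-to-text stub (Whisper's slot in the real chain).
    return " ".join(frames)


def llm(text):
    # Language model stub with a canned response.
    return f"You said: {text}"


def tts(text):
    # Text-to-speech stub (Piper's slot): bytes stand in for synthesized audio.
    return text.encode("utf-8")


STAGES = [wake_word, vad, stt, llm, tts]
audio = run_pipeline(STAGES, ["", "hey", "", "what", "time", ""])
```

The early exit is the design point: without a wake word or any voiced frames, the expensive stages never run.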
Key highlights
- Stable SDKs for Swift and Kotlin; Web, React Native, and Flutter are beta
- Supports LLMs (Llama, Mistral, Qwen, SmolLM), Whisper STT, neural TTS, diffusion image generation, and vision-language models
- Voice assistant pipeline wires STT → LLM → TTS end-to-end
- Model download with progress callbacks built in
- Apache 2.0 licensed
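For a sense of what download progress callbacks involve, here is a generic sketch: copy a stream in chunks and report bytes done after each one. `copy_with_progress`, its parameters, and the in-memory payload are assumptions made for illustration; the SDK's actual callback signature is not shown in this write-up.

```python
import io


def copy_with_progress(src, dst, total_bytes, on_progress, chunk_size=64 * 1024):
    """Copy src to dst in chunks, calling on_progress(done, total) after each chunk."""
    done = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        done += len(chunk)
        on_progress(done, total_bytes)
    return done


# An in-memory payload stands in for a model file fetched over the network.
payload = b"\x00" * 200_000
updates = []
copied = copy_with_progress(
    io.BytesIO(payload),
    io.BytesIO(),
    len(payload),
    lambda done, total: updates.append(done / total),
)
```

A UI would typically turn each `done / total` fraction into a progress bar update.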
Caveats
- Vision-language models and tool calling are iOS/Android/Web only; Flutter and React Native don’t get them yet
- Structured JSON output is marked “coming soon” for React Native and Flutter
- The Web SDK uses a slightly different API shape (TextGeneration.loadModel vs RunAnywhere.loadModel), which is minor but noticeable
Verdict
Mobile and edge developers who need privacy-first AI without managing llama.cpp builds themselves should look here. If you’re already comfortable compiling custom inference engines per platform, this is polished glue you might not need.