Your phone, but it can run LLMs in airplane mode
A React Native app that downloads and runs small language models locally on iOS and Android, no cloud required.

What it does
PocketPal AI is a mobile app that downloads GGUF-format small language models directly to your phone and runs them via llama.cpp bindings. You browse or search Hugging Face from inside the app, pick a quantization level that fits your device’s memory, download, load, and chat. Everything stays on-device; the only outbound data is explicitly opt-in benchmark results or feedback you choose to send.
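Picking a quantization level is mostly memory arithmetic: the weights take roughly parameters × bits-per-weight / 8 bytes, and the KV cache grows with context length. The sketch below applies that rule of thumb. It is an illustration, not PocketPal's own logic. The helper names and the 1B-parameter model shape are hypothetical, and only three GGUF types whose block layouts give exact bits-per-weight are included. It ignores llama.cpp compute buffers and app overhead, so treat the result as a lower bound.

```typescript
// Rough memory estimate for a GGUF model at different quantization levels.
// Hypothetical helpers, not PocketPal's code. Ignores llama.cpp compute
// buffers and app overhead, so treat results as a lower bound.

// Bits per weight for a few GGUF types. Q8_0 and Q4_0 pack 32 weights per
// block plus one fp16 scale: (32 + 2) * 8 / 32 = 8.5 and (16 + 2) * 8 / 32 = 4.5.
const BITS_PER_WEIGHT: Record<string, number> = {
  F16: 16,
  Q8_0: 8.5,
  Q4_0: 4.5,
};

interface ModelShape {
  params: number;   // total parameter count
  nLayers: number;  // number of transformer blocks
  nKvHeads: number; // key/value heads (fewer than query heads with GQA)
  headDim: number;  // dimension of each attention head
}

function weightBytes(params: number, quant: string): number {
  const bpw: number | undefined = BITS_PER_WEIGHT[quant];
  if (bpw === undefined) throw new Error(`unknown quant type: ${quant}`);
  return (params * bpw) / 8;
}

// fp16 KV cache: K and V tensors, per layer, per context position, 2 bytes each.
function kvCacheBytes(m: ModelShape, nCtx: number): number {
  return 2 * m.nLayers * nCtx * m.nKvHeads * m.headDim * 2;
}

// Return the highest-precision type whose weights + KV cache fit the budget,
// or null if even the smallest one does not fit.
function pickQuant(m: ModelShape, nCtx: number, budgetBytes: number): string | null {
  for (const quant of ["F16", "Q8_0", "Q4_0"]) {
    if (weightBytes(m.params, quant) + kvCacheBytes(m, nCtx) <= budgetBytes) {
      return quant;
    }
  }
  return null;
}

// Hypothetical 1B-parameter model shape with a 2048-token context.
const tiny: ModelShape = { params: 1e9, nLayers: 16, nKvHeads: 8, headDim: 64 };
console.log(pickQuant(tiny, 2048, 1.5e9)); // Q8_0
console.log(pickQuant(tiny, 2048, 1.0e9)); // Q4_0
```

Halving the bits per weight roughly halves the weight footprint. The KV cache grows linearly with context length, so a long context can push a model over budget even at Q4_0.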
The interesting bit
The “Pals” feature lets you create multiple persistent personas (each with its own model, system prompt, and settings) and switch between them mid-chat. There’s even a meta-layer where one AI can write the system prompt for another Pal. It’s a surprisingly thoughtful UX for a locally run model wrapper.
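One way to model persistent personas that can be swapped mid-chat is sketched below. It is a hypothetical data model, not PocketPal's actual schema. Every type, field, and model file name is illustrative. Each Pal bundles a model, a system prompt, and settings. The chat stores only user and assistant turns and a pointer to the active Pal. The system prompt is prepended when the next request is built.

```typescript
// Hypothetical data model for persona "Pals". Names are illustrative only.
interface PalSettings {
  temperature: number;
  maxTokens: number;
}

interface Pal {
  id: string;
  name: string;
  modelId: string;      // which downloaded GGUF file to load
  systemPrompt: string;
  settings: PalSettings;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface Chat {
  activePalId: string;
  messages: ChatMessage[]; // user/assistant turns only; no system prompt stored
}

interface GenerationRequest {
  modelId: string;
  settings: PalSettings;
  messages: ChatMessage[];
}

// Switching Pals only changes which persona answers the next turn;
// the existing history is kept as-is.
function switchPal(chat: Chat, palId: string): Chat {
  return { ...chat, activePalId: palId };
}

// Build the next request from the active Pal, prepending its system prompt.
function buildRequest(chat: Chat, pals: Map<string, Pal>): GenerationRequest {
  const pal = pals.get(chat.activePalId);
  if (pal === undefined) throw new Error(`unknown Pal: ${chat.activePalId}`);
  return {
    modelId: pal.modelId,
    settings: pal.settings,
    messages: [{ role: "system", content: pal.systemPrompt }, ...chat.messages],
  };
}

// Usage: two hypothetical Pals, switched after the first user message.
const pals = new Map<string, Pal>([
  ["coder", { id: "coder", name: "Coder", modelId: "model-a.gguf",
    systemPrompt: "You write concise code.", settings: { temperature: 0.2, maxTokens: 512 } }],
  ["poet", { id: "poet", name: "Poet", modelId: "model-b.gguf",
    systemPrompt: "You answer in haiku.", settings: { temperature: 0.9, maxTokens: 128 } }],
]);

let chat: Chat = { activePalId: "coder", messages: [{ role: "user", content: "Hi" }] };
chat = switchPal(chat, "poet");
const req = buildRequest(chat, pals); // uses model-b.gguf and the poet's prompt
```

Keeping the system prompt out of the stored history is what makes a mid-chat switch cheap in this sketch. Earlier turns stay untouched, and only the persona for the next reply changes.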
Key highlights
- Browse, download, and run GGUF models from Hugging Face Hub, including gated models via personal access token
- Auto-offload models from RAM when the app backgrounds; reload on return
- Real-time tokens-per-second and memory metrics during inference
- Message editing with regeneration, retry with a different model, and long-press paragraph copying
- iPad support, background downloads on iOS, and screen-stay-awake during generation
- Localization started: Japanese and Chinese as of v1.8.16
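The auto-offload behavior in the list above amounts to a small state machine. The sketch below is hypothetical and does not show PocketPal's code. The state names mirror React Native's `AppState` values (`active`, `background`, `inactive`), and the actual model loading is left abstract. In an app, you might feed events from `AppState.addEventListener("change", ...)` into this function and then act on the returned action. Ignoring `inactive` is an assumed design choice, because iOS passes through that state briefly during transitions.

```typescript
// Hypothetical sketch of "offload on background, reload on return".
// State names mirror React Native's AppState values; loading is left abstract.
type AppStateStatus = "active" | "background" | "inactive";

type ModelAction =
  | { kind: "release" }                  // free the model's memory
  | { kind: "reload"; modelId: string }  // load the remembered model again
  | { kind: "none" };

interface OffloadState {
  loadedModelId: string | null;    // model currently resident in RAM
  offloadedModelId: string | null; // model released on backgrounding
}

function onAppStateChange(
  state: OffloadState,
  next: AppStateStatus,
): { state: OffloadState; action: ModelAction } {
  // Going to the background: remember the model, then release it.
  if (next === "background" && state.loadedModelId !== null) {
    return {
      state: { loadedModelId: null, offloadedModelId: state.loadedModelId },
      action: { kind: "release" },
    };
  }
  // Back in the foreground: reload whatever was released.
  if (next === "active" && state.offloadedModelId !== null) {
    return {
      state: { loadedModelId: state.offloadedModelId, offloadedModelId: null },
      action: { kind: "reload", modelId: state.offloadedModelId },
    };
  }
  // "inactive" (e.g. a brief iOS transition) and redundant events: do nothing.
  return { state, action: { kind: "none" } };
}

// Usage with a hypothetical model file name.
let s: OffloadState = { loadedModelId: "model-a.gguf", offloadedModelId: null };
const bg = onAppStateChange(s, "background"); // action: release
s = bg.state;
const fg = onAppStateChange(s, "active");     // action: reload model-a.gguf
```

Keeping the decision logic pure keeps it easy to test. The side effects (calling the llama.cpp binding to free or load a model) stay in the event handler.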
Caveats
- Copying text currently loses formatting; the README notes this is a known limitation
- Localization is early; only two languages so far
- The roadmap is heavy on “UI/UX enhancements” and “improved documentation,” suggesting the project knows its polish isn’t finished
Verdict
Worth a look if you want private, offline LLM access on a phone without managing Termux or Python environments yourself. Skip it if you need production-grade reliability or are allergic to React Native’s build toolchain.