Android's first proper AI butler, not just a chatbot
An on-device agent that actually taps, swipes, and schedules on your phone instead of just talking about it.

What it does
OpenOmniBot is an Android AI agent built in Kotlin and Flutter that runs on the device itself, with inference either local or through cloud model APIs. It sees your screen through vision models, performs gestures, operates apps, manages calendars and alarms, and can even run a local Alpine environment or terminal. The loop is explicit: understand → decide → execute → reflect.
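To make the understand → decide → execute → reflect loop concrete, here is a minimal Python sketch. It is not OpenOmniBot's code: `run_agent`, `Step`, and the four callbacks are hypothetical names, and the fake "screen" stands in for real screenshots and gestures.

```python
from dataclasses import dataclass


@dataclass
class Step:
    observation: str
    action: str
    result: str
    done: bool


def run_agent(goal, observe, decide, execute, reflect, max_steps=10):
    """Repeat understand -> decide -> execute -> reflect until reflect says done."""
    history = []
    for _ in range(max_steps):
        observation = observe()                      # understand: read the current screen
        action = decide(goal, observation, history)  # decide: choose the next action
        result = execute(action)                     # execute: perform it
        done = reflect(goal, action, result)         # reflect: check whether the goal is met
        history.append(Step(observation, action, result, done))
        if done:
            break
    return history


# Toy stand-ins (assumptions for illustration only): a fake screen state.
screen = {"current": "home"}


def observe():
    return screen["current"]


def decide(goal, observation, history):
    return "open_clock" if observation == "home" else "set_alarm"


def execute(action):
    if action == "open_clock":
        screen["current"] = "clock"
        return "clock opened"
    return "alarm set"


def reflect(goal, action, result):
    return result == "alarm set"


history = run_agent("set an alarm", observe, decide, execute, reflect)
print([step.action for step in history])  # ['open_clock', 'set_alarm']
```

The `max_steps` cap is a design choice of this sketch: an agent that acts on a real device presumably needs some bound so a failed reflection step cannot loop forever.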
The interesting bit
Most “AI assistants” stop at text. This one treats your phone as a physical environment to operate. It also supports a remote Codex bridge: you can pair it with OpenAI’s CLI tool running on a laptop, connecting over LAN with QR-code pairing. It is an odd but practical workaround for serious coding tasks.
Key highlights
- Vision-driven UI automation: screenshots, accessibility service, and gesture execution
- Extensible skill system via git repo links (community collection at OpenMinis/MinisSkills)
- Local inference option with MNN or llama.cpp backends, or cloud model APIs
- Scheduled tasks with subagent delegation, plus short/long-term memory with embeddings
- Embedded Alpine environment and ReTerminal integration for proper Linux tooling on Android
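The memory highlight above relies on embeddings. As a rough illustration of how embedding-based recall generally works (not how OpenOmniBot implements it), the sketch below stores text alongside a vector and returns the closest match by cosine similarity. `MemoryStore`, `toy_embed`, and `VOCAB` are hypothetical; a real setup would call an actual embedding model instead of counting words.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryStore:
    def __init__(self, embed):
        self.embed = embed  # function mapping text -> list of floats
        self.items = []     # (text, vector) pairs

    def remember(self, text):
        self.items.append((text, self.embed(text)))

    def recall(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        q = self.embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


# Toy embedding (assumption): word counts over a tiny fixed vocabulary.
VOCAB = ["alarm", "calendar", "meeting", "wake"]


def toy_embed(text):
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


store = MemoryStore(toy_embed)
store.remember("user likes a wake alarm at 7")
store.remember("weekly meeting lives in the work calendar")
print(store.recall("set my alarm"))  # ['user likes a wake alarm at 7']
```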
Caveats
- Build process is involved: two separate editions (standard vs. omniinfer), nested git submodules for local inference, and Flutter/Android toolchain fragility (the README literally includes a `flutter clean` troubleshooting step)
- Memory embedding requires a separate embedding model; multimodal models are strongly recommended for core scenarios, which implies significant setup and likely API costs
- 1,608 stars suggest early traction, but the project is not yet battle-tested at scale
Verdict
Worth a look if you want an autonomous Android agent that actually does things rather than just suggesting them. Skip it if you need a polished consumer app today, or if your threat model can’t tolerate broad accessibility-service permissions.