A GUI agent that runs on your desk, not in the cloud
Mano-P is a 4B-parameter vision-language-action model that controls computers by sight alone, running entirely on Apple Silicon with no data leaving the device.

What it does
Mano-P is a GUI-VLA agent: it looks at your screen, decides what to click or type, and executes. It handles multi-step tasks—hundreds of interactions across web apps, desktop software, even games—using only visual input, no APIs. The project ships as open-source skills, local inference models, and a companion SDK called Cider for quantization.
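To make the look, decide, execute cycle concrete, here is a minimal sketch of such a loop. It is not Mano-P's code or API: `Action`, `run_agent`, and the `capture`, `policy`, and `execute` callables are all hypothetical names chosen for illustration, and the model is stubbed with a scripted policy.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    # Hypothetical action vocabulary: "click", "type", or "done".
    kind: str
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    task: str,
    capture: Callable[[], bytes],
    policy: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 200,
) -> List[Action]:
    """Observe the screen, ask the policy for one action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                      # visual input only
        action = policy(task, screenshot, history)  # the model would sit here
        if action.kind == "done":
            break
        execute(action)                             # e.g. move mouse, press keys
        history.append(action)
    return history


def demo() -> List[Action]:
    # Scripted stand-in for a real vision-language-action model.
    script = [Action("click", 10, 20), Action("type", text="hello"), Action("done")]
    executed: List[Action] = []

    def capture() -> bytes:
        return b"fake-screenshot"

    def policy(task: str, shot: bytes, history: List[Action]) -> Action:
        return script[len(history)]

    return run_agent("say hello", capture, policy, executed.append)
```

The `max_steps` cap matters in practice: a task spanning hundreds of interactions needs a bound so a confused policy cannot loop forever.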
The interesting bit
The privacy angle is genuine, not marketing. The 4B model runs on an M4 Mac mini or MacBook with 32GB RAM, or on a USB-C compute stick. Screenshots and task data never leave the machine. The Cider SDK adds INT8 activation quantization primitives that MLX lacks, giving 1.4–2.2× prefill speedups over standard MLX configs—and it works with any MLX model, not just Mano-P.
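For readers unfamiliar with INT8 activation quantization, the general idea can be sketched in a few lines of plain Python. This is a generic illustration of symmetric per-tensor quantization, not Cider's implementation or API, which the summary above does not describe; `quantize_int8` and `int8_dot` are hypothetical names.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0] * len(values), 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def int8_dot(x: List[float], w: List[float]) -> float:
    """Dot product with both activations (x) and weights (w) quantized to INT8."""
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(w)
    acc = sum(a * b for a, b in zip(qx, qw))  # pure integer accumulation
    return acc * sx * sw                      # one rescale back to float


activations = [0.5, -1.0, 0.25]
weights = [1.0, 2.0, -4.0]
approx = int8_dot(activations, weights)  # exact float result would be -2.5
```

The inner loop runs entirely on small integers, which is where hardware speedups typically come from; the cost is a small rounding error relative to the float result.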
Key highlights
- 58.2% on the OSWorld benchmark, ranking first among specialized GUI agent models (per the README, which lists opencua-72b at 45.0%)
- 41.7 NavEval on WebRetriever Protocol I, ahead of Gemini 2.5 Pro Computer Use (40.9) and Claude 4.5 Computer Use (31.3)
- ~80 tokens/s decode on Apple M5 Pro; W8A8 quantization yields ~12.7% prefill speedup over W8A16 baseline
- Mano-AFK application: full PRD → code → deploy → test → fix loop using Mano-P for real-browser E2E testing, fully autonomous
- Three-phase open-source rollout: skills first, then local models/SDK, then training methods and pruning/quantization techniques
Caveats
- Hardware floor is steep: M4 chip plus 32GB RAM, or a compute stick via USB 4.0
- Deployment instructions for both setups (directly on a Mac and via the compute stick) are listed as “releasing in the near future” and are not yet available
- The project is partially open-sourced; training methodologies and some model components arrive in later phases
Verdict
Worth a look if you need computer-use automation in air-gapped or privacy-sensitive environments, and you already own recent Apple Silicon. Skip it if you’re on Windows/Linux hardware or need something production-ready today—the phased release means you’ll wait for pieces.