Mininglamp-AI/Mano-P

A GUI agent that runs on your desk, not in the cloud

Mano-P is a 4B vision-language-action model that controls computers by sight alone, running entirely on Apple Silicon with no data leaving the device.

Velocity (7d): +31 ★/day · Trend: steady

What it does

Mano-P is a GUI-VLA agent: it looks at your screen, decides what to click or type, and executes the action. It handles multi-step tasks (hundreds of interactions across web apps, desktop software, and even games) using only visual input, with no reliance on application APIs. The project ships as open-source skills, local inference models, and a companion SDK called Cider for quantization.
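
The look, decide, execute cycle described above can be sketched as a plain control loop. This is only an illustration of the general pattern, not Mano-P's code: the `Action` type, its action vocabulary, and the `capture`, `decide`, and `execute` callables are all hypothetical names introduced here, and the model is replaced by a scripted stand-in.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    # Hypothetical action vocabulary: "click", "type", or "done".
    kind: str
    x: int = 0
    y: int = 0
    text: str = ""


def run_task(
    goal: str,
    capture: Callable[[], bytes],                          # returns a screenshot as raw bytes
    decide: Callable[[str, bytes, List[Action]], Action],  # stand-in for the vision model
    execute: Callable[[Action], None],                     # performs the click or keystroke
    max_steps: int = 300,
) -> List[Action]:
    """Repeat observe -> decide -> act until the policy says "done" or the step budget runs out."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                      # observe: pixels only, no app APIs
        action = decide(goal, screenshot, history)  # decide: pick the next UI action
        if action.kind == "done":
            break
        execute(action)                             # act: drive mouse or keyboard
        history.append(action)
    return history


def scripted_policy(script: List[Action]):
    """Toy replacement for the model: replays a fixed list of actions, then stops."""
    steps = iter(script)

    def decide(goal: str, screenshot: bytes, history: List[Action]) -> Action:
        return next(steps, Action("done"))

    return decide
```

Passing the screenshot and the action history into `decide` on every step is what lets a single loop cover long tasks: the policy sees the current screen plus what it has already done, and the `max_steps` budget keeps a confused policy from running forever.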

The interesting bit

The privacy angle is genuine, not marketing. The 4B model runs on an M4 Mac mini or MacBook with 32GB RAM, or on a USB-C compute stick, and screenshots and task data never leave the machine. The Cider SDK adds INT8 activation quantization primitives that MLX lacks, giving 1.4–2.2× prefill speedups over standard MLX configurations. It works with any MLX model, not just Mano-P.
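
To make "INT8 activation quantization" concrete, here is a minimal pure-Python sketch of the general idea, assuming symmetric per-tensor scaling. It is not Cider's API or MLX's; every function name below is hypothetical. In a W8A16-style setup only the weights are stored as 8-bit integers and the activations stay in floating point, while in W8A8 the activations are quantized too, so the inner product can be accumulated in integers and rescaled once at the end.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def w8a16_dot(weights_q: List[int], w_scale: float, acts: List[float]) -> float:
    """Weight-only quantization: dequantize each weight, multiply by a float activation."""
    return sum((w * w_scale) * a for w, a in zip(weights_q, acts))


def w8a8_dot(weights_q: List[int], w_scale: float,
             acts_q: List[int], a_scale: float) -> float:
    """Weights and activations both INT8: integer accumulate, one rescale at the end."""
    acc = sum(w * a for w, a in zip(weights_q, acts_q))  # pure integer arithmetic
    return acc * w_scale * a_scale


weights = [0.5, -1.0, 0.25]
acts = [2.0, 1.0, -4.0]
exact = sum(w * a for w, a in zip(weights, acts))

wq, ws = quantize_int8(weights)
aq, a_s = quantize_int8(acts)
approx_w8a16 = w8a16_dot(wq, ws, acts)
approx_w8a8 = w8a8_dot(wq, ws, aq, a_s)
```

Both approximations land close to the exact dot product, with W8A8 paying a little more rounding error for the second quantization step. The pure-Python version shows only the arithmetic; in general, any real speedup comes from running the integer accumulate on hardware matrix units, which this sketch does not attempt.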

Key highlights

  • 58.2% on the OSWorld benchmark, ranking first among specialized GUI agent models according to the README (which lists opencua-72b at 45.0%)
  • 41.7 NavEval on WebRetriever Protocol I, ahead of Gemini 2.5 Pro Computer Use (40.9) and Claude 4.5 Computer Use (31.3)
  • ~80 tokens/s decode on Apple M5 Pro; W8A8 quantization yields a ~12.7% prefill speedup over the W8A16 baseline
  • Mano-AFK application: a fully autonomous PRD → code → deploy → test → fix loop that uses Mano-P for end-to-end testing in a real browser
  • Three-phase open-source rollout: skills first, then local models/SDK, then training methods and pruning/quantization techniques

Caveats

  • Hardware floor is steep: M4 chip plus 32GB RAM, or a compute stick via USB 4.0
  • Deployment instructions for both methods are listed as “releasing in the near future” and are not yet available
  • The project is partially open-sourced; training methodologies and some model components arrive in later phases

Verdict

Worth a look if you need computer-use automation in air-gapped or privacy-sensitive environments and you already own recent Apple Silicon. Skip it if you’re on Windows/Linux hardware or need something production-ready today: the phased release means you’ll wait for pieces.
