ZJU-REAL/ClawGUI

A full-stack factory for phone-tapping AI agents

ClawGUI unifies online RL training, standardized benchmarks, and real-device deployment for GUI agents in one modular framework.

Velocity (7d): +21 ★/day · Trend: steady

What it does

ClawGUI is a research framework that handles the complete lifecycle of GUI agents: training them with online reinforcement learning, evaluating them against standardized benchmarks, and deploying them to control real Android, HarmonyOS, or iOS devices via natural language. It ships as five independent modules—RL, Eval, Agent, Skills, and an on-device App—each with its own environment and documentation.

The interesting bit

The framework replaces standard GRPO with GiGPO+PRM to get fine-grained, step-level rewards during training, and it runs the full “brain + agent” stack on a single phone via Shizuku, with no desktop coordinator required. A training-free skill evolution system lets agents diagnose failures, revise structured skill packages, and reuse them across tasks without retraining.
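To make the difference between episode-level and step-level credit concrete, here is a minimal sketch. It is not ClawGUI's code: the function names, the `state_key` grouping, the per-step scores (imagined as coming from a process reward model), and the `omega` weight are all assumptions for illustration. The only borrowed idea is group-relative normalization, where each reward is compared against the mean and standard deviation of its group.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize(values, eps=1e-8):
    """Group-relative normalization: (v - mean) / (std + eps)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def episode_advantages(returns):
    """Episode level: one advantage per rollout, shared by all of its steps."""
    return normalize(returns)


def step_advantages(steps):
    """Step level: compare only steps that were taken from the same state.

    `steps` is a list of (rollout_id, state_key, step_reward) tuples.
    `step_reward` is a hypothetical per-step score, e.g. from a PRM.
    Returns {(rollout_id, state_key): advantage}.
    """
    groups = defaultdict(list)
    for rollout_id, state_key, reward in steps:
        groups[state_key].append((rollout_id, reward))

    advantages = {}
    for state_key, members in groups.items():
        scores = normalize([reward for _, reward in members])
        for (rollout_id, _), adv in zip(members, scores):
            advantages[(rollout_id, state_key)] = adv
    return advantages


def combined_advantages(returns, steps, omega=1.0):
    """Add the step-level signal to the episode-level one (omega is assumed)."""
    episode = episode_advantages(returns)
    return {
        key: episode[key[0]] + omega * adv
        for key, adv in step_advantages(steps).items()
    }


# Two rollouts of the same task: rollout 0 succeeded, rollout 1 failed.
returns = [1.0, 0.0]
steps = [
    (0, "home", 0.9),      # both rollouts acted from the "home" screen
    (1, "home", 0.1),
    (0, "settings", 0.5),  # only rollout 0 reached "settings"
]
result = combined_advantages(returns, steps)
```

With episode-level credit alone, every step in a rollout gets the same signal. In this sketch, steps taken from the same screen are additionally ranked against each other, so a good action inside a failed rollout can still be told apart from a bad one.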

Key highlights

  • ClawGUI-RL: Parallel Docker Android emulators or real-device training with automatic failover and episode visualization
  • ClawGUI-Eval: 6 benchmarks, 11+ models, and a 95.8% reproduction rate against official results, so scores are comparable across models
  • ClawGUI-Agent: Cross-platform device control through 12+ chat platforms with one-command evaluation (“benchmark qwen3vl on screenspot-pro”)
  • ClawGUI-APP: Full phone-only deployment; brain LLM and phone agent run on-device, though the VLM still calls cloud APIs for now
  • ClawGUI-2B: End-to-end validation: a 2B model trained entirely with this pipeline reaches a MobileWorld success rate (SR) of 17.1 vs. an 11.1 baseline, a 6.0-point gain

Caveats

  • Desktop and web online RL extensions are on the roadmap but not yet implemented
  • On-device inference still relies on cloud APIs for the brain/VLM; fully local inference is future work
  • Each module has its own environment setup and there is no unified install, so expect some assembly

Verdict

Researchers building or benchmarking GUI agents should grab this; it solves the “train in one repo, evaluate in another, deploy in a third” fragmentation problem. If you just need a simple phone automation script, it’s overkill.
