OpenBMB/AgentCPM-GUI
An on-device GUI agent based on MiniCPM-V that autonomously operates Android apps using smartphone screenshots as input.

AgentCPM-GUI is an open-source, 8-billion-parameter vision-language agent jointly developed by THUNLP, Renmin University of China, and ModelBest. Built on MiniCPM-V, it takes smartphone screenshots as input and autonomously carries out user-specified tasks in Android apps. Reinforcement fine-tuning (RFT) strengthens its planning and reasoning, so the model can think before it outputs an action. It supports both Chinese and English apps. Its action space is expressed in a concise JSON format, which reduces the average action length to 9.7 tokens and makes on-device inference more efficient.
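
To make the idea of a concise JSON action space concrete, here is a minimal Python sketch of how a client might parse and validate one emitted action and map a screen coordinate onto the device. The key names (`thought`, `POINT`, `TYPE`, `PRESS`, `STATUS`), the 0-1000 coordinate scale, and the helper names are assumptions made for illustration only; they are not taken from the description above and may not match the project's actual schema.

```python
import json

# Hypothetical set of action keys, assumed for this sketch only.
ALLOWED_KEYS = {"thought", "POINT", "TYPE", "PRESS", "STATUS"}


def parse_action(raw: str) -> dict:
    """Parse one JSON action string and reject keys outside the assumed schema."""
    action = json.loads(raw)
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    unknown = set(action) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    return action


def to_pixels(point, width: int, height: int, scale: int = 1000):
    """Convert a point on an assumed 0..scale grid to device pixel coordinates."""
    x, y = point
    return round(x * width / scale), round(y * height / scale)


def compact(action: dict) -> str:
    """Serialize without optional whitespace so the action stays short."""
    return json.dumps(action, separators=(",", ":"), ensure_ascii=False)


if __name__ == "__main__":
    action = parse_action('{"POINT":[500,250]}')
    print(to_pixels(action["POINT"], width=1080, height=2400))
    print(compact(action))
```

Dropping optional whitespace and using short keys is one plausible way to keep each action to a handful of tokens; the 9.7-token average quoted above refers to the project's own format, not to this sketch.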