inclusionAI/UI-Venus
A native UI agent for precise GUI element grounding and navigation using screenshot-based multimodal LLMs.

UI-Venus is a unified, end-to-end GUI agent that grounds GUI elements precisely and navigates effectively using only screenshots as input. The model family comprises 2B and 8B dense variants and a 30B-A3B MoE variant, trained with a mid-training stage on 10B tokens drawn from 30+ datasets and with online reinforcement learning for long-horizon navigation. It achieves state-of-the-art performance on benchmarks including ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), and has demonstrated navigation across 40+ Chinese mobile applications.