zai-org/CogAgent
An open-source agent, built on a vision-language model, that autonomously perceives and acts on graphical user interfaces to automate computer tasks.

Velocity (7d): +1.3 ★/day · Trend: → steady
CogAgent is an end-to-end GUI agent built on vision-language models. It perceives screen content, reasons about what it sees, and executes actions to automate computer tasks across arbitrary graphical interfaces. The project provides model weights, inference code, and training pipelines for developing autonomous agents capable of computer use.
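The perceive, reason, act cycle described above can be illustrated with a minimal sketch. It assumes a toy GUI (a search box and a button) instead of real screenshots, and a hand-written rule in place of the model. Every name here (`Action`, `Screen`, `ToyGUI`, `rule_policy`, `run_agent`) is hypothetical and is not CogAgent's API. In CogAgent, the vision-language model fills the policy role, working from the screen image.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    # Hypothetical action vocabulary: "TYPE", "CLICK", or "DONE".
    kind: str
    target: Optional[str] = None
    text: Optional[str] = None


@dataclass
class Screen:
    # Toy stand-in for a screenshot: visible element labels plus the text-field contents.
    elements: List[str]
    field: str = ""


class ToyGUI:
    """Hypothetical environment: a search box, a button, and results after submission."""

    def __init__(self) -> None:
        self.field = ""
        self.submitted = False

    def capture(self) -> Screen:
        elements = ["search_box", "search_button"]
        if self.submitted:
            elements.append("results")
        return Screen(elements=elements, field=self.field)

    def execute(self, action: Action) -> None:
        if action.kind == "TYPE" and action.target == "search_box":
            self.field = action.text or ""
        elif action.kind == "CLICK" and action.target == "search_button":
            self.submitted = True


def rule_policy(task: str, screen: Screen, history: List[Action]) -> Action:
    # Placeholder for the model's reasoning step: picks the next action from the observed screen.
    if "results" in screen.elements:
        return Action("DONE")
    if screen.field != task:
        return Action("TYPE", target="search_box", text=task)
    return Action("CLICK", target="search_button")


def run_agent(task: str,
              capture: Callable[[], Screen],
              policy: Callable[[str, Screen, List[Action]], Action],
              execute: Callable[[Action], None],
              max_steps: int = 10) -> List[Action]:
    history: List[Action] = []
    for _ in range(max_steps):
        screen = capture()                      # perceive: observe the current screen
        action = policy(task, screen, history)  # reason: choose the next action
        history.append(action)
        if action.kind == "DONE":
            break
        execute(action)                         # act: apply the action to the GUI
    return history


gui = ToyGUI()
history = run_agent("cogagent", gui.capture, rule_policy, gui.execute)
print([a.kind for a in history])  # ['TYPE', 'CLICK', 'DONE']
```

The `max_steps` cap keeps the loop bounded if the policy never emits `DONE`. Passing `history` to the policy leaves room for a model that conditions on its earlier actions.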