
zai-org/CogAgent

An open-source agent based on a vision-language model (VLM) that autonomously perceives and acts on graphical user interfaces to automate computer tasks.

1.2k stars · Python · Agents, Language Models
Velocity (7 days): +1.3 ★/day · Trend: steady

CogAgent is an end-to-end GUI agent built on vision-language models. It perceives screen content, reasons about what it sees, and executes actions to automate computer tasks across arbitrary graphical interfaces. The project provides model weights, inference code, and training pipelines for developing autonomous computer-use agents.
