showlab/Awesome-GUI-Agent
A curated list of papers, projects, and benchmarks for multi-modal Graphical User Interface (GUI) agents that use LLMs to perceive screens and automate tasks.

This repository aggregates research papers, datasets, benchmarks, and open-source projects on GUI agents: autonomous AI systems that use multi-modal language models to understand screen content and perform user interface actions. It serves as a central reference for the GUI agent research community and covers browser automation, desktop control, task-oriented agents, and benchmarks for evaluating agent performance on graphical interfaces.