OSU-NLP-Group/SeeAct
SeeAct is a generalist web agent system that uses large multimodal models like GPT-4V to autonomously execute tasks on any website.

Velocity (7d): +0.9 ★/day · Trend: steady
SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision). It provides a robust codebase for running web agent experiments and includes the Multimodal-Mind2Web benchmark for evaluation. Agents ground their visual understanding of a page in concrete web interactions to execute user-specified tasks.