
OSU-NLP-Group/SeeAct

SeeAct is a generalist web agent system that uses large multimodal models like GPT-4V to autonomously execute tasks on any website.

845 stars · Python · Agents
Velocity (7d): +0.9 ★ / day · Trend: steady

SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision). It provides a codebase for running web agent experiments and includes the Multimodal-Mind2Web benchmark for evaluation. Agents ground their visual understanding of a page in concrete web interactions to execute user-specified tasks.
