MinorJerry/WebVoyager
A web agent powered by Large Multimodal Models that autonomously navigates and interacts with real-world websites end-to-end.

WebVoyager implements an end-to-end web browsing agent built on Large Multimodal Models (LMMs), which combine textual and visual information from a page to complete user tasks on real-world websites. The system uses Selenium to create an online web browsing environment in which the LMM agent perceives pages and takes actions. It also includes an automated evaluation protocol that uses GPT-4V, along with a diverse dataset of 643 task queries across 15 websites for benchmarking the agent’s performance.
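The perceive-then-act cycle described above can be pictured as a small loop. This is a minimal, hypothetical sketch, not WebVoyager's actual code. The names `Observation`, `Action`, `BrowserEnv`, `MultimodalPolicy`, and `run_episode`, the action kinds, and the step cap are all illustrative assumptions. A real environment would presumably wrap a Selenium WebDriver (for example, capturing screenshots with `get_screenshot_as_png()`), and a real policy would prompt an LMM with the screenshot and page text.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol, Tuple


@dataclass
class Observation:
    # Hypothetical: what the agent "sees" at each step.
    screenshot: bytes  # e.g. bytes from Selenium's driver.get_screenshot_as_png()
    page_text: str     # textual information extracted from the page


@dataclass
class Action:
    # Hypothetical action vocabulary, e.g. "click", "type", "scroll", "answer".
    kind: str
    target: str = ""
    value: str = ""


class BrowserEnv(Protocol):
    # Assumed interface; a real version would drive a browser via Selenium.
    def observe(self) -> Observation: ...
    def execute(self, action: Action) -> None: ...


class MultimodalPolicy(Protocol):
    # Assumed interface; a real version would query an LMM with text + image.
    def decide(self, task: str, obs: Observation, history: List[Action]) -> Action: ...


def run_episode(
    task: str,
    env: BrowserEnv,
    policy: MultimodalPolicy,
    max_steps: int = 15,  # arbitrary cap chosen for this sketch
) -> Tuple[Optional[str], List[Action]]:
    """Alternate perception and action until the policy answers or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        obs = env.observe()                         # perceive the current page
        action = policy.decide(task, obs, history)  # let the model choose a step
        history.append(action)
        if action.kind == "answer":                 # task finished: return the answer
            return action.value, history
        env.execute(action)                         # otherwise act on the page
    return None, history                            # gave up without an answer
```

Keeping the browser and the model behind small interfaces means the loop itself can be exercised with a scripted fake environment, without a live browser or model API.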