
ddupont808/GPT-4V-Act

Browser-based AI agent that uses GPT-4V's vision capabilities to autonomously control mouse and keyboard for web UI interaction and automation tasks.

1.1k stars · JavaScript · Agents
Velocity (7d): +1.1 ★/day · Trend: steady

GPT-4V-Act combines GPT-4V(ision) with a web browser to create an autonomous agent that interacts with web interfaces through low-level mouse and keyboard controls. It uses Set-of-Mark Prompting: an auto-labeler assigns numerical IDs to the interactive UI elements on the page, so the model can identify and target a specific element in a screenshot by its label. Given a task and a screenshot, the agent decides on the next action and executes it at precise pixel coordinates, which makes it usable for workflow automation and automated UI testing.
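The labeling-and-targeting loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GPT-4V-Act's actual code: the element descriptor shape (`tag`, `role`, `text`, `rect`), the set of tags and roles treated as interactive, and the JSON action format (`{"action": "click", "id": 2}`) are all hypothetical. Set-of-Mark prompting normally overlays the numeric marks on the screenshot image itself; this sketch only builds a text legend and maps a chosen label back to pixel coordinates.

```javascript
// Hypothetical: tag names and ARIA roles this sketch treats as interactive.
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);
const INTERACTIVE_ROLES = new Set(["button", "link", "checkbox", "textbox", "menuitem"]);

// Auto-labeler: assign sequential numeric IDs (starting at 1)
// to interactive elements that have a non-empty bounding box.
function labelInteractiveElements(elements) {
  const marks = [];
  for (const el of elements) {
    const interactive = INTERACTIVE_TAGS.has(el.tag) || INTERACTIVE_ROLES.has(el.role);
    const visible = el.rect.width > 0 && el.rect.height > 0;
    if (interactive && visible) {
      marks.push({ id: marks.length + 1, text: el.text ?? "", rect: el.rect });
    }
  }
  return marks;
}

// Text legend that could accompany the marked-up screenshot in the prompt.
function describeMarks(marks) {
  return marks.map((m) => `[${m.id}] ${m.text}`).join("\n");
}

// Extract the first {...} span from the model reply and parse it
// as a (hypothetical) JSON action.
function parseAction(reply) {
  const match = reply.match(/\{[\s\S]*\}/);
  if (!match) throw new Error("no JSON action found in reply");
  const action = JSON.parse(match[0]);
  if (!["click", "type"].includes(action.action)) {
    throw new Error(`unsupported action: ${action.action}`);
  }
  return action;
}

// Map the chosen label back to pixel coordinates (centre of its bounding box).
function resolveAction(action, marks) {
  const mark = marks.find((m) => m.id === action.id);
  if (!mark) throw new Error(`unknown mark id: ${action.id}`);
  const x = Math.round(mark.rect.x + mark.rect.width / 2);
  const y = Math.round(mark.rect.y + mark.rect.height / 2);
  return action.action === "type"
    ? { type: "type", x, y, text: action.text ?? "" }
    : { type: "click", x, y };
}

// Usage with made-up page data: the div is not interactive and the
// zero-size link is invisible, so only the input and button get labels.
const elements = [
  { tag: "div", text: "Header", rect: { x: 0, y: 0, width: 800, height: 60 } },
  { tag: "input", text: "Search", rect: { x: 100, y: 80, width: 300, height: 30 } },
  { tag: "button", text: "Go", rect: { x: 410, y: 80, width: 60, height: 30 } },
  { tag: "a", text: "Hidden link", rect: { x: 0, y: 0, width: 0, height: 0 } },
];
const marks = labelInteractiveElements(elements);
console.log(describeMarks(marks)); // [1] Search / [2] Go
const reply = 'Next step: {"action": "click", "id": 2}';
console.log(resolveAction(parseAction(reply), marks)); // { type: 'click', x: 440, y: 95 }
```

Having the model answer with a label rather than raw coordinates is the point of the marks: the model only has to pick an ID, and the agent turns that ID into an exact click target, which a real driver (for example a browser-automation library's mouse API) would then execute.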
