ishan0102/vimGPT

An autonomous web-browsing agent powered by GPT-4V vision and Vimium keyboard navigation.

2.7k stars · Python · Agents · Language Models
vimGPT
Velocity · 7d: +2.8 ★ / day
Trend: steady

vimGPT enables multimodal LLMs like GPT-4V to interact with web browsers by using Vimium’s keyboard-based navigation system instead of relying on DOM text. The project captures screenshots, sends them to GPT-4V for visual analysis, and translates the model’s predictions into keyboard commands that Vimium can execute. A voice mode also lets users give natural-language objectives, which the agent executes in real time.
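The screenshot, model, keyboard loop described above can be sketched roughly as follows. This is a minimal illustration, not vimGPT's actual code: the JSON action schema (`click`, `type`, `done`), the function names, and the injected `capture`, `ask_model`, and `press` callables are all hypothetical stand-ins for the browser and model calls. The only Vimium behavior assumed is its default `f` binding, which overlays link hints that are then selected by typing their letters.

```python
import json
from typing import Callable, Optional


def parse_action(reply: str) -> dict:
    """Extract the JSON object embedded in a model reply (hypothetical schema)."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    return json.loads(reply[start:end + 1])


def action_to_keys(action: dict) -> list[str]:
    """Translate one predicted action into a sequence of key presses.

    Assumed schema: {"click": "<hint letters>"} selects a Vimium link hint,
    and an optional "type" field enters text into the element just clicked.
    """
    keys: list[str] = []
    if "click" in action:
        # Hint letters are typed in lowercase (assumption about key handling).
        keys += list(action["click"].lower())
    if "type" in action:
        keys += list(action["type"]) + ["Enter"]
    return keys


def run_agent(
    objective: str,
    capture: Callable[[], bytes],           # returns a screenshot (hypothetical)
    ask_model: Callable[[str, bytes], str],  # sends objective + image to the model
    press: Callable[[str], None],            # sends one key to the browser
    max_steps: int = 10,
) -> Optional[int]:
    """Loop until the model reports completion; return the step count, else None."""
    for step in range(1, max_steps + 1):
        press("f")  # show Vimium link hints so they appear in the screenshot
        action = parse_action(ask_model(objective, capture()))
        if action.get("done"):
            return step
        for key in action_to_keys(action):
            press(key)
    return None
```

Passing the browser and model in as callables keeps the sketch runnable without a real browser or API key; in practice `press` and `capture` would wrap a browser-automation library and `ask_model` would wrap a vision-model API call.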
