ishan0102/vimGPT
An autonomous web-browsing agent powered by GPT-4V and Vimium keyboard navigation.

Velocity (7d): +2.8 ★ / day · Trend: steady
vimGPT lets multimodal LLMs such as GPT-4V operate a web browser through Vimium’s keyboard-based navigation instead of relying on DOM text. The project captures screenshots, sends them to GPT-4V for visual analysis, and translates the model’s predictions into keyboard commands that Vimium executes. A voice mode also lets users give natural-language objectives, which the agent carries out in real time.
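
The step that turns a model prediction into keyboard commands can be sketched as below. This is a minimal illustration, not the project's actual code: the JSON action shape (`click`, `type`) and the helper names `parse_action` and `action_to_keys` are assumptions made for this example, as is the idea that the model replies with a JSON object embedded in free text.

```python
import json


def parse_action(reply: str) -> dict:
    """Pull the outermost JSON object out of a free-text model reply.

    Hypothetical helper: assumes the model was prompted to answer with
    a single JSON object such as {"click": "AB"}.
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    return json.loads(reply[start:end + 1])


def action_to_keys(action: dict) -> list[str]:
    """Translate an action dict into a flat list of key presses.

    Assumed action schema (illustrative only):
      {"click": "<hint letters>"}  -> type the Vimium link-hint label
      {"type": "<text>"}           -> type the text, then press Enter
    """
    keys: list[str] = []
    if "click" in action:
        # Assumption: hint labels are entered as lowercase letters,
        # one key press per character.
        keys.extend(action["click"].lower())
    if "type" in action:
        keys.extend(action["type"])
        keys.append("Enter")
    return keys


if __name__ == "__main__":
    reply = 'I will open the first result. {"click": "AB"}'
    print(action_to_keys(parse_action(reply)))
```

In a full agent loop, the returned key list would be passed to whatever browser-automation layer presses the keys, after which a fresh screenshot is taken for the next model call. Keeping the translation as a pure function makes that step easy to test without a browser or an API key.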