cvlab-columbia/viper
Research implementation of a visual reasoning system where an LLM generates and executes Python code to answer questions about images.

Velocity (7d): +1.5 ★/day · Trend: steady
ViperGPT combines visual understanding with language-model reasoning: an LLM generates Python code for each question, and the system executes that code against the image and returns the result. Because the generated program can call vision models and use arbitrary Python operations, it supports complex visual question answering. The implementation integrates GLIP for vision capabilities and uses OpenAI models as the code-generation backbone.
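The generate-then-execute loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the repository: `StubImage`, `stub_code_generator`, `answer`, and the generated function name `execute_command` are hypothetical, the "LLM" is a hard-coded function, and the "detector" is a list lookup standing in for a model such as GLIP.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StubImage:
    """Hypothetical stand-in for an image: just a list of object labels."""
    labels: List[str]

    def find(self, name: str) -> List[str]:
        # A real system would run an object detector here.
        return [label for label in self.labels if label == name]


def stub_code_generator(question: str) -> str:
    """Hypothetical stand-in for the LLM call; returns Python source text."""
    # A real system would send the question plus an API description to a model.
    return (
        "def execute_command(image):\n"
        "    muffins = image.find('muffin')\n"
        "    kids = image.find('kid')\n"
        "    return len(muffins) // max(len(kids), 1)\n"
    )


def answer(question: str, image: StubImage,
           generate: Callable[[str], str] = stub_code_generator):
    """Generate a program for the question, execute it, and return its result."""
    source = generate(question)
    namespace: dict = {}
    # exec runs arbitrary code: generated programs should be sandboxed in practice.
    exec(source, namespace)
    return namespace["execute_command"](image)


image = StubImage(labels=["muffin"] * 4 + ["kid"] * 2)
result = answer("How many muffins can each kid have?", image)
print(result)
```

Passing the generator in as a parameter keeps the execution step independent of any particular model provider, so the stub can be swapped for a real model call without touching `answer`.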