
cvlab-columbia/viper

Research implementation of a visual reasoning system where an LLM generates and executes Python code to answer questions about images.

1.7k stars · Jupyter Notebook · Agents · Language Models · Image, Video, Audio
Velocity (7d): +1.5 ★/day
Trend: steady

ViperGPT answers questions about images by having an LLM generate a Python program that processes the image and carries out the reasoning. The system executes the generated program and returns its result as the answer, which enables complex visual question answering. It uses GLIP for vision capabilities and OpenAI models for code generation, so the generated programs can use arbitrary Python operations during visual inference.
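
The generate-then-execute loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: `StubImage`, `stub_generate_code`, `answer`, and `execute_command` are hypothetical names invented here, the LLM call is replaced by a fixed string, and the detector (GLIP in the real system) is replaced by a dictionary lookup.

```python
from dataclasses import dataclass, field


@dataclass
class StubImage:
    """Hypothetical stand-in for an image plus a detector.

    `objects` fakes detector output: label -> list of detections.
    """
    objects: dict = field(default_factory=dict)

    def find(self, name: str) -> list:
        # A real system would query a vision model here; this is a lookup.
        return self.objects.get(name, [])


def stub_generate_code(question: str) -> str:
    """Hypothetical placeholder for the LLM call; returns a fixed program."""
    return (
        "def execute_command(image):\n"
        "    muffins = image.find('muffin')\n"
        "    kids = image.find('kid')\n"
        "    return len(muffins) // len(kids)\n"
    )


def answer(question: str, image: StubImage):
    """Generate a program for the question, execute it, return its result."""
    source = stub_generate_code(question)
    namespace = {}
    # exec runs arbitrary code; generated programs should be sandboxed.
    exec(source, namespace)
    return namespace["execute_command"](image)


image = StubImage(objects={"muffin": ["det"] * 8, "kid": ["det"] * 2})
result = answer("How many muffins can each kid have?", image)
print(result)  # 4
```

Because the generated program is ordinary Python, it can mix detector calls with arithmetic, loops, and conditionals, which is what lets this design handle multi-step questions. The same property is why executing model-generated code needs isolation in practice.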
