zhaochen0110/Awesome_Think_With_Images
A curated collection of research papers and resources on LVLMs leveraging visual information for complex reasoning, planning, and generation.

Velocity (7d): +3.9 ★/day · Trend: steady
This repository accompanies a survey paper on multimodal reasoning with images. It systematically curates research on how Large Vision-Language Models can use visual information as a dynamic cognitive workspace for reasoning, planning, and generation tasks. The collection is structured around key themes in the evolving field of multimodal AI.