jingyi0000/VLM_survey
A systematic survey of vision-language models applied to visual recognition tasks including classification, detection, and segmentation.

Velocity (7d): +2.7 ★/day · Trend: steady
This repository hosts a comprehensive survey of Vision-Language Models (VLMs) compiled as an academic resource. It catalogs VLM studies across visual recognition tasks such as image classification, object detection, and semantic segmentation. The survey, published in IEEE TPAMI 2024, serves as a curated awesome list of research papers in the multi-modal/VLM space.