
jingyi0000/VLM_survey

A systematic survey of vision-language models applied to visual recognition tasks including classification, detection, and segmentation.

Velocity (7d): +2.7 ★/day. Trend: steady.

This repository hosts a comprehensive survey of Vision-Language Models (VLMs), compiled as an academic resource. It catalogs VLM studies across visual recognition tasks such as image classification, object detection, and semantic segmentation. The survey was published in IEEE TPAMI 2024, and the repository serves as a curated awesome list of research papers in the multi-modal/VLM space.
