JindongGu/Awesome-Prompting-on-Vision-Language-Model
A survey paper and organized paper list covering prompt engineering techniques across three types of vision-language foundation models.

This repository provides a survey of prompt engineering research on vision-language models (VLMs). It catalogs papers across three VLM categories: multimodal-to-text generation models (e.g., Flamingo), image-text matching models (e.g., CLIP), and text-to-image generation models (e.g., Stable Diffusion). The papers are organized by topic, so the list serves as a reference for researchers studying VLMs and prompting techniques.