YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
A curated survey of research papers on large language models enabling generation across visual and audio modalities.

This repository maintains a structured compilation of academic papers on how LLMs enable multimodal generation across images, videos, 3D content, and audio. It categorizes papers by modality and by whether the approach is LLM-based or relies on non-LLM alternatives such as CLIP or T5. The collection serves as a reference for work at the intersection of language models and generative AI across media types.