
NVlabs/prismer

Prismer is a vision-language model that draws on pre-trained experts to handle multiple vision-language tasks, including image captioning and visual question answering.

Velocity (7d): +1.1 ★ / day · Trend: steady

Prismer implements a vision-language architecture that combines multiple pre-trained expert models to handle diverse vision-language tasks. Through this multi-task expert framework, the model supports image captioning, visual question answering, and other multimodal tasks. It is built on PyTorch and uses Hugging Face Accelerate for distributed multi-node, multi-GPU training. A demo is available on Hugging Face Spaces.
