
open-mmlab/Multimodal-GPT

Multi-modal chatbot that processes visual and language instructions, based on the OpenFlamingo vision-language model.

Velocity (7d): +1.3 ★/day · Trend: steady

This repository trains a multi-modal chatbot by fine-tuning the OpenFlamingo architecture on both visual and language instruction datasets. Its training data spans visual question answering (VQA), image captioning, visual reasoning, OCR (reading text in images), and visual dialogue. Visual instructions and language-only instructions are trained jointly, with the aim of improving how well the model follows multi-modal instructions.
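To make "joint training" concrete, here is a minimal sketch of one way to interleave visual and language-only instruction examples into a single training stream. It is not taken from the repository. `Example`, `format_prompt`, `mixed_stream`, the `visual_ratio` parameter, and the prompt template are all hypothetical. The only idea borrowed from OpenFlamingo is marking image positions with an `<image>` placeholder token.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Example:
    instruction: str
    response: str
    image_path: Optional[str] = None  # None marks a language-only example


def format_prompt(ex: Example) -> str:
    # Hypothetical template: visual examples are prefixed with an image placeholder token.
    prefix = "<image>" if ex.image_path is not None else ""
    return f"{prefix}### Instruction: {ex.instruction}\n### Response: {ex.response}"


def mixed_stream(
    visual: List[Example],
    language: List[Example],
    visual_ratio: float,
    seed: int = 0,
) -> Iterator[Example]:
    """Interleave two shuffled pools, drawing a visual example with probability
    visual_ratio while both pools have data, then draining whichever remains."""
    rng = random.Random(seed)
    v, l = list(visual), list(language)
    rng.shuffle(v)
    rng.shuffle(l)
    vi = li = 0
    while vi < len(v) or li < len(l):
        # Fall back to the non-empty pool once the other is exhausted.
        take_visual = li >= len(l) or (vi < len(v) and rng.random() < visual_ratio)
        if take_visual:
            yield v[vi]
            vi += 1
        else:
            yield l[li]
            li += 1


if __name__ == "__main__":
    visual = [Example("Describe the image.", "A cat on a sofa.", "cat.jpg")]
    language = [Example("Name a primary color.", "Red.")]
    for ex in mixed_stream(visual, language, visual_ratio=0.5):
        print(format_prompt(ex))
```

Sampling from both pools in a single stream means each training batch can contain a mix of image-grounded and text-only instructions. A real pipeline would also tokenize the prompts and load the referenced images, which this sketch omits.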
