OFA-Sys/OFA

OFA is a unified sequence-to-sequence pretrained model that bridges vision, language, and cross-modal tasks including image captioning, VQA, and text-to-image generation.

Velocity (7d): +1.6 ★ / day · Trend: steady

OFA (ICML 2022) is a multimodal foundation model unified through a sequence-to-sequence learning framework. It supports both English and Chinese and handles diverse tasks, including image captioning (ranked 1st on the MSCOCO leaderboard), visual question answering, visual grounding, text-to-image synthesis, and text and image classification. The repository provides pretrained checkpoints and step-by-step pretraining and finetuning instructions, and it supports both standard finetuning and prompt tuning.
