OFA-Sys/OFA

A Single Seq2seq Model for Vision, Language, and Both

OFA treats image captioning, visual question answering, and text-to-image synthesis as the same sequence-to-sequence problem, using one pretrained architecture for all of them.

★ 2.6k stars · Python · Language Models · Image, Video, Audio · ML Frameworks

What it does

OFA is a unified pretrained transformer that reframes multimodal tasks as plain sequence-to-sequence problems. A single architecture handles vision-only, language-only, and cross-modal workloads: image captioning, visual question answering, visual grounding, text-to-image generation, and standard text or image classification. The project ships pretrained checkpoints ranging from 33 million to 930 million parameters, plus finetuned weights for specific tasks.
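
To make the "everything is sequence-to-sequence" framing concrete, here is a minimal sketch of how different tasks could be rendered into one shared example shape. It is not OFA's code: the `Seq2SeqExample` class, the `make_example` helper, and the instruction wording in `TEMPLATES` are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Seq2SeqExample:
    """One example in a unified text-in, text-out format (hypothetical)."""
    instruction: str       # source-side text prompt
    image: Optional[str]   # path to an image, or None for text-only tasks
    target: str            # the sequence the decoder should produce


# Hypothetical instruction templates, one per task. The wording is
# illustrative only and is not claimed to match the project's prompts.
TEMPLATES = {
    "caption": "what does the image describe?",
    "vqa": "{question}",
    "text_classification": 'what is the sentiment of the text "{text}"?',
}


def make_example(task: str, target: str, image: Optional[str] = None,
                 **fields: str) -> Seq2SeqExample:
    """Render any supported task into the same (instruction, image, target) triple."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task: {task}")
    return Seq2SeqExample(TEMPLATES[task].format(**fields), image, target)


examples = [
    make_example("caption", "a dog running on a beach", image="dog.jpg"),
    make_example("vqa", "two", image="dog.jpg", question="how many dogs are there?"),
    make_example("text_classification", "positive", text="a delightful film"),
]

for example in examples:
    print(example)
```

Because every task ends up as the same triple, one model and one training loop can consume all of them; only the instruction text and the presence of an image differ.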

The interesting bit

Instead of building separate encoders or decoders for different modalities, OFA flattens images and text into the same token stream and trains one model on all of them. The authors report that this unified approach reached first place on the MSCOCO image captioning leaderboard and posted competitive scores on the VQA and RefCOCO benchmarks.
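
The token-stream idea can be pictured with a toy example. The sketch below is an assumption-laden illustration, not the project's implementation: the function names, the `<patch_r_c>` and `<bin_n>` token spellings, and the default of 1000 coordinate bins are all chosen here for demonstration. A real model would feed learned patch features rather than string placeholders; the code only shows the bookkeeping of putting image positions, words, and box coordinates into one vocabulary.

```python
from typing import List, Tuple


def image_to_patch_tokens(height: int, width: int, patch: int) -> List[str]:
    """Return one placeholder token per non-overlapping patch, in row-major order."""
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    rows, cols = height // patch, width // patch
    return [f"<patch_{r}_{c}>" for r in range(rows) for c in range(cols)]


def box_to_location_tokens(box: Tuple[int, int, int, int], width: int,
                           height: int, bins: int = 1000) -> List[str]:
    """Quantize pixel coordinates (x0, y0, x1, y1) into discrete bin tokens."""
    x0, y0, x1, y1 = box

    def quantize(value: int, size: int) -> int:
        # Integer arithmetic avoids float rounding; clamp to the last bin.
        return min(bins - 1, value * bins // size)

    pairs = ((x0, width), (y0, height), (x1, width), (y1, height))
    return [f"<bin_{quantize(v, s)}>" for v, s in pairs]


def build_source(height: int, width: int, patch: int, instruction: str) -> List[str]:
    """Concatenate image patch tokens and whitespace-split text tokens."""
    return image_to_patch_tokens(height, width, patch) + instruction.lower().split()


# A 4x4 "image" with 2x2 patches, plus a grounding-style question.
source = build_source(4, 4, 2, "Which region shows the dog?")
# The answer to a grounding question is itself a short token sequence.
target = box_to_location_tokens((0, 0, 50, 100), width=100, height=200)

print(source)
print(target)
```

Expressing a bounding box as four ordinary tokens is what lets a grounding task share the same decoder as captioning or question answering in this sketch.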

Key highlights

Supports both English and Chinese, with dedicated extensions for OCR and speech pretraining (OFA-OCR and MMSpeech).
Five model sizes from Tiny to Huge, so you can trade accuracy for speed without changing the architecture.
Prompt tuning and standard finetuning are both supported, which lets you adapt the model without necessarily updating all of its parameters (930M for the largest size).
Hugging Face Transformers integration is available for inference, alongside the original Fairseq-based training code.
Published at ICML 2022, with continued updates to the repository, including MuE for faster inference.

Caveats

The repository is organized as a collection of task-specific checkpoints and scripts; using it as a single toolkit means navigating several markdown files and branches.
Chinese support and some extensions live in separate docs and need different tokenizer configurations.

Verdict

Researchers and engineers who want one pretrained backbone for mixed vision-language workloads should look here; if you only need a single-task model, the overhead of a generalist with up to 930M parameters may not be worth it.

Frequently asked

What is OFA-Sys/OFA?: OFA treats image captioning, visual question answering, and text-to-image synthesis as the same sequence-to-sequence problem, using one pretrained architecture for all of them.
Is OFA open source?: Yes, OFA-Sys/OFA is open source, released under the Apache-2.0 license.
What language is OFA written in?: OFA-Sys/OFA is primarily written in Python.
How popular is OFA?: OFA-Sys/OFA has 2.6k stars on GitHub.
Where can I find OFA?: OFA-Sys/OFA is on GitHub at https://github.com/OFA-Sys/OFA.