Is MiniGPT-4 open source?

Yes — Vision-CAIR/MiniGPT-4 is open source, released under the BSD-3-Clause license.

What language is MiniGPT-4 written in?

Vision-CAIR/MiniGPT-4 is primarily written in Python.

How popular is MiniGPT-4?

Vision-CAIR/MiniGPT-4 has 25.7k stars on GitHub.

Where can I find MiniGPT-4?

Vision-CAIR/MiniGPT-4 is on GitHub at https://github.com/Vision-CAIR/MiniGPT-4.

← all repositories

Vision-CAIR/MiniGPT-4

Chat with images using a frozen LLM and a visual encoder

MiniGPT-4 bolts a vision encoder onto Vicuna or Llama 2 so the LLM can see, describe, and reason about images without retraining the language model itself.

★25.7k stars Python Language Models Image · Video · Audio

View on GitHub ↗ Homepage ↗

Not currently ranked — collecting fresh signals.

star history

What it does

MiniGPT-4 and its successor MiniGPT-v2 are vision-language models that let you upload an image and ask questions about it, get captions, or request creative tasks like writing poems or stories inspired by what the model sees. The architecture is deliberately simple: a frozen visual encoder (from BLIP-2) feeds into a frozen large language model (Vicuna 7B/13B or Llama 2 Chat 7B) with a lightweight projection layer trained to align the two. The repo includes training and fine-tuning code, evaluation scripts, and local Gradio demos.

The interesting bit

The “mini” framing is slightly cheeky — the LLM itself isn’t mini at all, it’s just not retrained. The actual training budget goes into that thin adapter between vision and text. MiniGPT-v2 extends this to a “unified interface” for multiple vision-language tasks, suggesting the same frozen LLM backbone can handle diverse prompts with task-specific token prefixes rather than separate fine-tuned heads.

Key highlights

Supports Vicuna 7B/13B and Llama 2 Chat 7B as the language backbone
Runs locally with ~11.5 GB VRAM for 7B models (8-bit), ~23 GB for 13B
Includes Colab notebook and Hugging Face Spaces demos for quick testing
Community has forked it for dermatology diagnosis (SkinGPT-4), art analysis (ArtGPT-4), patent figure captioning (PatFig), and instruction tuning
BSD 3-Clause license; built on top of Salesforce’s LAVIS framework

Caveats

Setup is manual: you must download LLM weights separately via git-lfs, download checkpoint files from Google Drive, and edit YAML config paths by hand
The README contains broken links (“Downlad” typo for Vicuna 13B) and inconsistent line number references that may drift
No quantitative benchmarks or accuracy claims are stated in the README; performance is left to the demo examples and linked papers

Verdict

Worth a look if you want a hackable, relatively lightweight vision-language baseline with clear training code. Skip it if you need a polished product or plug-and-play API — this is research code with research-code edges.

Frequently asked

What is Vision-CAIR/MiniGPT-4?: MiniGPT-4 bolts a vision encoder onto Vicuna or Llama 2 so the LLM can see, describe, and reason about images without retraining the language model itself.
Is MiniGPT-4 open source?: Yes — Vision-CAIR/MiniGPT-4 is open source, released under the BSD-3-Clause license.
What language is MiniGPT-4 written in?: Vision-CAIR/MiniGPT-4 is primarily written in Python.
How popular is MiniGPT-4?: Vision-CAIR/MiniGPT-4 has 25.7k stars on GitHub.
Where can I find MiniGPT-4?: Vision-CAIR/MiniGPT-4 is on GitHub at https://github.com/Vision-CAIR/MiniGPT-4.