OpenGVLab/InternVL

An Open-Source Multimodal Family Benchmarking Itself Against GPT-4o

InternVL exists to prove that open-source vision-language models can match closed-source commercial performance on perception, reasoning, and document understanding benchmarks.

★ 10.1k stars · Python · Language Models · Image · Video · Audio

What it does

InternVL is a continuously evolving suite of multimodal large language models ranging from 1B to 241B parameters. The project releases model weights, training scripts, data pipelines, and evaluation benchmarks for vision-language tasks including image understanding, video analysis, document QA, and mathematical reasoning. It also publishes companion datasets like MMPR and VisualPRM400K for preference optimization and process reward modeling.

The interesting bit

Rather than just dropping model checkpoints, the team open-sources the entire training stack, including offline and online RL stages, data construction pipelines, and even a 20B-parameter variant built on the GPT-OSS architecture. The latest InternVL3.5-241B-A28B claims state-of-the-art results among open-source MLLMs, while the separate Mini-InternVL series targets the other end of the scale: its 4B model reportedly delivers 90% of the performance of a much larger model with 5% of the parameters.

Key highlights

Model scale ranges from 1B to 241B parameters, with the largest variant (InternVL3.5-241B-A28B) claiming top open-source scores on general multimodal and reasoning leaderboards.
InternVL2.5-78B was the first open-source MLLM to exceed 70% on the MMMU benchmark, which the team presents as matching leading closed-source models like GPT-4o.
The project open-sources not just weights but also training code for preference optimization (MPO), process reward models (VisualPRM), and the offline and online stages of Cascade RL.
A 20B-A4B variant is built on the GPT-OSS architecture and offered in both the project’s native format and standard Hugging Face transformers format.
Mini-InternVL compresses capability into smaller footprints: the 4B model reportedly delivers 90% of the performance using 5% of the parameters.
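
As a rough sanity check on the Mini-InternVL highlight, the arithmetic behind the 90% / 5% claim can be worked out directly. The sketch below assumes the 4B figure is exact and that "5% of the parameters" is measured against a single larger reference model; the summary above states neither, and both function names are hypothetical.

```python
def implied_reference_size(small_params_b: float, param_fraction: float) -> float:
    """Size in billions of the reference model implied by a parameter fraction."""
    if not 0 < param_fraction <= 1:
        raise ValueError("param_fraction must be in (0, 1]")
    return small_params_b / param_fraction


def efficiency_ratio(perf_fraction: float, param_fraction: float) -> float:
    """Performance retained per unit of parameters retained, relative to the reference."""
    if not 0 < param_fraction <= 1:
        raise ValueError("param_fraction must be in (0, 1]")
    return perf_fraction / param_fraction


# Assumed inputs taken from the highlight: a 4B model, 90% performance, 5% parameters.
reference_b = implied_reference_size(4.0, 0.05)
ratio = efficiency_ratio(0.90, 0.05)
print(f"implied reference size: ~{reference_b:.0f}B parameters")
print(f"performance retained per parameter retained: ~{ratio:.0f}x")
```

Under those assumptions the claim implies a reference model of roughly 80B parameters. This is only a reading of the headline numbers, not a reproduction of the team's evaluation.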

Caveats

Benchmark comparisons to GPT-4o and GPT-5 are authored by the InternVL team itself; the README does not point to third-party audits.
The documentation is structured as a chronological news feed, so finding specific architectural or training details requires digging through linked papers and blog posts.
With versions spanning 1.0 to 3.5 and parameter counts from 1B to 241B, the model matrix is dense enough to confuse even attentive readers.
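
One way to keep the model matrix straight is to parse the names mechanically. The sketch below handles only the two name shapes that appear in this summary (InternVL3.5-241B-A28B and InternVL2.5-78B). It assumes the "-A28B" suffix denotes activated parameters in a mixture-of-experts model, which is a common naming convention but is not spelled out here. The pattern, class, and function names are hypothetical and are not part of the InternVL codebase.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern: InternVL<version>-<total>B[-A<active>B]
NAME_PATTERN = re.compile(
    r"^InternVL(?P<version>\d+(?:\.\d+)?)"
    r"-(?P<total>\d+)B"
    r"(?:-A(?P<active>\d+)B)?$"
)


@dataclass(frozen=True)
class ModelName:
    version: str
    total_params_b: int
    active_params_b: Optional[int]  # None when the name has no "-A<n>B" suffix

    @property
    def active_fraction(self) -> Optional[float]:
        """Share of parameters assumed active per token, if the name gives one."""
        if self.active_params_b is None:
            return None
        return self.active_params_b / self.total_params_b


def parse_model_name(name: str) -> ModelName:
    """Split a model name into version, total size, and optional active size."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    active = match.group("active")
    return ModelName(
        version=match.group("version"),
        total_params_b=int(match.group("total")),
        active_params_b=int(active) if active is not None else None,
    )


for name in ("InternVL3.5-241B-A28B", "InternVL2.5-78B"):
    print(name, "->", parse_model_name(name))
```

Names outside this pattern raise a ValueError rather than being guessed at, which seems the safer default for a naming scheme that changes from release to release.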

Verdict

Worth exploring if you need open-source vision-language weights with fully exposed training pipelines and datasets. Skip it if you are looking for a single, stable, well-documented API endpoint rather than a fast-moving research codebase.

Frequently asked

What is OpenGVLab/InternVL?: InternVL is an open-source suite of multimodal large language models, ranging from 1B to 241B parameters, that aims to show open-source vision-language models can match closed-source commercial performance on perception, reasoning, and document understanding benchmarks.
Is InternVL open source?: Yes — OpenGVLab/InternVL is open source, released under the MIT license.
What language is InternVL written in?: OpenGVLab/InternVL is primarily written in Python.
How popular is InternVL?: OpenGVLab/InternVL has 10.1k stars on GitHub.
Where can I find InternVL?: OpenGVLab/InternVL is on GitHub at https://github.com/OpenGVLab/InternVL.