Is Qwen-VL open source?

Yes — QwenLM/Qwen-VL is an open-source project tracked on heatdrop.

What language is Qwen-VL written in?

QwenLM/Qwen-VL is primarily written in Python.

How popular is Qwen-VL?

QwenLM/Qwen-VL has 6.7k stars on GitHub.

Where can I find Qwen-VL?

QwenLM/Qwen-VL is on GitHub at https://github.com/QwenLM/Qwen-VL.


QwenLM/Qwen-VL

Open vision-language weights that speak Chinese and draw boxes

To ship open-weight vision-language models that natively read images, draw bounding boxes, and chat in Chinese and English.

★ 6.7k stars · Python · Language Models · Image · Video · Audio




What it does

Qwen-VL is a family of vision-language models from Alibaba Cloud. It takes images, text, and bounding boxes as input and produces text and bounding boxes as output. The open releases include a base pretrained model (Qwen-VL) and an aligned chat variant (Qwen-VL-Chat). Alibaba also offers higher-tier upgrades, Qwen-VL-Plus and Qwen-VL-Max, which handle megapixel images and extreme aspect ratios. Those appear to be available only through APIs and web demos.
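The Qwen-VL technical report treats boxes as plain text in the token stream. A referring phrase is wrapped in <ref>...</ref> and followed by <box>(x1,y1),(x2,y2)</box>, with coordinates normalized to the range [0, 1000).

The sketch below assumes exactly that convention. It converts such a reply to pixel coordinates and builds the same string from a pixel box for use in a prompt. Some caveats:

- parse_boxes and format_box are hypothetical helpers, not part of the Qwen-VL repository.
- The reply string is made up.
- Truncating coordinates (rather than rounding them) is this sketch's choice.
- Only the simple one-phrase-one-box pattern is handled.

```python
import re
from dataclasses import dataclass

# Assumed grounding convention: <ref>phrase</ref><box>(x1,y1),(x2,y2)</box>,
# with coordinates normalized to the integer range [0, 1000).
_GROUNDED = re.compile(
    r"<ref>(?P<label>.*?)</ref>\s*"
    r"<box>\((?P<x1>\d+),(?P<y1>\d+)\),\((?P<x2>\d+),(?P<y2>\d+)\)</box>"
)


@dataclass
class Box:
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def parse_boxes(text: str, width: int, height: int) -> list[Box]:
    """Extract grounded phrases and rescale their boxes to pixel coordinates."""
    boxes = []
    for m in _GROUNDED.finditer(text):
        x1, y1, x2, y2 = (int(m[k]) for k in ("x1", "y1", "x2", "y2"))
        boxes.append(Box(
            label=m["label"],
            x1=x1 * width / 1000,  # multiply first so the integer math stays exact
            y1=y1 * height / 1000,
            x2=x2 * width / 1000,
            y2=y2 * height / 1000,
        ))
    return boxes


def format_box(label: str, box: tuple[int, int, int, int],
               width: int, height: int) -> str:
    """Encode an integer pixel box as a grounding string, e.g. for a prompt."""
    def norm(v: int, size: int) -> int:
        # Truncate into [0, 1000); a coordinate on the far edge is capped at 999.
        return min(v * 1000 // size, 999)

    x1, y1, x2, y2 = box
    return (f"<ref>{label}</ref><box>({norm(x1, width)},{norm(y1, height)}),"
            f"({norm(x2, width)},{norm(y2, height)})</box>")


reply = "<ref>the dog</ref><box>(120,300),(480,910)</box>"  # made-up reply
print(parse_boxes(reply, width=1000, height=500))
# [Box(label='the dog', x1=120.0, y1=150.0, x2=480.0, y2=455.0)]
```

Keeping boxes as text means no detection head is needed: the same tokenizer and decoder that write the answer also write the coordinates.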

The interesting bit

The architecture is refreshingly explicit: a Qwen-7B language backbone, an OpenCLIP ViT-bigG visual encoder, and a randomly initialized cross-attention layer connecting them. More unusual is native bilingual grounding. You can prompt the model in Chinese or English to locate objects, and it writes the bounding-box coordinates directly into its reply, without calling a separate detection pipeline.
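The Qwen-VL technical report describes the adapter as a single cross-attention layer. In it, 256 learnable query vectors attend over the ViT's patch features, so every image reaches the language model as a fixed-length sequence of 256 vectors.

A minimal pure-Python sketch of that idea follows. It differs from the real adapter in several ways:

- It uses toy sizes (4 queries, width 8) instead of the real ones.
- It uses a single attention head and random weights.
- It omits the 2D positional encodings that the report adds to the query-key pairs.

Treat it as an illustration of the shape logic, not the repository's implementation. CrossAttentionAdapter and its helper functions are hypothetical names.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """(n x k) @ (k x m) -> (n x m) on nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class CrossAttentionAdapter:
    """Single-head cross-attention: learnable queries attend over image patches."""

    def __init__(self, num_queries: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Query embeddings are parameters; they do not depend on the image.
        self.queries = rand_matrix(num_queries, dim, rng)
        self.w_q = rand_matrix(dim, dim, rng)
        self.w_k = rand_matrix(dim, dim, rng)
        self.w_v = rand_matrix(dim, dim, rng)
        self.scale = 1.0 / math.sqrt(dim)

    def __call__(self, patches: Matrix) -> Matrix:
        q = matmul(self.queries, self.w_q)  # (num_queries, dim)
        k = matmul(patches, self.w_k)       # (num_patches, dim)
        v = matmul(patches, self.w_v)       # (num_patches, dim)
        # Scaled dot-product score of every query against every patch.
        scores = [[self.scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
                  for qi in q]
        weights = [softmax(row) for row in scores]  # (num_queries, num_patches)
        return matmul(weights, v)                   # (num_queries, dim)


adapter = CrossAttentionAdapter(num_queries=4, dim=8)
rng = random.Random(1)
for num_patches in (16, 64):
    out = adapter(rand_matrix(num_patches, 8, rng))
    print(num_patches, "patches ->", len(out), "x", len(out[0]))
# 16 patches -> 4 x 8
# 64 patches -> 4 x 8
```

Whatever the number of patches, the output has one row per query. That is what lets a large visual encoder feed a fixed token budget into the language model.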

Key highlights

Open weights for pretrained and chat-tuned variants, including an Int4 release aimed at modest GPU memory.
Supports multi-image interleaved conversations and multi-round dialogue rather than one-off image prompts.
Claims state-of-the-art results among similarly sized open vision-language models on English benchmarks covering captioning, visual QA, document QA, and grounding.
Plus and Max variants reportedly match or beat GPT-4V and Gemini Ultra on several document and chart tasks, but appear to be API-only.
Fine-tuning is supported through full-parameter training, LoRA, and Q-LoRA.
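LoRA freezes a pretrained weight W and learns a low-rank update instead. It computes y = Wx + (alpha/r)·BAx, where A has shape r×d_in and B has shape d_out×r. Q-LoRA additionally stores the frozen W in 4-bit form.

The sketch below shows both ideas on one toy linear layer in pure Python. It is not Qwen-VL's fine-tuning script. LoRALinear and its helpers are hypothetical, and the simple absmax int4 quantizer stands in for the NF4 data type that Q-LoRA actually uses.

```python
import random

Matrix = list[list[float]]


def matvec(m: Matrix, x: list[float]) -> list[float]:
    return [sum(w * v for w, v in zip(row, x)) for row in m]


def quantize_int4(row: list[float]) -> tuple[list[int], float]:
    """Symmetric absmax quantization of one weight row to integers in [-7, 7]."""
    scale = max(abs(w) for w in row) / 7 or 1.0  # avoid dividing by zero
    return [round(w / scale) for w in row], scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]


class LoRALinear:
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B would train."""

    def __init__(self, weight: Matrix, rank: int, alpha: float,
                 quantize: bool = False, seed: int = 0) -> None:
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.quantize = quantize
        if quantize:
            # Q-LoRA-style: keep the frozen base only in 4-bit form.
            self.q_rows = [quantize_int4(row) for row in weight]
        else:
            self.weight = [list(row) for row in weight]
        # A: small random init; B: zeros, so the update B A starts at exactly 0.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(out_dim)]
        self.scaling = alpha / rank

    def base_weight(self) -> Matrix:
        if self.quantize:
            return [dequantize(q, s) for q, s in self.q_rows]  # dequantize on the fly
        return self.weight

    def __call__(self, x: list[float]) -> list[float]:
        base = matvec(self.base_weight(), x)
        update = matvec(self.b, matvec(self.a, x))
        return [y + self.scaling * u for y, u in zip(base, update)]

    def num_trainable(self) -> int:
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


rng = random.Random(42)
W = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(4)]
x = [rng.uniform(-1, 1) for _ in range(6)]
layer = LoRALinear(W, rank=2, alpha=4.0)
print(layer(x) == matvec(W, x), layer.num_trainable())  # True 20
```

Because B starts at zero, the adapted layer initially reproduces the base layer exactly. Only A and B, r·(d_in + d_out) values in total, would receive gradients during training. Full-parameter training, by contrast, updates all d_out·d_in entries of W.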

Caveats

The README is heavy on benchmark tables and emoji, but sparse on training data composition and licensing details for the open weights.
It is unclear whether Plus and Max model weights are openly downloadable or restricted to Alibaba’s API and Hugging Face Spaces.

Verdict

A solid candidate if you need open weights with strong Chinese vision-language support and built-in spatial grounding. Look elsewhere if your priority is fully open, reproducible megapixel-scale models; those appear to be API-only.
