← all repositories
OpenBMB/VisCPM

A bilingual multimodal model that learned Chinese vision skills from English data

VisCPM shows how a strong bilingual LLM base can bootstrap multimodal capabilities across languages without parallel training data.

VisCPM
Velocity · 7d
+1.0
★ / day
Trend
steady
star history

What it does

VisCPM is a family of open-source multimodal models built on the 10B-parameter CPM-Bee language model. It comes in two flavors: VisCPM-Chat for image-to-text conversation, and VisCPM-Paint for text-to-image generation. Both handle Chinese and English, and the project claims state-of-the-art results among Chinese open-source multimodal models on LLaVA-style benchmarks.

The interesting bit

The project demonstrates cross-lingual transfer for multimodal tasks. Because CPM-Bee is already bilingual, VisCPM can be pre-trained on English image-text pairs only, then generalize to Chinese multimodal capabilities with minimal translated instruction data. The README notes that even English-only instruction tuning produced a model that understood Chinese questions but answered in English — a neat diagnostic of where capabilities live in the stack.

Key highlights

  • Two model variants: VisCPM-Chat-balance (balanced bilingual) and VisCPM-Chat-zhplus (Chinese-optimized, with extra native and translated Chinese pre-training data)
  • Low-resource inference down to 5GB VRAM for the chat model
  • HuggingFace integration, local web demo deployment, and an API for VisCPM-Chat
  • ICLR 2024 spotlight paper (top 5%)
  • Predecessor to the newer OmniLMM and MiniCPM-V model families from the same team

Caveats

  • The project appears to be in maintenance mode; the README prominently directs users to newer successors (OmniLMM, MiniCPM-V 2.0)
  • Benchmark tables in the README are truncated, so full comparative claims are hard to verify from the source alone

Verdict

Worth studying if you’re researching cross-lingual multimodal transfer or need a working Chinese-English vision-language baseline. Skip if you want the latest hardware — the team has already shipped smaller, more capable successors.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.