OFA-Sys/Chinese-CLIP

Stop translating captions: a CLIP trained on 200M Chinese pairs

So Chinese image search can stop pretending hanzi are just pretty glyphs.

★ 6k stars · Jupyter Notebook · Language Models · Image · Video · Audio

What it does

Chinese-CLIP rebuilds the CLIP pipeline for native Chinese understanding. It trains vision encoders (ResNet-50 through ViT-H/14) alongside Chinese RoBERTa text towers on roughly 200 million Chinese image-text pairs, producing joint embeddings for image-text similarity, cross-modal retrieval, and zero-shot classification.
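As a rough illustration of how joint embeddings support zero-shot classification, the sketch below scores one image vector against several label vectors using cosine similarity followed by a softmax. The 3-dimensional vectors, the labels, and the `logit_scale` value are made-up stand-ins chosen for readability, not outputs or settings of any Chinese-CLIP checkpoint; in practice the vectors would come from the image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_vec, label_vecs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each label.

    logit_scale is an assumed temperature, not a value read from a checkpoint.
    """
    img = l2_normalize(image_vec)
    logits = []
    for vec in label_vecs:
        txt = l2_normalize(vec)
        logits.append(logit_scale * sum(a * b for a, b in zip(img, txt)))
    # Subtract the maximum logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings (hypothetical): one vector per Chinese class label.
labels = ["猫", "狗", "汽车"]
label_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_vec = [0.8, 0.2, 0.1]

probs = zero_shot_probs(image_vec, label_vecs)
best_label = labels[probs.index(max(probs))]
print(best_label, [round(p, 4) for p in probs])
```

The same scoring works for cross-modal retrieval: swap the label vectors for caption vectors and rank by similarity instead of taking the top class.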

The interesting bit

Instead of forcing Chinese through an English-language text encoder, the project drops in Chinese-specific RoBERTa checkpoints and re-aligns them with vision backbones using the open_clip framework. The maintainers also treat deployment as a first-class concern, shipping export scripts for ONNX, TensorRT, and CoreML alongside pre-trained TensorRT weights.

Key highlights

Five scales from 77M to 958M parameters, pairing ResNet or ViT vision backbones with RBT3, RoBERTa-wwm-Base, or RoBERTa-wwm-Large text encoders.
Leads rival Chinese vision-language models (Wukong, R2D2, Taiyi) on zero-shot and fine-tuned retrieval across MUGE, Flickr30K-CN, and COCO-CN.
Zero-shot image classification on the ELEVATER benchmark using Chinese class labels.
Production exports available for ONNX, TensorRT, and CoreML; pre-trained TensorRT weights are provided.
FlashAttention support, gradient accumulation for large-batch simulation, and knowledge-distillation fine-tuning via ModelScope.
Model code and feature-extraction API are integrated into Hugging Face transformers.
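For the Hugging Face integration mentioned in the last highlight, a minimal sketch of feature extraction might look like the following. It assumes the `ChineseCLIPModel` and `ChineseCLIPProcessor` classes from `transformers`, plus `torch` and `Pillow`, are installed, and that the `OFA-Sys/chinese-clip-vit-base-patch16` checkpoint is reachable on the Hub. The helper names `embed` and `rank_captions` are hypothetical, and the return type of the feature methods may differ between `transformers` versions, so treat this as a starting point rather than the project's reference usage.

```python
def rank_captions(image_vec, text_vecs):
    """Return caption indices sorted by dot-product similarity, best first."""
    scores = [sum(a * b for a, b in zip(image_vec, t)) for t in text_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def embed(image_path, captions, checkpoint="OFA-Sys/chinese-clip-vit-base-patch16"):
    """Encode one image and a list of Chinese captions into unit-length vectors.

    Heavy imports stay inside the function so rank_captions can be used
    without torch or transformers installed.
    """
    import torch
    from PIL import Image
    from transformers import ChineseCLIPModel, ChineseCLIPProcessor

    model = ChineseCLIPModel.from_pretrained(checkpoint)
    processor = ChineseCLIPProcessor.from_pretrained(checkpoint)
    image = Image.open(image_path).convert("RGB")

    with torch.no_grad():
        image_inputs = processor(images=image, return_tensors="pt")
        text_inputs = processor(text=captions, padding=True, return_tensors="pt")
        img = model.get_image_features(**image_inputs)
        txt = model.get_text_features(**text_inputs)

    # Normalize so that dot products are cosine similarities.
    img = img / img.norm(p=2, dim=-1, keepdim=True)
    txt = txt / txt.norm(p=2, dim=-1, keepdim=True)
    return img[0].tolist(), txt.tolist()


# Ranking demo with hand-written vectors, so it runs without any download.
order = rank_captions([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
print(order)
```

With the dependencies in place, a call such as `image_vec, text_vecs = embed("photo.jpg", ["一只猫", "一辆汽车"])` (the file name is a placeholder) would produce vectors that can be passed straight to `rank_captions`.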

Verdict

Grab it if you need native Chinese image-text retrieval or zero-shot classification that actually understands hanzi. If your workload is strictly English, stick with vanilla CLIP or OpenCLIP.

Frequently asked

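What is OFA-Sys/Chinese-CLIP?: A Chinese version of CLIP, trained on roughly 200 million Chinese image-text pairs, that produces joint embeddings for image-text similarity, cross-modal retrieval, and zero-shot classification.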
Is Chinese-CLIP open source?: Yes — OFA-Sys/Chinese-CLIP is open source, released under the MIT license.
What language is Chinese-CLIP written in?: OFA-Sys/Chinese-CLIP is primarily written in Jupyter Notebook.
How popular is Chinese-CLIP?: OFA-Sys/Chinese-CLIP has 6k stars on GitHub.
Where can I find Chinese-CLIP?: OFA-Sys/Chinese-CLIP is on GitHub at https://github.com/OFA-Sys/Chinese-CLIP.