A real-estate giant's bet on Chinese-speaking LLMs
Lianjia (China's Zillow equivalent) open-sourced a full pipeline to turn LLaMA into a Mandarin instruction-follower, complete with training data, code, and a desktop chat app.

What it does
BELLE is an end-to-end toolkit for building Chinese instruction-tuned language models. It ships training code (DeepSpeed-Chat, LoRA, full fine-tune), millions of generated Chinese instruction samples, evaluation sets, and quantized model weights. The project also includes a Flutter-based desktop app called ChatBELLE that runs a 4-bit quantized 7B model locally on macOS.
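To see why the 4-bit format matters for running a 7B model on a laptop, here is a rough back-of-the-envelope estimate of weight storage at different precisions. The helper name `weight_memory_gb` is made up for illustration, and the figures cover raw weights only: real quantized files carry extra metadata such as per-group scales, and inference also needs memory for activations and the KV cache.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate storage for the raw weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # 7e9 parameters is the nominal size of a "7B" model (an approximation).
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: {weight_memory_gb(7e9, bits):.1f} GB")
```

At 4 bits per weight the raw weights take roughly a quarter of the space they need at 16 bits, which is what brings a 7B model within reach of consumer hardware.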
The interesting bit
The project is backed by Lianjia, China’s largest real-estate brokerage, which gives it unusual corporate resources for an open-source NLP effort. To stay within LLaMA’s license restrictions, the team does not publish the fine-tuned weights directly. Instead, it distributes model “diffs” produced by XORing the fine-tuned weights against the original LLaMA weights, so only people who already hold the original weights (that is, legitimate LLaMA licensees) can reconstruct usable checkpoints. They also expanded LLaMA’s vocabulary with Chinese tokens and continued pre-training on 3.4 billion Chinese tokens before instruction tuning.
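The XOR trick works because XOR is its own inverse: if `diff = tuned XOR base`, then `diff XOR base` gives back `tuned`. The sketch below illustrates the idea on a tiny in-memory array. It is not the project's actual script; `xor_bytes` and the toy "checkpoints" are hypothetical stand-ins, and a real tool would stream large checkpoint files rather than hold them in memory.

```python
import numpy as np


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    x = np.frombuffer(a, dtype=np.uint8)
    y = np.frombuffer(b, dtype=np.uint8)
    return np.bitwise_xor(x, y).tobytes()


# Toy stand-ins for checkpoint contents (illustrative values only).
rng = np.random.default_rng(0)
base_weights = rng.standard_normal(8).astype(np.float16)
tuned_weights = base_weights + np.float16(0.01)

base = base_weights.tobytes()    # what a licensee already has
tuned = tuned_weights.tobytes()  # what the publisher wants to share

diff = xor_bytes(tuned, base)      # published artifact, unusable on its own
recovered = xor_bytes(diff, base)  # licensee reconstructs the tuned weights

assert recovered == tuned
```

Without the base bytes the published diff cannot simply be loaded as a model, which is why the scheme ties reconstruction to possession of the original weights.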
Key highlights
- Open-sourced training data totaling over 10M Chinese instruction samples (single-turn and multi-turn)
- RLHF training code supporting both PPO and DPO
- GPTQ quantization support and ZeRO Inference for memory-efficient serving
- BELLE-VL multimodal model scoring 1620.10 on the MME perception benchmark (self-reported)
- Chinese-enhanced Whisper variants (v2, v3, v3-turbo) with claimed 24–65% accuracy gains
- Docker environments and Colab notebooks for quick starts
Caveats
- The ChatBELLE app is currently macOS-only, and the README explicitly warns that quantization causes “obvious quality loss”
- LLaMA-derived models are restricted to “research and study” per Meta’s license; the XOR diff mechanism is clever but adds friction
- Most training data is generated by ChatGPT, which raises familiar synthetic-data quality questions
Verdict
Worth exploring if you need a proven Chinese LLM fine-tuning stack with data and code included. Skip if you want a polished consumer product or if LLaMA’s license terms block your use case.