CLUEbenchmark/CLUEPretrainedModels

Collection of Chinese pre-trained language models, including BERT, ALBERT, and RoBERTa variants, plus distilled versions for efficiency.

812 stars · Python · Language Models · Data Tooling
Velocity (7 days): +0.3 ★/day · Trend: steady

This repository provides a suite of Chinese pre-trained language models developed by CLUEbenchmark. It includes large models aimed at state-of-the-art accuracy, small distilled models that run about 8x faster than BERT-base, and models specialized for semantic similarity. The models are pre-trained on CLUECorpus2020, a 100 GB Chinese corpus of 35 billion characters drawn from Common Crawl. They use a compact vocabulary of about 8K tokens, which lowers computational cost while keeping strong performance on Chinese NLP benchmarks.
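To illustrate why a smaller vocabulary lowers cost, here is a rough arithmetic sketch of the token-embedding matrix in a BERT-style model, whose size (and the cost of scoring hidden states against the vocabulary) grows linearly with vocabulary size. The numbers are assumptions, not taken from this repository: a hidden size of 768 (BERT-base), 21128 tokens for the bert-base-chinese vocabulary, and exactly 8000 tokens standing in for the "8K" CLUE vocabulary. It covers only vocabulary-dependent weights and does not model the 8x speedup of the distilled models.

```python
# Rough arithmetic sketch: how vocabulary size drives the size of a
# BERT-style token-embedding matrix and the cost of a vocabulary projection.
# Assumptions (not from the repository): hidden size 768 (BERT-base),
# 21128 tokens for the bert-base-chinese vocabulary, and exactly 8000
# tokens standing in for the "8K" CLUE vocabulary.

HIDDEN_SIZE = 768


def embedding_params(vocab_size: int, hidden_size: int = HIDDEN_SIZE) -> int:
    """Number of weights in a vocab_size x hidden_size embedding matrix."""
    return vocab_size * hidden_size


def vocab_projection_macs(vocab_size: int, positions: int,
                          hidden_size: int = HIDDEN_SIZE) -> int:
    """Multiply-accumulates needed to score `positions` hidden vectors
    against every vocabulary entry (e.g. masked-LM prediction)."""
    return positions * hidden_size * vocab_size


if __name__ == "__main__":
    baseline = embedding_params(21128)  # assumed bert-base-chinese vocab size
    compact = embedding_params(8000)    # assumed size of the 8K CLUE vocab
    print(f"baseline embeddings: {baseline:,}")
    print(f"compact embeddings:  {compact:,}")
    print(f"reduction factor:    {baseline / compact:.2f}x")
```

Under these assumptions the script prints 16,226,304 versus 6,144,000 embedding weights, roughly a 2.64x reduction. Because both functions are linear in vocab_size, the same factor applies to the per-position vocabulary projection.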
