ymcui/MacBERT
A Chinese pre-trained language model (MacBERT) that improves on BERT-style pre-training with an "MLM as correction" objective, reducing the discrepancy between pre-training and fine-tuning.

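MacBERT is a pre-trained Chinese language model developed by HFL (the Joint Laboratory of HIT and iFLYTEK Research). It introduces a pre-training task called MLM as correction (Mac). Instead of replacing selected words with the artificial [MASK] token, which never appears during fine-tuning, Mac replaces them with similar words and falls back to a random word when no similar word is available. This narrows the gap between pre-training and fine-tuning. The words to corrupt are selected with whole word masking and N-gram masking. MacBERT keeps BERT's architecture, so its checkpoints load with the standard BERT classes in Hugging Face Transformers and can be fine-tuned for Chinese NLP downstream tasks such as text classification, named entity recognition, and question answering.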
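The corruption step of the Mac task can be sketched as follows. This is a minimal illustration, not the project's pre-training code. It assumes the input is already segmented into Chinese words. It uses the settings reported in the MacBERT paper: 15% of words selected, with unigram to 4-gram spans weighted 40/30/20/10, and selected words replaced by a similar word 80% of the time, by a random word 10% of the time, and kept unchanged 10% of the time. The `SIMILAR` dictionary and the `mac_mask` function are hypothetical. MacBERT obtains similar words from a word2vec-based synonym toolkit, and the toy table here only stands in for that lookup. Applying the 80/10/10 decision once per selected n-gram, rather than once per word, is also an assumption.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Unigram..4-gram selection weights (40/30/20/10), per the MacBERT paper.
NGRAM_WEIGHTS = [0.4, 0.3, 0.2, 0.1]

# Hypothetical similar-word table standing in for a word2vec-based synonym lookup.
SIMILAR: Dict[str, List[str]] = {
    "喜欢": ["喜爱"],
    "语言": ["言语"],
    "模型": ["模式"],
    "简单": ["容易"],
}


def mac_mask(
    words: Sequence[str],
    vocab: Sequence[str],
    similar: Dict[str, List[str]],
    mask_rate: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Corrupt a word-segmented sentence in the style of Mac (illustrative sketch).

    Returns the corrupted words and per-position labels: the original word at
    selected positions (the prediction target), None everywhere else.
    """
    rng = rng or random.Random()
    corrupted = list(words)
    labels: List[Optional[str]] = [None] * len(words)
    if not words:
        return corrupted, labels
    # Number of words to select, clamped to the sentence length.
    budget = min(len(words), max(1, round(len(words) * mask_rate)))
    covered = 0
    for _ in range(10 * len(words)):  # bounded number of sampling attempts
        if covered >= budget:
            break
        n = rng.choices([1, 2, 3, 4], weights=NGRAM_WEIGHTS)[0]
        n = min(n, budget - covered)  # never overshoot the budget
        start = rng.randrange(len(words) - n + 1)
        span = range(start, start + n)
        if any(labels[i] is not None for i in span):
            continue  # overlaps an already selected span, so resample
        roll = rng.random()  # one 80/10/10 decision per n-gram (assumption)
        for i in span:
            labels[i] = words[i]
            if roll < 0.8:
                # Similar word, looked up word by word; random word if none is known.
                candidates = similar.get(words[i])
                corrupted[i] = rng.choice(candidates) if candidates else rng.choice(vocab)
            elif roll < 0.9:
                corrupted[i] = rng.choice(vocab)
            # Otherwise the original word is kept (remaining 10%).
        covered += n
    return corrupted, labels


if __name__ == "__main__":
    sentence = "我 很 喜欢 用 预训练 语言 模型 做 文本 分类 因为 这种 方法 简单 而且 效果 好 值得 一 试".split()
    corrupted, labels = mac_mask(sentence, vocab=sentence, similar=SIMILAR, rng=random.Random(0))
    print(" ".join(corrupted))
    print([(i, w) for i, w in enumerate(labels) if w is not None])
```

No [MASK] token is ever produced, so the model sees the same kind of input during pre-training as during fine-tuning. The training loss would then be computed only at positions whose label is not None, asking the model to recover the original word.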