huawei-noah/Pretrained-Language-Model
Collection of pretrained Chinese language models (PanGu-α up to 200B parameters, NEZHA, GPT variants) and compression techniques from Huawei Noah's Ark Lab.

This repository provides pretrained language models and related techniques developed by Huawei Noah's Ark Lab. The models include PanGu-α (a large-scale autoregressive model with up to 200B parameters), NEZHA (TensorFlow and PyTorch variants), and GPT-based generators for Chinese classical poetry. The related techniques include knowledge distillation and model compression (TinyBERT and DynaBERT), byte-level vocabulary tools (BBPE), and probabilistically masked language modeling (PMLM).