Embedding/Chinese-Word-Vectors
A collection of over 100 sets of pre-trained Chinese word vectors (embeddings), trained on different corpora with various representations.

This project provides pre-trained Chinese word embeddings in dense and sparse representations, trained on diverse corpora including Wikipedia, news, and web text. It also includes CA8, an analogical reasoning evaluation dataset, together with a toolkit that uses it to assess how well embeddings capture semantic and morphological relations. The vectors are designed for direct use in downstream NLP tasks such as text classification, sentiment analysis, and machine translation.
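Analogical reasoning evaluation answers questions of the form "a : b :: c : ?" by vector arithmetic. The following is a minimal sketch, not the project's toolkit. It assumes the vectors are distributed as word2vec-style text files (a header line with vocabulary size and dimension, then one word and its values per line) and scores analogies with 3CosAdd, one common scoring rule. The function names `load_text_vectors` and `analogy` are hypothetical, and the 3-dimensional vectors are toy data, not real embeddings.

```python
import io
import math


def load_text_vectors(stream):
    """Read word2vec-style text vectors and L2-normalize them.

    Assumed format (not confirmed by this README): a header line
    "<vocab_size> <dim>", then one line per word: the word followed by
    <dim> whitespace-separated floats.
    """
    header = stream.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in stream:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        values = [float(x) for x in parts[1:]]
        norm = math.sqrt(sum(x * x for x in values)) or 1.0
        vectors[parts[0]] = [x / norm for x in values]
    return vectors


def analogy(vectors, a, b, c, top_k=1):
    """Answer "a : b :: c : ?" with 3CosAdd.

    Ranks every word d (other than a, b, c) by cos(d, b - a + c). Because
    all stored vectors are unit length, a dot product with the
    un-normalized target gives the same ranking as the cosine.
    """
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    scored = []
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue  # the query words are conventionally excluded
        scored.append((sum(x * y for x, y in zip(vec, target)), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_k]]


# Toy 3-dimensional vectors for illustration only.
TOY = """5 3
国王 1.0 1.0 0.0
王后 1.0 0.0 1.0
男人 0.0 1.0 0.0
女人 0.0 0.0 1.0
苹果 -1.0 0.5 0.5
"""

vecs = load_text_vectors(io.StringIO(TOY))
print(analogy(vecs, "男人", "女人", "国王"))  # ['王后']
```

For a real file, pass `open(path, encoding="utf-8")` instead of the `StringIO` object. Holding a full vocabulary in Python lists is slow and memory-hungry, so if the files do follow the word2vec text format, gensim's `KeyedVectors.load_word2vec_format` is likely a more practical loader.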