Embedding/Chinese-Word-Vectors

A collection of over 100 pre-trained Chinese word vectors (embeddings) trained on different corpora with various representations.

12.2k stars · Python · Language Models · Data Tooling
Velocity (7d): +4.0 ★ / day · Trend: steady

This project provides pre-trained Chinese word embeddings in dense and sparse representations trained on diverse corpora including Wikipedia, news, and web text. It includes an analogical reasoning evaluation dataset (CA8) and a toolkit to assess embedding quality on semantic and morphological relations. The vectors are designed for direct use in downstream NLP tasks such as text classification, sentiment analysis, and machine translation.
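As a rough illustration of what analogical reasoning over word vectors involves, the sketch below parses a tiny in-memory embedding table and answers an analogy question with the vector-offset method. It assumes a word2vec-style text layout (a header line with the word count and dimension, then one word and its values per line); the toy vectors, `load_text_vectors`, and `solve_analogy` are hypothetical names made up for this example and are not the project's evaluation toolkit.

```python
import io
import math


def load_text_vectors(handle):
    """Parse word2vec-style text: header 'count dim', then 'word v1 ... vd' per line."""
    _count, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by finding the word closest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three question words are excluded from the candidate set.
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy 2-dimensional vectors invented for illustration (not real embeddings).
TOY = """5 2
中国 1.0 0.0
北京 1.0 1.0
日本 2.0 0.0
东京 2.0 1.0
苹果 -1.0 -1.0
"""

vectors = load_text_vectors(io.StringIO(TOY))
answer = solve_analogy(vectors, "中国", "北京", "日本")
print(answer)  # 东京
```

With real pre-trained vectors, the same loader could be pointed at an opened file instead of the `io.StringIO` buffer, though a library loader would likely be faster for large vocabularies.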
