Kim's 2014 CNN, still kicking in Chinese
A faithful PyTorch port of the classic TextCNN paper, wired for Chinese sentiment analysis with jieba and Zhihu word vectors.

What it does
Implements the four embedding variants from Yoon Kim’s 2014 sentence-classification paper (random, static, non-static, and multichannel), then trains them on a Chinese text corpus for sentiment classification. Tokenization is handled by jieba; word vectors come from a word2vec model trained on Zhihu QA data, distributed via the Chinese-Word-Vectors project.
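The four variants differ only in how the embedding layer is initialized and whether it is updated during training. A minimal sketch of that distinction in plain Python follows. The names `Channel`, `VARIANTS`, and `describe` are illustrative and not taken from the repository, and the mapping follows the variant descriptions in Kim's paper rather than the port's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """One embedding table feeding the convolution layers (hypothetical helper)."""
    pretrained: bool  # True: start from pretrained word vectors; False: random init
    trainable: bool   # True: updated by backprop; False: kept frozen


# Variant definitions as described in Kim (2014); the dict itself is illustrative.
VARIANTS = {
    "rand": (Channel(pretrained=False, trainable=True),),
    "static": (Channel(pretrained=True, trainable=False),),
    "non-static": (Channel(pretrained=True, trainable=True),),
    # Multichannel: two copies of the same pretrained vectors,
    # one frozen and one fine-tuned.
    "multichannel": (
        Channel(pretrained=True, trainable=False),
        Channel(pretrained=True, trainable=True),
    ),
}


def describe(variant: str) -> str:
    """Return a one-line summary of the embedding setup for a variant."""
    parts = []
    for ch in VARIANTS[variant]:
        init = "pretrained" if ch.pretrained else "random"
        mode = "trainable" if ch.trainable else "frozen"
        parts.append(f"{init}/{mode}")
    return f"{variant}: " + " + ".join(parts)


if __name__ == "__main__":
    for name in VARIANTS:
        print(describe(name))
```

In PyTorch terms, a frozen pretrained channel would typically correspond to `nn.Embedding.from_pretrained(vectors, freeze=True)` and a fine-tuned one to `freeze=False`, though the README does not say whether the port builds its layers this way.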
The interesting bit
The README is essentially a lab notebook: every variant has a concrete accuracy number, and the progression is clean. Random initialization hits 94%, frozen pretrained vectors reach 95%, and fine-tuning the embeddings nudges accuracy to 96%. The multichannel trick (a frozen and a fine-tuned copy side by side) matches fine-tuning alone, which is itself a useful data point.
Key highlights
- Four Kim CNN variants in one script, toggled by CLI flags (-static, -non-static, -multichannel)
- Pretrained Chinese word vectors from a real social-QA corpus, not generic news text
- Early stopping with a 1000-step patience baked in
- Dependencies pinned to PyTorch 1.0.0 and torchtext 0.3.1: archaeologically precise
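The early-stopping highlight above can be sketched as a small step-based tracker. This is a minimal illustration that assumes the rule is "stop once validation accuracy has not improved for 1000 training steps". The class name `EarlyStopper`, its method, and the sample numbers are hypothetical, and the repository's actual logic may differ.

```python
class EarlyStopper:
    """Track the best validation accuracy and signal when to stop (illustrative)."""

    def __init__(self, patience_steps: int = 1000):
        self.patience_steps = patience_steps
        self.best_acc = float("-inf")
        self.best_step = 0

    def update(self, step: int, val_acc: float) -> bool:
        """Record an evaluation result; return True if training should stop."""
        if val_acc > self.best_acc:
            # Strict improvement: remember it and keep training.
            self.best_acc = val_acc
            self.best_step = step
            return False
        # No improvement: stop once the gap since the best step reaches the patience.
        return step - self.best_step >= self.patience_steps


if __name__ == "__main__":
    stopper = EarlyStopper(patience_steps=1000)
    # Made-up evaluation results: (training step, validation accuracy).
    history = [(100, 0.90), (200, 0.94), (700, 0.93), (1200, 0.94), (1300, 0.92)]
    for step, acc in history:
        if stopper.update(step, acc):
            print(f"stopping at step {step}; best {stopper.best_acc:.2f} at step {stopper.best_step}")
            break
```

Measuring patience in steps rather than epochs means the result depends on how often validation runs. A tie with the best score does not reset the counter in this sketch.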
Caveats
- No mention of what the 7,000-sample evaluation dataset actually is
- PyTorch 1.0.0 and torchtext 0.3.1 are years out of date; expect to do some dependency archaeology before it runs today
- No code structure or module breakdown shown in the README
Verdict
Worth a look if you need a minimal, working TextCNN baseline for Chinese text and don’t mind updating the dependency stack. Skip it if you want modern transformers, production-grade logging, or any explanation of the training data’s provenance.