Four ways to guess if Chinese text is happy, from dictionaries to ALBERT
A practical survey of sentiment analysis techniques, from 1990s-style lexicons to BERT-era models, with working code for each.

What it does
This repo implements four distinct approaches to Chinese sentiment analysis: dictionary-based scoring, Naive Bayes, ALBERT+TextCNN, and an ALBERT+TextCNN variant that treats emojis as unknown tokens and learns their semantics. Each gets its own subdirectory with runnable code. The README frames text classification as NLP’s foundational task: everything else is just classification in fancy clothes.
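To make the first approach concrete, here is a minimal sketch of dictionary-based scoring. It is not the repo's code: the lexicon, the negator list, and the greedy longest-match segmenter are all hypothetical stand-ins chosen to keep the example self-contained, and a real pipeline would likely use a proper Chinese word segmenter and a much larger dictionary.

```python
# Toy lexicon and negators: illustrative assumptions, not the repo's dictionaries.
LEXICON = {"好": 1.0, "喜欢": 2.0, "开心": 2.0, "差": -1.0, "讨厌": -2.0, "难过": -2.0}
NEGATORS = {"不", "没"}
KNOWN = set(LEXICON) | NEGATORS
MAX_WORD_LEN = max(len(word) for word in KNOWN)


def tokenize(text):
    """Greedy longest-match segmentation against the known words.

    Characters that match nothing fall through as single-character tokens.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in KNOWN:
                tokens.append(piece)
                i += size
                break
    return tokens


def score(text):
    """Sum lexicon weights; a negator flips the sign of the next sentiment word."""
    total, negate = 0.0, False
    for token in tokenize(text):
        if token in NEGATORS:
            negate = True
        elif token in LEXICON:
            total += -LEXICON[token] if negate else LEXICON[token]
            negate = False
    return total


print(score("今天很开心"))        # 2.0
print(score("我不喜欢"))          # -2.0
print(score("服务差，但是我喜欢"))  # 1.0
```

A positive total reads as happy and a negative total as unhappy. The negation rule here is deliberately crude, which is the usual weakness of lexicon methods.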
The interesting bit
The emoji-handling variant is the unusual one. Instead of stripping or ignoring emojis, it treats them as unknown tokens and learns their semantic vectors during fine-tuning. The README is vague on whether this actually helps — it just says the goal is “recognizing unknown token emotional semantics” — but the approach itself is a neat acknowledgment that informal text doesn’t cooperate with clean vocabularies.
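Since the README does not spell out the mechanism, the following is only one plausible reading of "treat emojis as unknown tokens and learn their vectors", sketched in plain Python. The `Vocab` class, the `is_emoji` range check, and the initialization scale are all assumptions made for illustration; the repo may do this differently, for example by reusing reserved vocabulary slots.

```python
import random


def is_emoji(ch):
    """Rough check over common emoji code-point blocks; not exhaustive."""
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


class Vocab:
    """Character vocabulary whose embedding table can grow for unseen emojis."""

    def __init__(self, tokens, dim, seed=0):
        self.rng = random.Random(seed)
        self.dim = dim
        self.index = {}
        self.embeddings = []  # one row per token id
        for token in ["[UNK]"] + list(tokens):
            if token not in self.index:
                self._add(token)

    def _add(self, token):
        # New rows start as small random vectors; in a real model they would
        # be updated by gradient descent during fine-tuning.
        self.index[token] = len(self.embeddings)
        self.embeddings.append([self.rng.gauss(0.0, 0.02) for _ in range(self.dim)])
        return self.index[token]

    def encode(self, text, learn_emoji=True):
        ids = []
        for ch in text:
            if ch in self.index:
                ids.append(self.index[ch])
            elif learn_emoji and is_emoji(ch):
                ids.append(self._add(ch))  # give the emoji its own trainable row
            else:
                ids.append(self.index["[UNK]"])  # everything else collapses to [UNK]
        return ids


vocab = Vocab("今天很开心", dim=4)
print(vocab.encode("开心\U0001F60A"))  # [4, 5, 6]: the emoji gets a fresh id
```

With `learn_emoji=False` the same emoji maps to id 0, the shared `[UNK]` row, so every emoji looks identical to the model. Giving each one its own row is what lets fine-tuning pull a smiling face and a crying face apart.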
Key highlights
- Four implementations spanning rule-based, classical ML, and deep learning
- ALBERT+TextCNN rather than raw BERT: lighter and faster, though the README gives no numbers to show it is good enough for this task
- Emoji-aware variant handles out-of-vocabulary symbols via learned embeddings
- All methods paired with Chinese-language Zhihu articles explaining the code
- Python 3.7.6, TensorFlow-era tooling (no mention of PyTorch or modern versions)
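For the classical ML entry in the list above, a minimal multinomial Naive Bayes sketch over character unigrams with Laplace smoothing shows the general shape of the technique. The training sentences, the labels, and the `NaiveBayes` class are made up for illustration and say nothing about the repo's dataset or feature choices, which the README does not describe.

```python
import math
from collections import Counter, defaultdict

# Toy training data: hypothetical, not taken from the repo.
TEXTS = ["很开心", "真喜欢", "太好了", "很难过", "真讨厌", "太差了"]
LABELS = ["pos", "pos", "pos", "neg", "neg", "neg"]


class NaiveBayes:
    """Multinomial Naive Bayes on character counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(text)  # counts each character
        self.vocab = {ch for counts in self.feature_counts.values() for ch in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            logp = math.log(n_docs / total_docs)  # log prior
            for ch in text:
                if ch in self.vocab:  # characters never seen in training are skipped
                    logp += math.log((counts[ch] + self.alpha) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


model = NaiveBayes().fit(TEXTS, LABELS)
print(model.predict("喜欢"))  # pos
print(model.predict("讨厌"))  # neg
```

Working in log space avoids underflow on longer texts, and the smoothing term keeps a character seen in only one class from zeroing out the other.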
Caveats
- No benchmarks, accuracy numbers, or dataset details anywhere in the README
- “ALBERT+TextCNN” appears twice with nearly identical descriptions; the emoji variant’s actual delta is barely explained
- Dictionary and Bayes methods are stated as working but receive no performance discussion
Verdict
Useful if you need a quick comparative survey of sentiment analysis paradigms with working Chinese examples. Skip if you need benchmark numbers, modern framework support, or anything beyond the README’s hand-wavy claims about effectiveness.