hellonlp/sentiment-analysis

Four ways to guess if Chinese text is happy, from dictionaries to ALBERT

A practical survey of sentiment analysis techniques, from 1990s-style lexicons to BERT-era models, with working code for each.

529 stars · Python · Language Models, ML Frameworks · sentiment-analysis
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

This repo implements four distinct approaches to Chinese sentiment analysis: dictionary-based scoring, Naive Bayes, ALBERT+TextCNN, and a second ALBERT+TextCNN variant that treats emojis as unknown tokens and learns their semantics. Each gets its own subdirectory with runnable code. The README frames text classification as NLP’s foundational task: everything else is just classification in fancy clothes.
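
To make the simplest of the four approaches concrete, here is a minimal sketch of dictionary-based scoring. It is not the repo's code: the lexicon entries, the negator and intensifier lists, and the function names `segment`, `score`, and `label` are all hypothetical, and the greedy maximum-matching segmenter stands in for a real Chinese word segmenter.

```python
# Toy lexicon: every entry is a hypothetical example, not taken from the repo.
POSITIVE = {"喜欢": 1.0, "开心": 1.0, "好": 0.5}
NEGATIVE = {"讨厌": -1.0, "难过": -1.0, "差": -0.5}
NEGATORS = {"不", "没"}
INTENSIFIERS = {"很": 1.5, "非常": 2.0}

LEXICON = {**POSITIVE, **NEGATIVE}
VOCAB = set(LEXICON) | NEGATORS | set(INTENSIFIERS)
MAX_LEN = max(len(w) for w in VOCAB)


def segment(text):
    """Greedy forward maximum matching against the toy vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in VOCAB:
                tokens.append(piece)
                i += size
                break
    return tokens


def score(text):
    """Sum word polarities, with pending negators and intensifiers applied."""
    total, sign, weight = 0.0, 1.0, 1.0
    for tok in segment(text):
        if tok in NEGATORS:
            sign = -sign
        elif tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]
        elif tok in LEXICON:
            total += sign * weight * LEXICON[tok]
            sign, weight = 1.0, 1.0  # modifiers are consumed by one sentiment word
    return total


def label(text):
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


for sentence in ["我很喜欢", "我不喜欢", "今天下雨"]:
    print(sentence, score(sentence), label(sentence))
```

The sketch also shows why this family of methods is brittle: a modifier stays pending until the next sentiment word, however far away it is, and any word missing from the lexicon contributes nothing.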

The interesting bit

The emoji-handling variant is the unusual one. Instead of stripping or ignoring emojis, it treats them as unknown tokens and learns their semantic vectors during fine-tuning. The README is vague on whether this actually helps — it just says the goal is “recognizing unknown token emotional semantics” — but the approach itself is a neat acknowledgment that informal text doesn’t cooperate with clean vocabularies.
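
The README does not say how the emoji vectors are learned, so the following is only one plausible reading, sketched under stated assumptions: each emoji gets its own vocabulary id and a freshly initialised embedding row, which fine-tuning could then update. The miniature vocabulary, `EMBED_DIM`, and the helpers `encode` and `add_tokens` are hypothetical, and the training step itself is omitted.

```python
import random

# Hypothetical miniature vocabulary; a real ALBERT vocabulary is far larger.
vocab = {"[PAD]": 0, "[UNK]": 1, "我": 2, "很": 3, "开": 4, "心": 5}
EMBED_DIM = 4
rng = random.Random(0)
embeddings = [[rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)] for _ in vocab]


def encode(text, vocab):
    """Character-level lookup; anything missing falls back to the [UNK] id."""
    return [vocab.get(ch, vocab["[UNK]"]) for ch in text]


def add_tokens(new_tokens, vocab, embeddings):
    """Give each new symbol its own id and a randomly initialised embedding row."""
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
            embeddings.append([rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)])


before = encode("我很开心😊", vocab)  # the emoji collapses into the shared [UNK] id
add_tokens(["😊", "😭"], vocab, embeddings)
after = encode("我很开心😊", vocab)   # the emoji now maps to a row of its own

print(before)
print(after)
```

Without the extension, every emoji shares the single `[UNK]` vector, so a smiling and a crying face look identical to the model. Note that this character-level lookup only works for single-code-point emojis; multi-code-point sequences would need a proper tokenizer.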

Key highlights

  • Four implementations spanning rule-based, classical ML, and deep learning
  • ALBERT+TextCNN, not raw BERT — lighter, faster, good enough for this task
  • Emoji-aware variant handles out-of-vocabulary symbols via learned embeddings
  • All methods paired with Chinese-language Zhihu articles explaining the code
  • Python 3.7.6, TensorFlow-era tooling (no mention of PyTorch or modern versions)
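
For the classical-ML entry in the list above, a multinomial Naive Bayes classifier with add-one smoothing can be sketched in plain Python. This is an illustration of the general technique, not the repo's implementation: the `train` and `predict` helpers and the pre-segmented toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: iterable of (token_list, label). Returns the counts used at prediction time."""
    class_docs = Counter()               # number of documents per class
    token_counts = defaultdict(Counter)  # per-class token frequencies
    vocab = set()
    for tokens, label in samples:
        class_docs[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, token_counts, vocab


def predict(tokens, model):
    """Return the class with the highest log prior plus smoothed log likelihood."""
    class_docs, token_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in class_docs.items():
        logp = math.log(n_docs / total_docs)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are skipped
                logp += math.log((token_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Hypothetical, pre-segmented toy data (not from the repo).
data = [
    (["我", "喜欢", "这个"], "pos"),
    (["很", "开心"], "pos"),
    (["我", "讨厌", "这个"], "neg"),
    (["很", "难过"], "neg"),
]
model = train(data)
print(predict(["喜欢"], model))
print(predict(["很", "难过"], model))
```

Working in log space avoids underflow on long inputs, and add-one smoothing keeps a token seen in only one class from zeroing out the other class entirely.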

Caveats

  • No benchmarks, accuracy numbers, or dataset details anywhere in the README
  • “ALBERT+TextCNN” appears twice with nearly identical descriptions; the emoji variant’s actual delta is barely explained
  • Dictionary and Bayes methods are stated as working but receive no performance discussion

Verdict

Useful if you need a quick comparative survey of sentiment analysis paradigms with working Chinese examples. Skip if you need production-ready metrics, modern framework support, or anything beyond the README’s hand-wavy claims about effectiveness.
