Edward1Chou/SentimentAnalysis

LSTM sentiment analysis: when "not bad" finally makes sense

A Chinese-language notebook that tackles a classic sentiment-analysis trap, negation and mixed sentiment, by adding a third "neutral" class built from sentences with semantic pivots like "however" and "but."

877 stars · Jupyter Notebook · Language Models · ML Frameworks
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

This is a Keras/LSTM notebook for Chinese text sentiment classification into positive, neutral, and negative. The author trains Word2Vec embeddings with jieba segmentation, feeds them through a single LSTM layer, and outputs softmax probabilities across three classes instead of the usual two.

The interesting bit

The neutral class is neither hand-labeled nor derived from sentiment lexicons. Instead, the author extracts sentences containing pivot words like “然而” (however) and “但” (but), reasoning that a semantic reversal places a sentence in a distinct emotional zone between pure praise and pure criticism. It’s a pragmatic heuristic that is cheap to implement, and the README shows it catching phrases like “不是太好” (not too good) that binary classifiers mishandle.
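The pivot-word heuristic can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the repository's code: the function name `find_pivot_sentences` and the sample sentences are hypothetical, and only the two pivot words quoted above are used.

```python
# Hypothetical sketch of the pivot-word heuristic; not taken from the repository.
PIVOT_WORDS = ("然而", "但")  # "however", "but": the two examples quoted above


def find_pivot_sentences(sentences, pivots=PIVOT_WORDS):
    """Return the sentences that contain at least one pivot word."""
    return [s for s in sentences if any(p in s for p in pivots)]


# Made-up sample sentences, for illustration only.
samples = [
    "这部电影很好看",          # "This movie is great"
    "画面不错,但剧情太拖沓",    # "Nice visuals, but the plot drags"
    "服务很差",                # "The service is terrible"
    "价格便宜,然而质量一般",    # "Cheap, however the quality is mediocre"
]

# Sentences that would be candidates for the neutral class.
neutral_candidates = find_pivot_sentences(samples)
```

Plain substring matching also picks up longer forms such as “但是”, because they contain “但”. A real pipeline would more likely match on tokens after jieba segmentation; that detail is an assumption here.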

Key highlights

  • Three-class output using softmax + categorical_crossentropy; labels are one-hot encoded with keras.utils.to_categorical
  • Word2Vec embeddings trained via Gensim on a custom corpus, with jieba for Chinese segmentation
  • Neutral class bootstrapped from sentences containing adversative conjunctions
  • Single LSTM (50 units, tanh) with 0.5 dropout; architecture serialized to YAML, weights to HDF5
  • Author explicitly frames this as a baseline, not a production system
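To make the three-class output in the first bullet concrete, here is a dependency-free sketch of what one-hot encoding, softmax, and categorical cross-entropy compute for a single example. It is plain Python rather than Keras, the helper names (`one_hot`, `softmax`, `categorical_crossentropy`) are hypothetical, and the class order and scores are assumed for illustration.

```python
import math

# Assumed class order, for illustration; the repository's label mapping may differ.
CLASSES = ["negative", "neutral", "positive"]


def one_hot(index, num_classes=3):
    """Encode a class index as a one-hot vector, e.g. 1 -> [0.0, 1.0, 0.0]."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(target, probs, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


logits = [0.2, 1.5, -0.3]  # made-up scores for one sentence
probs = softmax(logits)
target = one_hot(CLASSES.index("neutral"))
loss = categorical_crossentropy(target, probs)
predicted = CLASSES[probs.index(max(probs))]
```

With a one-hot target, the loss reduces to the negative log of the probability assigned to the true class, so a binary setup extends to three classes by widening the output layer and the label vectors.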

Caveats

  • Neutral predictions are sparse because the neutral dataset is “less than half” the size of the other two and its quality is uneven
  • Code shown uses Python 2 print syntax (print '...'), suggesting stale dependencies
  • No quantitative metrics, training curves, or reproducible dataset links provided

Verdict

Worth a skim if you’re building Chinese NLP baselines and need a concrete example of three-class LSTM setup in Keras. Skip if you want modern transformers, multilingual coverage, or battle-tested code; this is educational glue code from 2016-era deep learning, honestly labeled as such.
