bfelbo/DeepMoji

A billion tweets taught this model to read emotions

DeepMoji learned sentiment, sarcasm, and emotion by predicting which emoji people would use—then turned that into a transfer-learning workhorse for NLP.

1.6k stars · Python · Language Models · ML Frameworks
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

DeepMoji is a pretrained neural network trained on 1.2 billion tweets that contained emojis. The core trick: predict the emoji someone would use, and you end up with a model that understands emotional nuance in text. You can use it to extract 2304-dimensional “emotional feature vectors” from text, predict emojis directly, or fine-tune it for downstream tasks like sentiment analysis or sarcasm detection.
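To make the "predict emojis directly" use concrete, here is a minimal sketch of the post-processing step: picking the most likely emojis from one row of model output. It assumes the model returns one probability per emoji class. The function name `top_k_emojis`, the label names, and the numbers are made up for illustration and are not part of the DeepMoji API.

```python
import heapq


def top_k_emojis(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    paired = zip(labels, probabilities)
    # nlargest returns the pairs sorted from most to least likely.
    return heapq.nlargest(k, paired, key=lambda pair: pair[1])


# Toy stand-in for one row of model output (made-up numbers, not real predictions).
toy_labels = ["joy", "heart", "sob", "unamused", "fire"]
toy_probs = [0.10, 0.05, 0.50, 0.30, 0.05]

top = top_k_emojis(toy_probs, toy_labels, k=2)
print(top)
```

With real model output, `probabilities` would be one row of the prediction array and `labels` the model's own emoji list.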

The interesting bit

The emoji-as-supervision-signal is the clever part. Rather than hand-labeling emotions, the authors let Twitter users do the work by choosing emojis organically. The resulting representations transfer well across domains because emotional expression is surprisingly consistent, whether you’re subtweeting or writing a product review.
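The emoji-as-label idea can be sketched in a few lines: strip the emoji out of a tweet and keep it as the training target. This is an illustration of the general technique, not the authors' preprocessing pipeline. The four-emoji label set, the `make_training_pair` helper, and the "first emoji wins" rule are all assumptions made for the example.

```python
# Illustrative label set only; the real model's emoji vocabulary is larger
# and is not reproduced here.
EMOJI_LABELS = ["😂", "😭", "😍", "🙄"]


def make_training_pair(tweet, emoji_labels=EMOJI_LABELS):
    """Turn a raw tweet into (text_without_emoji, label_index).

    Returns None when the tweet contains none of the known emojis.
    """
    found = [e for e in emoji_labels if e in tweet]
    if not found:
        return None
    # Assumption: the emoji that appears first in the tweet is the label.
    label = min(found, key=tweet.index)
    text = tweet
    for e in emoji_labels:
        text = text.replace(e, "")
    # Collapse the whitespace left behind by the removed emojis.
    text = " ".join(text.split())
    return text, emoji_labels.index(label)


print(make_training_pair("great, another Monday 🙄"))
print(make_training_pair("no emoji here"))
```

No human ever labels the first tweet as sarcastic; the eye-roll emoji carries that signal on its own.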

Key highlights

  • Pretrained on 1.2B emoji-bearing tweets; weights (~85MB) downloadable via included script
  • Outputs 2304-dim vectors usable as features for custom classifiers
  • Includes fine-tuning examples for transfer learning to new datasets
  • Keras-based with Theano or TensorFlow backend (this was 2017)
  • PyTorch reimplementation available as torchMoji from HuggingFace
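As a rough illustration of "vectors usable as features for custom classifiers", the sketch below fits a nearest-centroid classifier on precomputed feature vectors using only the standard library. The 3-dimensional toy vectors stand in for the 2304-dimensional ones, and the values, the class names, and the helpers `fit_centroids` and `predict` are hypothetical. In practice any off-the-shelf classifier could take the place of this one.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_centroids(vectors, labels):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the class whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


# Toy 3-dim vectors standing in for the 2304-dim features (made-up values).
train_vecs = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 0.1, 1.0], [0.1, 0.0, 0.8]]
train_labels = ["sincere", "sincere", "sarcastic", "sarcastic"]

centroids = fit_centroids(train_vecs, train_labels)
print(predict(centroids, [0.8, 0.1, 0.1]))
```

The pretrained model does the heavy lifting of turning text into vectors, so the classifier on top can stay simple.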

Caveats

  • Explicitly built for Python 2.7; Python 3 requires community patches from open PRs
  • Online demo dead since September 2023 (expired certificate)
  • Dependencies are dated: Keras 2.0.x, TensorFlow 1.3+, Theano 0.9+
  • Authors note code “has not been optimized for efficiency” and offer no bug guarantees

Verdict

Worth studying if you’re building emotion-aware NLP or curious about creative pretraining objectives. For production use, consider the HuggingFace torchMoji port instead, unless you enjoy dependency archaeology.
