RandolphVI/Multi-Label-Text-Classification

TensorFlow 1.x multi-label text classifiers, warts and all

A research-grade collection of neural architectures for tagging text with multiple labels, frozen in time circa 2019.

562 stars · Python · ML Frameworks · Language Models
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

Implements several neural architectures for multi-label text classification: FastText, CNN, RNN, CRNN, RCNN, HAN, and others, some of which the author labels “personal ideas”. You feed it tokenized text and a binary label vector like [0, 1, 0, ..., 1, 1]; it predicts which categories apply simultaneously. It supports both English (nltk) and Chinese (jieba) tokenization, and can load pre-trained word vectors via gensim.
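To make the label format concrete, here is a minimal sketch of how a list of category indices maps to a binary label vector and back. The helper names `to_multi_hot` and `from_multi_hot` are hypothetical and chosen for illustration; the repository's own preprocessing code may be organized differently.

```python
def to_multi_hot(label_indices, num_classes):
    """Turn a list of label indices into a binary vector of length num_classes."""
    vector = [0] * num_classes
    for idx in label_indices:
        if not 0 <= idx < num_classes:
            raise ValueError(f"label index {idx} is outside [0, {num_classes})")
        vector[idx] = 1
    return vector


def from_multi_hot(vector):
    """Recover the sorted label indices from a binary vector."""
    return [i for i, flag in enumerate(vector) if flag == 1]


# A document tagged with categories 1, 4 and 5 out of 6 possible categories.
example = to_multi_hot([1, 4, 5], num_classes=6)
print(example)                  # [0, 1, 0, 0, 1, 1]
print(from_multi_hot(example))  # [1, 4, 5]
```

Because several positions can be 1 at once, each category is an independent yes/no decision rather than one choice among many.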

The interesting bit

The author treats this as a learning project, and the pedagogical care shows: gradient clipping, L2 loss done correctly, learning-rate decay, batch normalization, a custom checkpoint manager that saves the best N models instead of just the last N, and TensorBoard embedding visualization. Several models are flagged “can use but not finished yet” with cheerful emojis, which is more honesty than most repos manage.
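The “best N instead of last N” idea can be sketched without any TensorFlow at all. The class below is a hypothetical illustration, not the repository's checkpoint manager: it assumes a higher validation score is better and only tracks which checkpoint paths to keep, leaving the actual saving and deleting of files to the caller.

```python
import heapq


class BestNTracker:
    """Track the N best (score, path) pairs; a higher score is assumed better."""

    def __init__(self, max_to_keep):
        if max_to_keep < 1:
            raise ValueError("max_to_keep must be at least 1")
        self.max_to_keep = max_to_keep
        self._heap = []  # min-heap: the worst kept checkpoint sits at index 0

    def offer(self, score, path):
        """Record a checkpoint; return the path the caller should delete, or None."""
        entry = (score, path)
        if len(self._heap) < self.max_to_keep:
            heapq.heappush(self._heap, entry)
            return None
        if score <= self._heap[0][0]:
            return path  # no better than the worst kept one, so drop the new one
        evicted = heapq.heapreplace(self._heap, entry)
        return evicted[1]

    def best(self):
        """Return the kept checkpoints, best first."""
        return sorted(self._heap, reverse=True)


tracker = BestNTracker(max_to_keep=2)
tracker.offer(0.50, "ckpt-100")
tracker.offer(0.70, "ckpt-200")
print(tracker.offer(0.60, "ckpt-300"))  # ckpt-100
print(tracker.best())  # [(0.7, 'ckpt-200'), (0.6, 'ckpt-300')]
```

A last-N policy would have kept `ckpt-200` and `ckpt-300` here as well, but only by coincidence; if the score had dropped at step 300, last-N would still keep that checkpoint while this tracker would not.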

Key highlights

  • Seven architectures in one codebase, from bag-of-words FastText to Hierarchical Attention Networks
  • Proper training conveniences: checkpoint restoration, threshold-based or top-K prediction, AUC/AUPRC metrics, structured logging
  • Chinese/English bilingual preprocessing pipeline built in
  • Pre-trained word vector support (Word2Vec included, GloVe and FastText pre-training also documented)
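The two prediction modes mentioned in the highlights differ in how they turn per-category scores into a label set. A minimal sketch follows, assuming the model outputs one score in [0, 1] per category; the function names and the default threshold of 0.5 are illustrative choices, not values taken from the repository.

```python
def predict_by_threshold(scores, threshold=0.5):
    """Return every category index whose score reaches the threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


def predict_top_k(scores, k):
    """Return the indices of the k highest-scoring categories, in index order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


scores = [0.9, 0.2, 0.55, 0.4]
print(predict_by_threshold(scores))  # [0, 2]
print(predict_top_k(scores, 3))      # [0, 2, 3]
```

Thresholding can return any number of labels, including none, while top-K always returns exactly K (or fewer if there are fewer categories), even when some of those scores are low.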

Caveats

  • Locked to TensorFlow 1.15.0 and Python 3.6; this is legacy code by modern standards
  • TextRNN and TextSANN are explicitly marked unfinished with TODOs remaining
  • The “personal ideas” models (TextANN, TextCRNN, TextRCNN) lack published references—treat as experimental

Verdict

Worth a look if you’re teaching or learning multi-label classification and want a single repo with multiple baselines to compare. Skip it if you need production-ready, maintained code—this is a snapshot of 2019 research tooling, not a living framework.
