jiegzhan/multi-class-text-classification-cnn-rnn

TensorFlow crime classifier: 39 labels, three architectures, one dataset

A straightforward comparison of CNN, LSTM, and GRU models on the same Kaggle text classification task, written in 2016-era TensorFlow.

603 stars Python ML Frameworks
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

Trains and runs three neural architectures—CNN, LSTM, and GRU—on a single multi-class text classification problem: predicting crime categories from free-text San Francisco police incident descriptions. The repo provides train.py and predict.py scripts that take a CSV and a JSON config, nothing more exotic than that.

The interesting bit

This is essentially a controlled bake-off between architectures on the same dataset, which is rarer than it should be. The README cites WildML’s 2015 CNN tutorial as its direct ancestor, so the value is in seeing how RNN variants stack up against that baseline on a real-world, many-class problem.

Key highlights

  • 39 output classes from short text descriptions (“GRAND THEFT FROM LOCKED AUTO” → LARCENY/THEFT)
  • Three model types in one repo: CNN, LSTM, and GRU, all with word embeddings
  • Clean separation: train.py for fitting, predict.py for inference against a timestamped results directory
  • Uses the Kaggle SF Crime dataset’s Descript field as input, Category as label
  • TensorFlow implementation, likely from the TF 1.x era, judging by the 2016-2017 reference lineage
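
To make the input/label setup above concrete, here is a minimal sketch of the data preparation step such a classifier needs: read the Descript and Category columns, then turn the category strings into one-hot targets. It is not the repo's code. The sample rows, the function names load_examples and one_hot_labels, and the lowercasing step are all assumptions for illustration, and it uses only the Python standard library.

```python
import csv
import io

# Hypothetical sample rows in the shape of the Kaggle SF Crime CSV;
# only the Descript and Category columns are used here.
SAMPLE_CSV = """Category,Descript
LARCENY/THEFT,GRAND THEFT FROM LOCKED AUTO
VEHICLE THEFT,STOLEN AUTOMOBILE
LARCENY/THEFT,PETTY THEFT FROM LOCKED AUTO
"""

def load_examples(handle):
    """Return (texts, labels) from a CSV with Descript and Category columns."""
    texts, labels = [], []
    for row in csv.DictReader(handle):
        # Lowercasing is an assumed normalization, not taken from the repo.
        texts.append(row["Descript"].strip().lower())
        labels.append(row["Category"].strip())
    return texts, labels

def one_hot_labels(labels):
    """Map each label string to a one-hot list over the sorted label set."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    vectors = []
    for name in labels:
        vec = [0] * len(classes)
        vec[index[name]] = 1
        vectors.append(vec)
    return classes, vectors

texts, labels = load_examples(io.StringIO(SAMPLE_CSV))
classes, y = one_hot_labels(labels)
```

On the full dataset the same mapping would yield 39-element target vectors, one position per crime category. Whatever the architecture (CNN, LSTM, or GRU), the model then only has to emit a distribution over those positions.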

Caveats

  • No performance numbers, convergence curves, or architecture comparisons are actually reported in the README—you’ll have to run it yourself
  • The code appears to be TF 1.x-style, which generally does not run unmodified on TensorFlow 2.x; modern TensorFlow users should expect to port it or pin an older version
  • No tests, no requirements.txt, no CI; this is research-grade glue code

Verdict

Worth a look if you’re teaching or learning how CNN vs. RNN text classifiers differ in practice, or need a quick baseline for a multi-class NLP problem. Skip it if you want production-ready tooling or pre-computed benchmarks to cite.
