cardmagic/classifier

Ruby text classification that won't melt your laptop

Five algorithms, incremental LSI, and a CLI that actually works for multi-gigabyte datasets.

718 stars · Ruby · ML Frameworks

What it does

A Ruby gem for text classification with five algorithms under one roof: Bayesian, logistic regression, LSI, k-NN, and TF-IDF. Ships with a CLI for pre-trained models (spam, sentiment, emotion detection) and streaming support for training on files too large to fit in memory.
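To make the streaming idea concrete, here is a minimal pure-Ruby sketch of training a naive Bayes model one line at a time, so only the running counts stay in memory. It does not use the gem and is not its API: the `StreamingBayes` class, its method names, and the tab-separated `label<TAB>text` record format are all assumptions made for illustration.

```ruby
require "stringio"

# Hypothetical illustration only: NOT the classifier gem's API.
# Multinomial naive Bayes with add-one smoothing, trained from a stream.
class StreamingBayes
  def initialize
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # category => word => count
    @totals      = Hash.new(0) # category => total words seen
    @doc_counts  = Hash.new(0) # category => documents seen
    @vocab       = {}          # set of all words seen
  end

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end

  # Update the counts with a single labelled document.
  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @totals[category] += 1
      @vocab[word] = true
    end
  end

  # Read one "label<TAB>text" record per line (assumed format).
  # each_line yields one line at a time, so the input is never fully loaded.
  def train_stream(io)
    io.each_line do |line|
      category, text = line.chomp.split("\t", 2)
      next if text.nil? # skip malformed lines

      train(category, text)
    end
    self
  end

  # Return the category with the highest log posterior.
  def classify(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)
    scores = @doc_counts.keys.map do |category|
      log_prior = Math.log(@doc_counts[category] / total_docs)
      denom = @totals[category] + @vocab.size
      log_likelihood = words.sum { |w| Math.log((@word_counts[category][w] + 1.0) / denom) }
      [category, log_prior + log_likelihood]
    end
    scores.max_by { |_, score| score }.first
  end
end

# Usage: any IO works; File.open("big.tsv") would stream from disk the same way.
sample = StringIO.new(
  "spam\tbuy cheap pills now\n" \
  "ham\tlunch meeting at noon\n" \
  "spam\tcheap pills cheap offer\n" \
  "ham\tsee you at the meeting\n"
)
model = StreamingBayes.new.train_stream(sample)
model.classify("cheap pills") # => "spam"
```

Memory use here grows with the vocabulary rather than with the size of the input file, which is the property that matters for multi-gigabyte training sets.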

The interesting bit

The LSI implementation uses Brand’s algorithm for incremental updates—no full SVD rebuild when you add documents. The README claims this is 400x faster for streaming data, and there’s a native C extension for the heavy linear algebra (5-50x speedup, though the benchmark table only shows results for 10 and 20 documents, which feels like a teaser rather than proof).
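For readers unfamiliar with the technique, the core of a Brand-style rank-one update can be written in a few lines. This is a sketch of the textbook update for appending one document column c to a term-document matrix A with a rank-k SVD, and it assumes documents are stored as columns. The symbols are chosen here for illustration; how the gem applies the update internally is not described in the README.

```latex
\begin{aligned}
A &\approx U S V^{\top},
  \qquad U \in \mathbb{R}^{t \times k},\;
         S \in \mathbb{R}^{k \times k},\;
         V \in \mathbb{R}^{d \times k} \\[4pt]
m &= U^{\top} c, \qquad
p = c - U m, \qquad
\rho = \lVert p \rVert_2, \qquad
P = p / \rho \\[4pt]
\begin{bmatrix} A & c \end{bmatrix}
  &\approx
  \begin{bmatrix} U & P \end{bmatrix}
  K
  \begin{bmatrix} V & 0 \\ 0 & 1 \end{bmatrix}^{\top},
  \qquad
  K = \begin{bmatrix} S & m \\ 0 & \rho \end{bmatrix} \\[4pt]
K &= U' S' V'^{\top}
  \quad \text{(SVD of a small } (k+1) \times (k+1) \text{ matrix)} \\[4pt]
U &\leftarrow \begin{bmatrix} U & P \end{bmatrix} U', \qquad
S \leftarrow S', \qquad
V \leftarrow \begin{bmatrix} V & 0 \\ 0 & 1 \end{bmatrix} V'
\end{aligned}
```

The only decomposition performed per new document is that of K, whose size depends on the retained rank k and not on the number of documents already indexed. That is why no full SVD rebuild is needed. When ρ is zero (the new document lies entirely in the existing subspace), the P column is dropped and the rank stays at k.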

Key highlights

  • Five classifiers in one gem, with a consistent API across all of them
  • Pluggable persistence: file, Redis, S3, SQL, or roll your own
  • CLI with pre-trained models via Homebrew install
  • Claude Code plugin for AI-assisted classification workflows
  • Native C extension for LSI; pure Ruby fallback available

Caveats

  • The “400x faster” incremental LSI claim and the 5-50x speedup numbers lack detailed methodology or larger-scale benchmarks in the README
  • The performance table stops at 20 documents—how it scales beyond that is unclear
  • Most of the documentation lives on external sites (rubyclassifier.com), so offline usage may be limited

Verdict

Worth a look for Ruby developers doing text classification who need more than a naive Bayes toy and care about memory efficiency. Skip it if you’re already invested in Python’s scikit-learn ecosystem or need deep neural network approaches.
