louismullie/treat

A Ruby NLP toolkit that time forgot

Treat once aimed to be Ruby's answer to NLTK, but the repo now opens with a blunt warning: unmaintained.

1.4k stars · Ruby · ML Frameworks
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

Treat is a broad NLP framework for Ruby covering the usual suspects: tokenization, parsing, POS tagging, named entity recognition, keyword extraction, and document retrieval. It also bundles text extractors for PDF, Word, and even images via Ocropus, plus serialization to YAML, XML, or MongoDB. Think of it as a Swiss Army knife that tried to fold in every blade at once.

The interesting bit

The project’s real ambition was to be “language- and algorithm-agnostic”, a tall order for any NLP library, let alone one in a language ecosystem better known for web frameworks than machine learning. Rather than reinventing heavyweight external tools like Stanford NLP and Enju, it wrapped them. That was pragmatic, but it also meant inheriting their installation and compatibility burdens.
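To make the "algorithm-agnostic" idea concrete, here is a minimal sketch of the general pattern: several interchangeable workers registered under one task name, with the caller picking one by symbol. This is an illustration only, not Treat's actual API. The `Workers` module, its `register` and `call` methods, and both toy tokenizers are hypothetical names invented for this example.

```ruby
# Hypothetical sketch of an algorithm-agnostic task registry.
# None of these names come from Treat; they only illustrate the pattern.
module Workers
  REGISTRY = {}

  # Register a block under a task (e.g. :tokenize) and an algorithm name.
  def self.register(task, name, &block)
    (REGISTRY[task] ||= {})[name] = block
  end

  # Run a task with the named algorithm, or the first registered one by default.
  def self.call(task, input, using: nil)
    workers = REGISTRY.fetch(task) { raise ArgumentError, "no workers for #{task}" }
    name = using || workers.keys.first
    worker = workers.fetch(name) { raise ArgumentError, "no #{task} worker named #{name}" }
    worker.call(input)
  end
end

# Two toy tokenizers standing in for wrapped external tools.
Workers.register(:tokenize, :whitespace)  { |text| text.split }
Workers.register(:tokenize, :punctuation) { |text| text.scan(/\w+|[^\w\s]/) }

p Workers.call(:tokenize, "Treat wraps parsers.")
# => ["Treat", "wraps", "parsers."]
p Workers.call(:tokenize, "Treat wraps parsers.", using: :punctuation)
# => ["Treat", "wraps", "parsers", "."]
```

In a real wrapper each block would shell out to, or bind against, an external tool. That is where the dependency burden described above comes from: the registry stays small, but every worker behind it carries its own runtime.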

Key highlights

  • Wraps Stanford & Enju parsers, WordNet, LIBLINEAR/LIBSVM, and Ferret search
  • Handles format extraction from PDF, HTML, Word, OpenOffice, and OCR’d images
  • Includes visualization modes: ASCII trees, DOT graphs, and standoff bracketing
  • Ships with ML primitives: decision trees, multilayer perceptrons
  • GPL-licensed, with dependencies under a mix of licenses (GPL, Apache 2.0, MIT, and the Ruby License)

Caveats

  • Explicitly unmaintained; the author is actively seeking new maintainers
  • Heavy external dependencies may have rotted; no indication of current compatibility
  • “Several POS taggers for English” suggests the multilingual promise was uneven in practice

Verdict

Worth a look if you’re maintaining legacy Ruby NLP pipelines or researching how language-agnostic frameworks fail. Everyone else should probably treat this as a historical artifact and reach for Python’s ecosystem instead.
