Ruby's quiet answer to "but Python has NLTK"
A curated index of NLP libraries, bindings, and resources for a language better known for web frameworks than tokenizers.

What it does
This is an awesome-list: a hand-maintained index of Ruby libraries and tools for natural-language processing, from tokenizers and stemmers to spaCy wrappers and Google Cloud Language clients. It maps the full NLP pipeline (segmentation, parsing, semantic analysis, sentiment detection) onto Ruby gems you can actually install today.
The interesting bit
The list is opinionated enough to distinguish stemming from lemmatization and to flag which tools are pure Ruby versus FFI bindings or JRuby wrappers. That granularity saves you from discovering dependency hell only after running bundle install.
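Why does the stemming/lemmatization split matter? A stemmer chops suffixes by rule and may produce non-words, while a lemmatizer maps each form to a real dictionary headword. The sketch below is plain Ruby with no gems. It is a toy under stated assumptions: the suffix list and the lemma table are hypothetical stand-ins, far cruder than a real algorithm like Porter's stemmer or a trained lemmatizer.

```ruby
# Toy suffix-stripping stemmer (hypothetical, NOT the Porter algorithm):
# removes the first matching suffix, with no check that the result is a word.
SUFFIXES = %w[ational ization ing ies ed es s].freeze

def toy_stem(word)
  w = word.downcase
  # Only strip if at least three characters of stem would remain.
  suffix = SUFFIXES.find { |s| w.end_with?(s) && w.length > s.length + 2 }
  suffix ? w.delete_suffix(suffix) : w
end

# Toy lemmatizer (hypothetical lookup table): maps inflected forms to a real
# headword, falling back to the lowercased word when it is unknown.
LEMMAS = {
  "studies"  => "study",
  "studying" => "study",
  "better"   => "good",
  "ran"      => "run"
}.freeze

def toy_lemma(word)
  LEMMAS.fetch(word.downcase, word.downcase)
end

%w[studies studying better ran].each do |w|
  puts "#{w}: stem=#{toy_stem(w)} lemma=#{toy_lemma(w)}"
end
```

Note how the stemmer turns "studies" into the non-word "stud" and cannot relate "better" to "good" at all, while the lookup handles both. That gap is why the two get separate entries.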
Key highlights
- Covers the full stack: pipeline orchestration, multipurpose engines (OpenNLP, Stanford CoreNLP, spaCy via PyCall), and granular subtasks like stop-word filtering and constituency parsing.
- Includes online API clients (Wit.ai, MonkeyLearn, Google Cloud Language) alongside native gems.
- Maintained by someone doing “day to day work on Language Models and NLP Tools,” not just drive-by curation.
- Links to sibling lists for Ruby ML and data science if you need to leave the text-processing bubble.
- Actively solicits contributions; the Tutorials section is currently empty and begging for help.
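"Pipeline orchestration" and "stop-word filtering" are easier to picture with a concrete shape. Below is a minimal, gem-free sketch in which a pipeline is just a chain of stages over a token array. The tokenizer regex, the tiny stop-word list, and the `pipeline` helper are all hypothetical illustrations, not the API of any library on the list. A real setup would swap each stage for one of the listed gems.

```ruby
require "set"

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = Set.new(%w[a an and the of to is in]).freeze

# Naive tokenizer: lowercase the text, then keep runs of letters/apostrophes.
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# Stop-word filtering stage: drop any token found in the stop-word set.
def remove_stop_words(tokens)
  tokens.reject { |t| STOP_WORDS.include?(t) }
end

# Orchestration: tokenize once, then feed the tokens through each stage in order.
# Every stage is any callable that takes and returns an array of tokens.
def pipeline(text, stages)
  stages.reduce(tokenize(text)) { |tokens, stage| stage.call(tokens) }
end

stages = [method(:remove_stop_words), ->(tokens) { tokens.uniq }]
p pipeline("The part of the sentence is the hard part", stages)
# => ["part", "sentence", "hard"]
```

Keeping stages as plain callables means a pure-Ruby stage and a wrapper around an FFI or JRuby binding can sit side by side in the same chain.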
Caveats
- Several listed projects look dormant or are legacy bindings (e.g., AlchemyAPI is labeled “Legacy”).
- The README we reviewed is truncated, so its coverage of higher-level tasks like NER and chatbots could not be fully verified.
- “Please help us to fill out this section!” appears more than once—this is a living document with gaps.
Verdict
Worth bookmarking if you're committed to Ruby for NLP or maintaining a polyglot stack. If you're starting fresh and choosing a language for text processing, this list is more "here's how to cope" than "here's why Ruby wins."