totalgood/nlpia

The NLP book repo that wants you to build socially responsible bots

Companion code for "NLP in Action" that treats ethics as a feature, not a footnote.

Velocity (7d): +0.2 ★/day · Trend: steady

What it does

NLPIA is the community-driven code companion for the Manning book Natural Language Processing in Action. It bundles examples, a pip-installable nlpia package, and some half-built utilities for glossary compilation, semantic search, and “wordogram” spectrograms. The README doubles as a survival guide for Windows Python developers who need C++ compilers and winpty aliases just to get Jupyter running.

The interesting bit

The project explicitly frames itself around “socially responsible NLP pipelines that give back to the communities they interact with”, a stance that stands out in a field where ethics is usually relegated to a final chapter. The “semantic spectrograms” feature is genuinely playful: it renders word2vec embeddings as visual 2D arrays you can feed into image-processing algorithms.
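To make the spectrogram idea concrete, here is a minimal sketch of how a sentence could become a 2D grid: one column per token, one row per embedding dimension, scaled to grayscale values. It is not nlpia's implementation. The function names (`toy_embeddings`, `wordogram`, `to_grayscale`) are hypothetical, and random vectors stand in for real word2vec embeddings so the example runs with the standard library alone.

```python
import random


def toy_embeddings(vocab, dim=8, seed=0):
    """Hypothetical stand-in for word2vec: one random vector per word."""
    rng = random.Random(seed)
    return {word: [rng.gauss(0.0, 1.0) for _ in range(dim)]
            for word in sorted(vocab)}


def wordogram(tokens, embeddings, dim=8):
    """Build a grid with rows = embedding dimensions, columns = token positions."""
    # Unknown tokens become an all-zero column (an assumption for this sketch).
    columns = [embeddings.get(tok, [0.0] * dim) for tok in tokens]
    # Transpose so that each token occupies one column of the grid.
    return [list(row) for row in zip(*columns)]


def to_grayscale(grid):
    """Min-max scale every cell to an integer in 0..255."""
    flat = [value for row in grid for value in row]
    if not flat:
        return []
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0] * len(row) for row in grid]
    scale = 255.0 / (hi - lo)
    return [[round((value - lo) * scale) for value in row] for row in grid]


tokens = "socially responsible bots give back".split()
embeddings = toy_embeddings(set(tokens))
image = to_grayscale(wordogram(tokens, embeddings))
print(len(image), "rows x", len(image[0]), "columns")
```

With real embeddings, the same grid could be wrapped in an array type and passed to ordinary image tooling, which is the appeal of the approach described above.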

Key highlights

  • Installable via conda, pip, or Docker (with a community-contributed Dockerfile)
  • Includes data loaders for word2vec, Google Universal Sentence Encoder, and ANKI language pairs
  • Skeleton APIs for acronym extraction and glossary generation from AsciiDoc sources
  • Chatbot and voice (TTS/STT) modules, though these are noted as potentially broken on Windows due to pycrypto issues
  • The README contains a small sermon against VS Code’s data-slurping EULA and a plug for Sublime Text 3

Caveats

  • Several “features” are little more than docstring stubs in transcoders.py; the glossary compiler and semantic search are aspirational
  • Windows support is described as fragile, with explicit warnings about missing compilers and pycrypto incompatibilities
  • The project appears lightly maintained; Travis CI and Codecov badges suggest older tooling

Verdict

Worth cloning if you’re working through the book or want a starting point for embedding-based text visualization. Skip it if you need a production NLP framework: this is teaching material with some rough edges and strong opinions about editors.
