
kavgan/nlp-in-practice

NLP recipes that skip the theory homework

A collection of runnable notebooks for the text-processing tasks you actually need to do at work.

★ 1.2k stars · Jupyter Notebook · Learning · Data · Tooling


What it does

This repo is a curated set of Jupyter notebooks and Python scripts covering bread-and-butter NLP tasks: TF-IDF keyword extraction, text classification with logistic regression, Word2Vec training with Gensim, preprocessing pipelines, and even a PySpark word count for when your data stops fitting in RAM. Each entry links to a companion blog post by Kavita Ganesan.
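To make the first task on that list concrete, here is a minimal sketch of TF-IDF keyword extraction using only the standard library. It is not the repository's code: the function names (`tokenize`, `top_keywords`), the toy documents, and the plain unsmoothed IDF formula are all assumptions chosen for illustration, and library implementations such as scikit-learn's use a smoothed variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(docs, doc_index, k=3):
    """Return the k highest-scoring TF-IDF terms for docs[doc_index]."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    counts = Counter(tokenized[doc_index])
    total = sum(counts.values())

    scores = {}
    for term, count in counts.items():
        tf = count / total                 # relative term frequency
        idf = math.log(n_docs / df[term])  # plain, unsmoothed IDF (assumption)
        scores[term] = tf * idf

    # Sort by score (descending), then alphabetically so ties are stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus, not data from the repository.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang in the tree",
]
print(top_keywords(docs, 1, k=2))
```

Words that appear in every document (here, "the") get an IDF of zero and drop to the bottom of the ranking, which is the whole point of the weighting.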

The interesting bit

The value isn’t novelty; it’s curation. The notebooks explicitly compare easily confused pairs (TfidfTransformer vs. TfidfVectorizer, HashingVectorizer vs. CountVectorizer, CBOW vs. skip-gram) that most tutorials gloss over. Think of it as a field guide to sklearn’s vectorizer zoo.
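As a rough illustration of why these pairs get confused, the sketch below runs the scikit-learn vectorizers side by side with their default settings. It is not taken from the notebooks; the toy documents and the `n_features=16` value are assumptions picked to keep the output small.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    HashingVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Hypothetical toy corpus, not data from the repository.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang in the tree",
]

# Route 1: TfidfVectorizer goes from raw text to TF-IDF weights in one step.
direct = TfidfVectorizer().fit_transform(docs)

# Route 2: CountVectorizer produces raw counts, then TfidfTransformer
# reweights those counts. With default settings this matches route 1.
counts = CountVectorizer().fit_transform(docs)
two_step = TfidfTransformer().fit_transform(counts)

same = np.allclose(direct.toarray(), two_step.toarray())
print("one-step and two-step TF-IDF agree:", same)

# CountVectorizer learns a vocabulary, so its column count depends on the data.
# HashingVectorizer is stateless: it hashes tokens into a fixed number of
# columns and needs no fit, but the columns cannot be mapped back to words.
hashed = HashingVectorizer(n_features=16).transform(docs)
print("count matrix shape:", counts.shape)
print("hashed matrix shape:", hashed.shape)
```

The two-step route is useful when the raw counts are needed for something else as well; the hashing route trades interpretability for a fixed memory footprint.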

Key highlights

Runnable notebooks with datasets included where noted (word2vec, tf-idf, text classification)
Pre-trained embedding loading via Gensim (GloVe and Word2Vec) with a text-similarity example
PySpark phrase extraction and word count for larger-scale text
Preprocessing snippets covering stemming, lemmatization, noise removal, and stop-word removal
Each technique paired with an explanatory article, not just docstring regurgitation
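For a sense of what the preprocessing highlight covers, here is a small standard-library sketch of noise removal, stop-word removal, and stemming chained together. It is an assumption-laden toy, not the repository's snippets: the stop-word list is a handful of hand-picked words, and `toy_stem` is a naive suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on"}


def remove_noise(text):
    """Strip HTML-like tags and non-letter characters, then lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)      # drop tags such as <p> or </p>
    text = re.sub(r"[^a-zA-Z\s]", " ", text)  # drop digits and punctuation
    return text.lower()


def toy_stem(word):
    """Naive suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Clean the text, drop stop words, and stem what is left."""
    tokens = remove_noise(text).split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [toy_stem(t) for t in tokens]


print(preprocess("<p>The dogs are jumping on the beds!</p>"))
```

Lemmatization is left out because it needs a dictionary or a trained model, which is where a dedicated NLP library earns its keep.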

Caveats

Some entries are external repos (phrase-at-scale, word_cloud) rather than in-tree code
The “more articles” and mailing-list links suggest this doubles as content marketing; the code appears genuine, but the funnel is visible

Verdict

Worth bookmarking if you’re the “just show me a working example” type, especially for sklearn vectorizer gotchas. Skip it if you need deep learning (transformers, etc.) or production-grade pipelines: this is strictly classical NLP territory.

Frequently asked

What is kavgan/nlp-in-practice? A collection of runnable notebooks for the text-processing tasks you actually need to do at work.
Is nlp-in-practice open source? Yes. kavgan/nlp-in-practice is an open-source project tracked on heatdrop.
What language is nlp-in-practice written in? GitHub classifies kavgan/nlp-in-practice as primarily Jupyter Notebook; the notebooks and accompanying scripts are written in Python.
How popular is nlp-in-practice? kavgan/nlp-in-practice has 1.2k stars on GitHub.
Where can I find nlp-in-practice? kavgan/nlp-in-practice is on GitHub at https://github.com/kavgan/nlp-in-practice.