Is textacy open source?

Yes — chartbeat-labs/textacy is an open-source project tracked on heatdrop.

What language is textacy written in?

chartbeat-labs/textacy is primarily written in Python.

How popular is textacy?

chartbeat-labs/textacy has 2.2k stars on GitHub.

Where can I find textacy?

chartbeat-labs/textacy is on GitHub at https://github.com/chartbeat-labs/textacy.

chartbeat-labs/textacy

The spaCy sidekick that cleans your text and computes your Flesch-Kincaid grade level

A Python library for the NLP grunt work that happens before tokenization and after parsing.

★ 2.2k stars · Python · ML Frameworks · Language Models · Data Tooling

What it does

textacy wraps spaCy with higher-level helpers for the parts of an NLP pipeline that spaCy doesn’t touch: loading datasets, cleaning raw text, extracting structured info like keyterms and SVO triples, building topic models, and computing readability scores. It’s essentially the glue and utility layer between “we have text” and “we have vectors.”
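
To make the "cleaning raw text" and n-gram steps concrete, here is a minimal standard-library sketch of the kind of work involved. It is not textacy's implementation or API: the names clean_text and ngrams, the _URL_ placeholder, and the regular expressions are all assumptions chosen for illustration.

```python
import re
import unicodedata

# Hypothetical patterns for this sketch; a real pipeline would be more careful.
URL_RE = re.compile(r"https?://\S+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Normalize unicode, replace URLs with a placeholder, collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    text = URL_RE.sub("_URL_", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous run of n tokens."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


cleaned = clean_text("Read  the docs at https://example.com\n\tbefore   parsing.")
# Whitespace splitting stands in for a real tokenizer such as spaCy's.
bigrams = ngrams(cleaned.lower().split(), 2)
print(cleaned)
print(bigrams[:3])
```

The value of a library here is mostly in not rewriting and re-testing helpers like these for every project.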

The interesting bit

The library provides ready-made datasets with metadata (Congressional speeches, historical literature, Reddit comments), which is rarer than it should be in NLP tooling. It also extends spaCy’s Doc objects through custom extensions, so you can call helpers such as to_bag_of_terms() or to_semantic_network() on a parsed document rather than juggling converters yourself. Note that spaCy exposes custom extensions under the doc._ namespace, not as plain methods on the Doc.
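
For readers unfamiliar with the term, a "bag of terms" is an order-free mapping from each term to how often it occurs. The sketch below shows the idea with the standard library only. The function name, the whitespace tokenization, and the stopword set are assumptions for illustration; textacy works from spaCy tokens, and its actual signature may differ.

```python
from collections import Counter


def bag_of_terms(tokens, stopwords=frozenset()):
    """Count lowercased tokens, skipping any that appear in stopwords."""
    return Counter(t.lower() for t in tokens if t.lower() not in stopwords)


# Whitespace splitting stands in for a parsed spaCy Doc in this sketch.
tokens = "The cat chased the other cat".split()
bag = bag_of_terms(tokens, stopwords={"the"})
print(bag.most_common())
```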

Key highlights

Pre- and post-processing around spaCy: cleaning, normalization, n-gram extraction, acronym detection, keyterm ranking
Built-in datasets with metadata (no scraping required)
Topic modeling pipeline: tokenization, vectorization, training, visualization
Readability and lexical diversity stats, including multilingual Flesch Reading Ease
String/sequence similarity metrics beyond what spaCy provides
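
As a rough illustration of the readability and lexical diversity statistics mentioned above, the following sketch computes the English Flesch Reading Ease score, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), and a type-token ratio. It is not textacy's code: the function names are hypothetical, and the vowel-group syllable counter and regex-based sentence splitter are crude assumptions. Variants of the score for other languages use different constants.

```python
import re

WORD_RE = re.compile(r"[A-Za-z']+")


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels, with a floor of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """English Flesch Reading Ease; higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_RE.findall(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def type_token_ratio(text: str) -> float:
    """Share of distinct words among all words, a simple lexical diversity measure."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        raise ValueError("text must contain at least one word")
    return len(set(words)) / len(words)


sample = "The cat sat on the mat. The dog ran."
score = flesch_reading_ease(sample)
ttr = type_token_ratio(sample)
print(round(score, 2), round(ttr, 2))
```

A maintained library earns its keep on the hard parts this sketch skips, namely accurate syllable counts and sentence boundaries.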

Caveats

The README is light on specifics: no version requirements, no performance notes, no comparison to alternatives like spacy-transformers or gensim
“…and much more!” suggests breadth over depth; you’ll need to dig into the docs to see what’s actually well-supported

Verdict

Worth a look if you’re building spaCy-based pipelines and tired of rewriting text-cleaning boilerplate. Skip it if you need cutting-edge neural models or fine-grained control over every preprocessing step — this is convenience tooling, not research infrastructure.
