PageRank for your prose: a spaCy pipeline that ranks phrases by graph
Implements TextRank and three variants as a drop-in spaCy component for extractive NLP tasks.

What it does
PyTextRank is a spaCy pipeline extension that runs graph-based ranking algorithms over a document’s phrases to surface the most salient ones. It also supports low-cost extractive summarization and concept extraction from unstructured text. You add it to a spaCy nlp pipeline with nlp.add_pipe("textrank") and read ranked phrases from doc._.phrases.
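Here is a minimal sketch of that flow, assuming spaCy, PyTextRank, and the en_core_web_sm model are already installed. The helper name top_phrases and its parameters are mine for illustration, not part of the library.

```python
def top_phrases(text, limit=5, model="en_core_web_sm"):
    """Return (text, rank, count) tuples for the top-ranked phrases.

    Hypothetical helper for illustration; not part of PyTextRank.
    """
    # Imported lazily so the snippet can be loaded without spaCy installed.
    import spacy
    import pytextrank  # noqa: F401  (importing registers the "textrank" factory)

    nlp = spacy.load(model)
    nlp.add_pipe("textrank")  # append the component to the pipeline
    doc = nlp(text)

    # Each entry in doc._.phrases carries the phrase text, its rank score,
    # and how many times it occurred in the document.
    return [(p.text, p.rank, p.count) for p in doc._.phrases[:limit]]
```

Calling top_phrases(some_long_string) should hand back a short list you can print or feed into whatever comes next in your workflow.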
The interesting bit
Instead of treating TextRank as a standalone script, the library wraps the whole family (TextRank, PositionRank, Biased TextRank, and TopicRank) into a single spaCy component. That means you swap algorithms without rewiring your NLP pipeline. The major version tracks spaCy’s, which is either admirably disciplined or a quiet admission of tight coupling.
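To see what "swap without rewiring" might look like in practice, here is a hedged sketch that runs the same text through each algorithm and collects the top phrases. The factory names below are what I understand PyTextRank to register; treat them as an assumption and check them against your installed version. compare_algorithms is a made-up helper, not a library function.

```python
# Assumed factory names; verify against the PyTextRank version you install.
FACTORIES = ("textrank", "positionrank", "biasedtextrank", "topicrank")


def compare_algorithms(text, factories=FACTORIES, limit=3, model="en_core_web_sm"):
    """Map each factory name to its top `limit` phrase texts.

    Hypothetical helper for illustration; not part of PyTextRank.
    """
    # Imported lazily so the snippet can be loaded without spaCy installed.
    import spacy
    import pytextrank  # noqa: F401  (importing registers the factories)

    results = {}
    for name in factories:
        nlp = spacy.load(model)  # fresh pipeline for each algorithm
        nlp.add_pipe(name)       # the only line that changes per algorithm
        doc = nlp(text)
        results[name] = [p.text for p in doc._.phrases[:limit]]
    return results
```

The point of the design is visible in the loop: everything except the string passed to add_pipe stays the same.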
Key highlights
- Four textgraph algorithms in one spaCy pipe: TextRank, PositionRank, Biased TextRank, TopicRank
- Phrase extraction with rank scores, occurrence counts, and chunk lists per phrase
- Extractive summarization without training a separate model
- MIT licensed; actively maintained since 2016 with DOI-backed citations
- Tutorial notebooks and conda/PyPI installs available
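If "graph-based ranking" sounds abstract, the toy below shows the core idea using only the standard library: link words that appear near each other, then run PageRank over those links. This is a deliberately simplified sketch, not PyTextRank's implementation; it ignores lemmas, part-of-speech filtering, and the step that turns word scores into phrase scores. The function names, window size, and sample sentence are all invented for illustration.

```python
from collections import defaultdict


def build_graph(tokens, window=2):
    """Link each token to the tokens that follow it within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)  # undirected edge
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected, unweighted graph."""
    n = len(graph)
    scores = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        updated = {}
        for node in graph:
            # Each neighbour passes on an equal share of its current score.
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            updated[node] = (1 - damping) / n + damping * incoming
        scores = updated
    return scores


tokens = "graph ranking scores words by graph links and graph neighbours".split()
scores = pagerank(build_graph(tokens))
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:3])
```

The word "graph" ends up on top here because it touches every other word in the sample; well-connected words collect the most score.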
Caveats
- Requires a spaCy model download (en_core_web_sm or equivalent) before first use
- The README’s quickstart example uses fairly dense academic text; real-world performance on short or informal text is not demonstrated
- “PyTextTank” typo in the README suggests occasional doc drift
Verdict
Worth a look if you need explainable, lightweight phrase extraction or summarization inside an existing spaCy workflow. Skip it if you’re after abstractive summarization or neural re-ranking: this is classical graph methods, not transformers.