DerwenAI/pytextrank

PageRank for your prose: a spaCy pipeline that ranks phrases by graph

Implements TextRank and three variants as a drop-in spaCy component for extractive NLP tasks.

2.2k stars · Python · RAG · Search · Data Tooling
Velocity (7d): +0.6 ★ / day · Trend: steady

What it does

PyTextRank is a spaCy pipeline extension that runs graph-based ranking algorithms over a document's phrases to surface the most salient ones. It also supports low-cost extractive summarization and concept extraction from unstructured text. You add it to a spaCy nlp pipeline with nlp.add_pipe("textrank") (after import pytextrank, which registers the component) and read ranked phrases from doc._.phrases.
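To make "graph-based ranking" concrete, here is a minimal, dependency-free sketch of the classic TextRank idea: link words that co-occur within a small window, then score them with PageRank by power iteration. It is not PyTextRank's code. The stopword list, the window of 2, the damping factor of 0.85 and the iteration count are assumptions chosen for illustration, and the real component selects candidates with spaCy's part-of-speech tags and reports multi-word phrases rather than bare words.

```python
import re
from collections import defaultdict

# Assumption for this sketch: a tiny hand-written stopword list stands in for
# the part-of-speech filtering a spaCy-based implementation would use.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "from", "in", "is", "of",
             "on", "or", "over", "that", "the", "this", "to", "with"}


def candidate_words(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cooccurrence_graph(words, window=2):
    """Undirected graph: link each word to the next `window` words after it."""
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain PageRank by power iteration; every node here has at least one neighbor."""
    nodes = list(graph)
    if not nodes:
        return {}
    base = (1 - damping) / len(nodes)
    scores = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Each node collects a share of its neighbors' scores, split by their degree.
        scores = {
            n: base + damping * sum(scores[m] / len(graph[m]) for m in graph[n])
            for n in nodes
        }
    return scores


def textrank_keywords(text, top_n=5):
    """Return the top_n (word, score) pairs, highest score first."""
    scores = pagerank(cooccurrence_graph(candidate_words(text)))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    sample = ("Graph methods rank phrases. The graph links candidate words, "
              "and the ranking of each graph node reflects its neighbors.")
    for word, score in textrank_keywords(sample):
        print(f"{word:10s} {score:.3f}")
```

Scores always sum to 1 here, so they read as a share of "importance" across the graph; a word linked to many others, like "graph" in the sample, tends to float to the top.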

The interesting bit

Instead of treating TextRank as a standalone script, the library wraps the whole family (TextRank, PositionRank, Biased TextRank, and TopicRank) into a single spaCy extension, with each algorithm registered under its own pipe name ("textrank", "positionrank", "biasedtextrank", "topicrank"). That means you swap algorithms by changing one string instead of rewiring your NLP pipeline. The major version tracks spaCy's, which is either admirably disciplined or a quiet admission of tight coupling.
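Why the algorithms swap so cleanly is easier to see once you notice that TextRank, PositionRank and Biased TextRank can all be framed as personalized PageRank over the same word graph. They differ only in where the random surfer "teleports". The sketch below illustrates that design with a registry keyed by names that mirror the pipe names. The registry, the function names and the weighting formulas are simplifications for illustration, not PyTextRank internals: PositionRank's inverse-position weights follow the original paper's idea, and Biased TextRank's similarity-based bias is reduced to a fixed boost for hypothetical "focus" words. TopicRank is left out because it clusters candidate phrases into topics and ranks a topic graph, which does not fit this pattern.

```python
import re
from collections import defaultdict

# Assumption: a tiny stopword list stands in for spaCy's part-of-speech filter.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def build_graph(tokens, window=2):
    """Undirected co-occurrence graph: link each token to the next `window` tokens."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def personalized_pagerank(graph, teleport, damping=0.85, iterations=100):
    """Power iteration where the random jump lands on nodes in proportion to `teleport`."""
    nodes = list(graph)
    total = sum(teleport.get(n, 0.0) for n in nodes)
    if not nodes or total == 0:
        return {}
    jump = {n: teleport.get(n, 0.0) / total for n in nodes}
    scores = dict(jump)
    for _ in range(iterations):
        scores = {
            n: (1 - damping) * jump[n]
            + damping * sum(scores[m] / len(graph[m]) for m in graph[n])
            for n in nodes
        }
    return scores


# Hypothetical teleport-weight functions: each variant only changes these weights.
def textrank_weights(tokens, **_):
    return {t: 1.0 for t in tokens}  # uniform jump = classic TextRank


def positionrank_weights(tokens, **_):
    weights = defaultdict(float)
    for position, t in enumerate(tokens, start=1):
        weights[t] += 1.0 / position  # earlier occurrences contribute more
    return weights


def biased_weights(tokens, focus=(), bias=10.0, **_):
    return {t: (bias if t in focus else 1.0) for t in tokens}  # boost focus words


VARIANTS = {
    "textrank": textrank_weights,
    "positionrank": positionrank_weights,
    "biasedtextrank": biased_weights,
}


def rank(text, variant="textrank", **options):
    """Rank words with the chosen variant; only the teleport weights differ."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    teleport = VARIANTS[variant](tokens, **options)
    scores = personalized_pagerank(build_graph(tokens), teleport)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    sample = "spaCy pipeline ranks phrases; graph ranking scores phrases using a graph of words"
    for name in VARIANTS:
        print(name, rank(sample, name, focus={"words"})[:3])
```

The graph-building and ranking code never changes; picking a variant is a dictionary lookup. That is the same "change one string" experience the pipe names give you, although the library's actual internals may be organized differently.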

Key highlights

  • Four textgraph algorithms in one spaCy pipe: TextRank, PositionRank, Biased TextRank, TopicRank
  • Phrase extraction with rank scores, occurrence counts, and chunk lists per phrase
  • Extractive summarization without training a separate model
  • MIT licensed; actively maintained since 2016 with DOI-backed citations
  • Tutorial notebooks and conda/PyPI installs available
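Summarization "without training a separate model" works because the phrase ranks already say which content matters. Below is a minimal sketch of that idea. It assumes you already have (phrase, rank) pairs, for example the text and rank of items read from doc._.phrases. Each sentence is scored by the ranks of the top phrases it mentions, and the best sentences are kept in their original order. This is not PyTextRank's own summarization routine. The regex sentence splitter, the whole-word matching rule, the parameter names and the rank values in the usage example are assumptions made up for illustration.

```python
import re


def split_sentences(text):
    """Naive splitter on '.', '!' or '?' followed by whitespace (spaCy would use doc.sents)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(sentence, phrase):
    """True if `phrase` occurs in `sentence` as whole words, ignoring case."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return re.search(pattern, sentence.lower()) is not None


def extractive_summary(text, ranked_phrases, top_n_phrases=10, max_sentences=2):
    """Pick the sentences that mention the highest-ranked phrases; keep document order."""
    top = ranked_phrases[:top_n_phrases]
    scored = []
    for index, sentence in enumerate(split_sentences(text)):
        score = sum(rank for phrase, rank in top if mentions(sentence, phrase))
        scored.append((score, index, sentence))
    # Highest score first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    # Re-sort the winners by position so the summary reads in the original order.
    return [sentence for _, _, sentence in sorted(best, key=lambda item: item[1])]


if __name__ == "__main__":
    text = ("Graph methods rank phrases. The weather was mild. "
            "TextRank builds a word graph. Lunch was late.")
    # Illustrative, made-up ranks standing in for real phrase-extraction output.
    ranked = [("word graph", 0.30), ("textrank", 0.25), ("graph methods", 0.20), ("phrases", 0.10)]
    print(extractive_summary(text, ranked, max_sentences=2))
    # ['Graph methods rank phrases.', 'TextRank builds a word graph.']
```

The off-topic sentences score zero because they mention none of the ranked phrases, which is the whole trick: no model is trained, the graph ranking does the selecting.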

Caveats

  • Requires a spaCy model download (en_core_web_sm or equivalent) before first use
  • The README’s quickstart example uses fairly dense academic text; real-world performance on short or informal text is not demonstrated
  • “PyTextTank” typo in the README suggests occasional doc drift

Verdict

Worth a look if you need explainable, lightweight phrase extraction or summarization inside an existing spaCy workflow. Skip it if you're after abstractive summarization or neural re-ranking: these are classical graph methods, not transformers.
