summanlp/textrank

PageRank for sentences: a classic NLP workhorse in Python

A straightforward implementation of TextRank for summarization and keyword extraction, with a tweak to the similarity function borrowed from a 2015 paper.

1.3k stars · Python · ML Frameworks · Data Tooling · textrank
Velocity (7d): +0.3 ★ / day · Trend: steady

What it does

summa applies the TextRank graph algorithm (essentially PageRank on sentences) to extract summaries and keywords from text. Feed it a blob of text, get back its highest-ranked sentences or a list of salient terms. It supports 17 languages and lets you cap output by ratio or word count.
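To make "PageRank on sentences" concrete, here is a minimal standard-library sketch of the general idea. It is not summa's code: the function names, the Jaccard edge weight, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard(a, b):
    """Placeholder edge weight (assumption): shared words over all distinct words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_sentences(sentences, similarity=jaccard, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over a sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Symmetric weight matrix with no self-loops.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def summarize(sentences, top_k=1):
    """Return the top_k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(best[:top_k])]
```

A sentence that shares no words with the rest of the text gets no incoming weight, so it ends up with the lowest score and is the first to be dropped from the summary.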

The interesting bit

The project isn’t just a naive reimplementation; it incorporates a modified similarity function from a 2015 paper by an Argentine research group (Barrios et al., "Variations of the Similarity Function of TextRank for Automated Summarization") that changes how sentence relatedness is calculated. Whether that actually moves the needle for your use case is left as an exercise to the reader, because the README doesn’t provide benchmarks.
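For context, this is the baseline such tweaks start from: the sentence similarity in the original TextRank paper (Mihalcea and Tarau, 2004), which counts shared words and normalizes by the log of each sentence's length. The sketch below covers only that baseline. Exactly which variation summa ships is not stated here, and the tokenizer and function names are assumptions.

```python
import math
import re


def words(sentence):
    """Very rough tokenizer (assumption): lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def textrank_similarity(s1, s2):
    """Shared-word count divided by log(len(s1)) + log(len(s2))."""
    w1, w2 = words(s1), words(s2)
    shared = len(set(w1) & set(w2))
    if shared == 0:
        return 0.0
    norm = math.log(len(w1)) + math.log(len(w2))
    # Two one-word sentences give log(1) + log(1) = 0; treat as unrelated.
    if norm == 0:
        return 0.0
    return shared / norm
```

The log normalization keeps long sentences from winning simply because they contain more words to match against.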

Key highlights

  • Summarization and keyword extraction via summa.summarizer and summa.keywords
  • Output control: ratio=0.2, words=50, or split=True for list output
  • 17 supported languages including Arabic, Russian, and the usual European suspects
  • Command-line interface: textrank -t FILE
  • Optional Pattern dependency for better keyword extraction performance
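The options in the highlights above map onto calls roughly like the following. This is a sketch: summa_demo is a hypothetical wrapper name, the import is done lazily so the snippet loads even where summa is not installed, and the exact signatures should be checked against the README for your installed version.

```python
def summa_demo(text):
    """Run the calls named in the highlights; assumes `pip install summa`."""
    # Third-party import kept inside the function (see note above).
    from summa import keywords, summarizer

    return {
        # Keep roughly the top 20% of sentences.
        "by_ratio": summarizer.summarize(text, ratio=0.2),
        # Cap the summary at about 50 words instead.
        "by_words": summarizer.summarize(text, words=50),
        # Return a list of sentences rather than one string.
        "as_list": summarizer.summarize(text, split=True),
        # Keyword extraction from the same input.
        "keywords": keywords.keywords(text),
    }
```

Calling summa_demo(open("article.txt").read()) on a file of your own (article.txt is a placeholder) would return all four results side by side for comparison.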

Caveats

  • Line breaks are treated as sentence separators, so preprocessing matters; the README warns about this but doesn’t suggest how to handle messy input
  • No performance numbers, model sizes, or comparison to modern neural summarizers—useful for quick baselines, unclear if it competes with contemporary methods
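Since the README leaves the line-break problem to you, one possible workaround for hard-wrapped input is to join the lines inside each paragraph before summarizing. The helper below is an assumption, not something the project provides: unwrap_hard_wraps is a made-up name, and it treats blank lines as the only real paragraph boundaries.

```python
import re


def unwrap_hard_wraps(text):
    """Join lines inside each paragraph; keep one newline between paragraphs."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = [
        " ".join(line.strip() for line in p.splitlines() if line.strip())
        for p in paragraphs
    ]
    return "\n".join(p for p in joined if p)
```

Paragraph breaks still come out as newlines, so they remain sentence separators, which is usually what you want. Text where a single newline really does end a sentence (bullet lists, for example) needs different handling.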

Verdict

Good for developers who need a fast, dependency-light summarizer without reaching for transformers. Skip it if you’re already invested in modern neural pipelines and need state-of-the-art coherence.
