PageRank for sentences: a classic NLP workhorse in Python
A straightforward implementation of TextRank for summarization and keyword extraction, with a tweak to the similarity function borrowed from a 2015 paper.

What it does
summa applies TextRank, a graph algorithm that is essentially PageRank run over sentences (or, for keyword extraction, over words), to pull summaries and keywords out of text. Feed it a blob of text, get back a shorter blob or a list of salient terms. It supports 17 languages and lets you cap output by ratio or word count.
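To make "PageRank on sentences" concrete, here is a minimal sketch in plain Python. It is not summa's code. It uses the original TextRank word-overlap similarity (Mihalcea and Tarau, 2004) rather than summa's modified one. It also uses a naive regex sentence splitter and a hypothetical summarize(text, ratio) helper that only imitates the ratio cap.

```python
import math
import re


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_similarity(a, b):
    # Original TextRank similarity: shared words divided by log|a| + log|b|.
    wa = re.findall(r"\w+", a.lower())
    wb = re.findall(r"\w+", b.lower())
    if not wa or not wb:
        return 0.0
    denom = math.log(len(wa)) + math.log(len(wb))
    common = len(set(wa) & set(wb))
    return common / denom if denom > 0 else 0.0


def textrank_scores(sentences, d=0.85, iters=50):
    n = len(sentences)
    # Weighted, undirected sentence graph stored as an adjacency matrix.
    w = [[overlap_similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    # Power iteration on the weighted PageRank update:
    # S(i) = (1 - d) + d * sum_j w[j][i] / out_weight[j] * S(j)
    for _ in range(iters):
        scores = [
            (1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                              for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, ratio=0.2):
    # Hypothetical helper: keep the top `ratio` share of sentences, in original order.
    sentences = split_sentences(text)
    if not sentences:
        return ""
    scores = textrank_scores(sentences)
    k = max(1, round(len(sentences) * ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))


text = "Cats purr. Dogs bark loudly. Cats and dogs are pets. Pets need care. The sky is blue."
print(summarize(text, ratio=0.4))  # → Cats purr. Cats and dogs are pets.
```

Sentences that share vocabulary with many others accumulate score. The isolated sentence about the sky has no edges, so it stays at the 1 - d floor and never makes the cut.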
The interesting bit
The project isn’t just a naive reimplementation. It incorporates a modified similarity function from a 2015 paper by Argentine researchers, which changes how sentence relatedness is calculated. Whether that actually moves the needle for your use case is left as an exercise for the reader, since the README doesn’t provide benchmarks.
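The review doesn't say which modified similarity summa ships, so the sketch below only illustrates what swapping the edge weight looks like. We assume a BM25-style score purely as an example of an IDF-weighted, length-normalized alternative to raw word overlap. The function names and the common defaults k1=1.2 and b=0.75 are our own choices, not summa's.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    return re.findall(r"\w+", sentence.lower())


def bm25_weights(sentences, k1=1.2, b=0.75):
    """w[i][j] = BM25 score of sentence j as the 'document' for sentence i as the 'query'."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: in how many sentences each word appears.
    df = Counter(word for d in docs for word in set(d))
    # Smoothed IDF that stays positive even for very common words (an assumption;
    # other BM25 variants handle negative IDF differently).
    idf = {word: math.log(1 + (n - c + 0.5) / (c + 0.5)) for word, c in df.items()}
    tfs = [Counter(d) for d in docs]
    weights = [[0.0] * n for _ in range(n)]
    for i, query in enumerate(docs):
        for j, doc in enumerate(docs):
            if i == j or not doc:
                continue
            # Length normalization: long sentences need more matches to score high.
            norm = k1 * (1 - b + b * len(doc) / avgdl)
            score = 0.0
            for q in query:
                f = tfs[j][q]
                if f:
                    score += idf[q] * f * (k1 + 1) / (f + norm)
            weights[i][j] = score
    return weights
```

Feeding this matrix into a weighted PageRank instead of overlap scores would be the only change to the ranking step. Unlike word overlap, BM25 is asymmetric: w[i][j] and w[j][i] generally differ, so the sentence graph effectively becomes directed.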
Key highlights
- Summarization and keyword extraction via summa.summarizer and summa.keywords
- Output control: ratio=0.2, words=50, or split=True for list output
- 17 supported languages, including Arabic, Russian, and the usual European suspects
- Command-line interface: textrank -t FILE
- Optional Pattern dependency for better keyword extraction performance
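Keyword extraction follows the same recipe over words instead of sentences. Here is a rough sketch with its assumptions flagged. The function name and parameters are hypothetical, and co-occurrence uses a fixed window of nearby tokens. A tiny stopword list stands in for the part-of-speech filtering described in the original TextRank paper. None of this is summa's keywords module.

```python
import re
from collections import defaultdict

# Deliberately tiny, hypothetical stopword list standing in for a real
# part-of-speech or stopword filter.
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is", "it",
             "of", "on", "the", "to", "with"}


def extract_keywords(text, window=2, top_n=5, d=0.85, iters=50):
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    # Undirected co-occurrence graph: link each word to the next `window - 1`
    # tokens (counted over the filtered sequence, a simplification).
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        graph.setdefault(word, set())  # every candidate becomes a node
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    nodes = list(graph)
    scores = {w: 1.0 for w in nodes}
    # Unweighted PageRank: each word splits its score evenly among its neighbors.
    for _ in range(iters):
        scores = {
            w: (1 - d) + d * sum(scores[u] / len(graph[u]) for u in graph[w])
            for w in nodes
        }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(nodes, key=lambda w: (-scores[w], w))[:top_n]


text = "Keyword extraction builds a word graph. The graph links words; the graph is ranked."
print(extract_keywords(text)[0])  # → graph
```

"graph" wins because it sits at the center of the co-occurrence network, with four distinct neighbors where every other word has at most two.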
Caveats
- Line breaks are treated as sentence separators, so preprocessing matters; the README warns about this but doesn’t suggest how to handle messy input
- No performance numbers or comparisons to modern neural summarizers. It's fine for quick baselines, but it's unclear whether it competes with contemporary methods
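The line-break caveat bites hardest on hard-wrapped text, such as emails, PDF extractions, or fixed-width docs, where one sentence spans several lines. One workaround is to unwrap lines inside each paragraph before summarizing. The sketch below uses a hypothetical helper and is not something the README recommends.

```python
import re


def unwrap_lines(text):
    """Join hard-wrapped lines inside each paragraph so a line break no longer
    looks like a sentence boundary. Blank lines are kept as paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        # Re-join words hyphenated across a line break, e.g. "summa-\nrization".
        # Caveat: this also merges genuine compounds split at a line end.
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Collapse remaining newlines and runs of whitespace into single spaces.
        para = re.sub(r"\s+", " ", para).strip()
        if para:
            cleaned.append(para)
    return "\n\n".join(cleaned)


raw = "TextRank ranks\nsentences by\ncentrality.\n\nIt is   a graph\nmethod."
print(unwrap_lines(raw))
```

The de-hyphenation step trades accuracy on compounds for cleaner prose: "long-\nterm" becomes "longterm". Drop that line if your input rarely breaks words across lines.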
Verdict
Good for developers who need a fast, dependency-light summarizer without reaching for transformers. Skip it if you’re already invested in modern neural pipelines and need state-of-the-art coherence.