TL;DR as a service, circa 2013
A Python library that extracts summaries from HTML or plain text using half a dozen classical NLP algorithms, because sometimes you just need the gist.

What it does
Sumy is a command-line tool and Python library for automatic text summarization. Feed it a URL or a text file, pick an algorithm (LexRank, LSA, Luhn, Edmundson, and others), and specify how many sentences or what percentage of the original you want back. It also includes a basic evaluation framework for comparing summarizers against reference summaries.
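To make the "how many sentences or what percentage" choice concrete, here is a minimal sketch of how such a length spec could be turned into a sentence count. The helper name `resolve_sentence_count` and the round-up rule for percentages are assumptions made for illustration; this is not Sumy's own code.

```python
import math


def resolve_sentence_count(spec, total_sentences):
    """Turn a length spec such as 3, "3" or "20%" into a sentence count.

    Hypothetical helper for illustration only; not part of Sumy.
    """
    text = str(spec).strip()
    if text.endswith("%"):
        # Multiply before dividing to limit floating-point drift,
        # then round up (an assumed rounding rule).
        percent = float(text[:-1])
        count = math.ceil(total_sentences * percent / 100)
    else:
        count = int(text)
    # Clamp to the range of sentences that actually exist.
    return max(0, min(count, total_sentences))


if __name__ == "__main__":
    print(resolve_sentence_count("20%", 10))
    print(resolve_sentence_count(3, 10))
```

Clamping means a request for more sentences than the document contains simply returns the whole document.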
The interesting bit
The project is essentially a curated museum of pre-transformer summarization techniques. Where modern solutions throw GPU clusters at the problem, Sumy implements classical algorithms like LexRank (PageRank for sentences) and latent semantic analysis. The author even maintains a list of alternative implementations in other languages, suggesting this was partly an educational exercise in surveying the field.
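The "PageRank for sentences" idea can be sketched in a few lines of standard-library Python. This is a simplified illustration, not Sumy's implementation: it uses raw term counts rather than TF-IDF weights and skips the similarity threshold that LexRank variants often apply. The function names, the damping factor of 0.85 and the iteration count are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Crude lowercase word tokenizer; real tokenizers are language-aware.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    n = len(sentences)
    if n == 0:
        return []
    vectors = [Counter(tokenize(s)) for s in sentences]
    # Sentence similarity graph without self-loops.
    sim = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Row-normalize into transition probabilities; a sentence with no
    # similar neighbours spreads its weight uniformly.
    transition = []
    for row in sim:
        total = sum(row)
        transition.append([v / total for v in row] if total else [1.0 / n] * n)
    # Power iteration, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * transition[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores


def summarize(sentences, count):
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Return the best sentences in their original document order.
    return [sentences[i] for i in sorted(top[:count])]
```

Sentences that share vocabulary with many others accumulate score, while an off-topic sentence with no similar neighbours ends up near the bottom. This is why the approach is extractive: it can only select existing sentences, not write new ones.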
Key highlights
- Supports multiple languages with tokenizers; the README claims adding new ones is “not too hard”
- Ships as both a CLI (sumy, sumy_eval) and an importable library
- Available via Docker image for quick testing without installation
- Someone ported it to a Hugging Face space for browser-based use
- Requires Python 3.8+; the installation docs prominently feature uv as the recommended path
Caveats
- The Python API example still uses from __future__ import statements, suggesting the codebase carries some legacy weight
- No performance benchmarks or accuracy comparisons against modern methods are provided in the README
- The evaluation framework is described as “simple” by the author themselves
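For a sense of what a simple comparison against a reference summary can look like, here is a sketch of sentence-level precision, recall and F1. The metric choice and the function name `precision_recall_f1` are assumptions for illustration; the README summary above does not say which measures Sumy's evaluator actually reports.

```python
def precision_recall_f1(candidate, reference):
    """Score a candidate summary against a reference summary.

    Both arguments are lists of sentences; matching is exact string
    equality, which is an illustrative simplification.
    """
    candidate_set, reference_set = set(candidate), set(reference)
    overlap = len(candidate_set & reference_set)
    precision = overlap / len(candidate_set) if candidate_set else 0.0
    recall = overlap / len(reference_set) if reference_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Exact-match scoring is harsh on summaries that pick a near-duplicate of a reference sentence, which is one reason overlap-based measures over words or n-grams are commonly used as well.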
Verdict
Worth a look if you need extractive summarization without pulling in PyTorch or calling an API, or if you’re teaching classical NLP. Skip it if you need abstractive summaries or state-of-the-art quality; this is a Swiss Army knife from the pre-BERT era.