miso-belica/sumy

TL;DR as a service, circa 2013

A Python library that extracts summaries from HTML or plain text using half a dozen classical NLP algorithms, because sometimes you just need the gist.

3.7k stars · Python · Other AI · Velocity (7d): +0.8 ★/day · Trend: steady

What it does

Sumy is a command-line tool and Python library for automatic text summarization. Feed it a URL or a text file, pick an algorithm (LexRank, LSA, Luhn, Edmundson, and others), and specify how many sentences or what percentage of the original you want back. It also includes a basic evaluation framework for comparing summarizers against reference summaries.
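As a rough illustration of the library side of that workflow, here is a minimal sketch that summarizes a plain-text string with LexRank. The helper name summarize_text and the sample text are hypothetical. The module paths reflect sumy's package layout as commonly documented, so check them against the README for your installed version. It assumes sumy is installed along with NumPy (which the LexRank summarizer needs) and the NLTK tokenizer data for the chosen language.

```python
def summarize_text(text, sentence_count=2, language="english"):
    """Return the top-ranked sentences of `text` as plain strings.

    Hypothetical convenience wrapper, not part of sumy itself.
    """
    # Imported inside the function so this file still loads where sumy is absent.
    from sumy.nlp.stemmers import Stemmer
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.summarizers.lex_rank import LexRankSummarizer
    from sumy.utils import get_stop_words

    # Parse the raw string into sumy's document model.
    parser = PlaintextParser.from_string(text, Tokenizer(language))

    # Configure the summarizer with a stemmer and a stop-word list.
    summarizer = LexRankSummarizer(Stemmer(language))
    summarizer.stop_words = get_stop_words(language)

    # Calling the summarizer returns the selected sentence objects.
    return [str(sentence) for sentence in summarizer(parser.document, sentence_count)]


if __name__ == "__main__":
    sample = (
        "Sumy extracts summaries from plain text. "
        "It ships several classical algorithms. "
        "Each algorithm ranks sentences and keeps the best ones. "
        "The weather was pleasant that day."
    )
    for line in summarize_text(sample, sentence_count=2):
        print(line)
```

Swapping in another algorithm should only mean importing a different summarizer class; the parser and tokenizer steps stay the same.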

The interesting bit

The project is essentially a curated museum of pre-transformer summarization techniques. Where modern solutions throw GPU clusters at the problem, Sumy implements classical algorithms like LexRank (PageRank for sentences) and latent semantic analysis. The author even maintains a list of alternative implementations in other languages, suggesting this was partly an educational exercise in surveying the field.
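To make the "PageRank for sentences" idea concrete, here is a simplified, standard-library-only sketch. It is not sumy's implementation: it uses plain term-frequency cosine similarity rather than the idf-weighted variant from the LexRank paper, a naive regex sentence splitter, and hypothetical names throughout (lexrank_sketch, threshold, damping).

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence):
    # Lowercased word counts; no stemming or stop-word removal in this sketch.
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def lexrank_sketch(text, count=2, threshold=0.1, damping=0.85, iterations=50):
    """Rank sentences by PageRank over a similarity graph; return the top `count`."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    bags = [bag_of_words(s) for s in sentences]

    # Link two sentences when their similarity exceeds the threshold.
    neighbours = [
        [j for j in range(n) if j != i and cosine(bags[i], bags[j]) > threshold]
        for i in range(n)
    ]

    # Power iteration, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Score held by sentences with no neighbours is spread evenly.
        dangling = sum(scores[i] for i in range(n) if not neighbours[i])
        new_scores = [(1.0 - damping) / n + damping * dangling / n] * n
        for i in range(n):
            for j in neighbours[i]:
                new_scores[j] += damping * scores[i] / len(neighbours[i])
        scores = new_scores

    # Keep the best-scoring sentences, presented in their original order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:count]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    text = (
        "Cats chase mice at night. Cats chase birds at dawn. "
        "Mice fear cats at night. The stock market closed higher."
    )
    print(lexrank_sketch(text, count=1, threshold=0.5))
```

A sentence that shares vocabulary with many others collects score from each of them, while an off-topic sentence receives only the baseline share, which is why it drops out of the summary.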

Key highlights

  • Supports multiple languages through per-language tokenizers; the README claims adding new ones is “not too hard”
  • Ships as both CLI (sumy, sumy_eval) and importable library
  • Available via Docker image for quick testing without installation
  • Someone ported it to a Hugging Face space for browser-based use
  • Requires Python 3.8+; installation docs prominently feature uv as the recommended path

Caveats

  • The Python API example still uses from __future__ import statements, suggesting the codebase carries some legacy weight
  • No performance benchmarks or accuracy comparisons against modern methods are provided in the README
  • The evaluation framework is described as “simple” by the author themselves
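For a sense of what comparing a summarizer against a reference summary can involve at its simplest, here is a standard-library sketch of sentence-level precision, recall, and F1. The function name sentence_overlap_scores is hypothetical and this is not sumy's evaluation code; it only illustrates the general idea of scoring the overlap between a candidate and a reference.

```python
def sentence_overlap_scores(candidate, reference):
    """Return (precision, recall, f1) for two collections of sentences.

    Sentences are compared by exact string match, which is a deliberate
    simplification for this sketch.
    """
    candidate_set, reference_set = set(candidate), set(reference)
    if not candidate_set or not reference_set:
        raise ValueError("both summaries must contain at least one sentence")

    overlap = len(candidate_set & reference_set)
    precision = overlap / len(candidate_set)  # share of the candidate that is correct
    recall = overlap / len(reference_set)     # share of the reference that was found
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    candidate = ["Sentence A.", "Sentence B."]
    reference = ["Sentence B.", "Sentence C.", "Sentence D."]
    print(sentence_overlap_scores(candidate, reference))
```

Exact-match overlap only makes sense for extractive summaries, where both sides pick sentences from the same source text.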

Verdict

Worth a look if you need extractive summarization without pulling in PyTorch or calling an API, or if you’re teaching classical NLP. Skip it if you need abstractive summaries or state-of-the-art quality; this is a Swiss Army knife from the pre-BERT era.
