A Swiss Army knife for pulling keyphrases out of text
pke bundles more than ten extraction algorithms behind one tidy Python API so you can swap heuristics without rewriting your pipeline.

What it does
pke is a Python toolkit for extracting keyphrases from documents. It wraps a range of statistical and graph-based algorithms—TF-IDF, TextRank, YAKE, TopicRank, and others—behind a uniform interface. You load a document, pick a model, run candidate selection and weighting, then ask for the top N phrases. It uses spaCy under the hood for preprocessing and ships with supervised models trained on SemEval-2010 data.
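A minimal sketch of that load, select, weight, rank flow might look like the following. It assumes pke and a spaCy English model are already installed. The wrapper name `extract_keyphrases` and its parameters are made up for illustration. The pke calls follow the usage pattern in the project's documentation as best I recall it, so check them against the version you install.

```python
def extract_keyphrases(text, model_name="TopicRank", n=10, language="en"):
    """Hypothetical helper: run one pke unsupervised model end to end."""
    # Imported lazily so this snippet can be loaded without pke installed.
    import pke

    # Look up the model class by name, e.g. pke.unsupervised.TopicRank.
    model_cls = getattr(pke.unsupervised, model_name)
    extractor = model_cls()

    # Load the raw text; preprocessing is delegated to spaCy.
    extractor.load_document(input=text, language=language)

    # Stage 1: pick candidate phrases using the model's defaults.
    extractor.candidate_selection()

    # Stage 2: score the candidates using the model's defaults.
    extractor.candidate_weighting()

    # Stage 3: return the n highest-scoring (phrase, score) pairs.
    return extractor.get_n_best(n=n)
```

Calling `extract_keyphrases(text, model_name="YAKE")` or `model_name="MultipartiteRank"` would switch algorithms without touching the rest of the code, assuming each model's default settings suit your documents.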
The interesting bit
The pipeline is deliberately modular: candidate selection, weighting, and ranking are separate stages you can override or extend. That design makes it practical to prototype a new extraction idea without rebuilding the boring plumbing around tokenization and part-of-speech filtering.
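To make the modularity concrete, here is a toy, standard-library-only sketch of a three-stage extractor in which a subclass overrides just the weighting step. None of this is pke code: the class names, the stopword list, and the scoring rules are invented for illustration, and the method names only mirror the stages described above.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "is", "on"}


class ToyExtractor:
    """Hypothetical three-stage extractor; not part of pke."""

    def __init__(self, text):
        self.tokens = re.findall(r"[a-z]+", text.lower())
        self.candidates = []
        self.weights = {}

    def candidate_selection(self):
        # Stage 1: keep every non-stopword token as a candidate.
        self.candidates = [t for t in self.tokens if t not in STOPWORDS]

    def candidate_weighting(self):
        # Stage 2: score each candidate by raw frequency.
        self.weights = dict(Counter(self.candidates))

    def get_n_best(self, n=3):
        # Stage 3: rank by descending weight, breaking ties alphabetically.
        ranked = sorted(self.weights.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


class FirstPositionExtractor(ToyExtractor):
    """Overrides only the weighting stage: earlier words score higher."""

    def candidate_weighting(self):
        self.weights = {}
        for position, token in enumerate(self.candidates):
            # Keep the score from the first occurrence only.
            self.weights.setdefault(token, 1.0 / (position + 1))


def run(extractor_cls, text, n=3):
    # The driver is identical for every extractor; only the class changes.
    extractor = extractor_cls(text)
    extractor.candidate_selection()
    extractor.candidate_weighting()
    return extractor.get_n_best(n)
```

Because selection and ranking are inherited, trying a new weighting idea costs one overridden method, which is the kind of prototyping the staged design is meant to allow.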
Key highlights
- Implements 10+ models spanning unsupervised (statistical and graph-based) and supervised approaches
- Standardized API: swap TopicRank for YAKE or MultipartiteRank in one line
- Built-in benchmarking against common datasets with reproducible notebooks
- Tutorials available as Colab notebooks for hands-on experimentation
- Published and maintained by an academic author with a COLING 2016 paper
Caveats
- Requires spaCy >= 3.2.3 and a manual download of spaCy language models; not a plain pip install-and-go experience
- Only one supervised model (Kea) is listed, so the “supervised” shelf is thin compared to the unsupervised collection
Verdict
Researchers and practitioners who need to compare multiple keyphrase extraction methods quickly should grab this. If you already know you want a single specific algorithm and don’t care about benchmarking alternatives, a dedicated implementation of that algorithm may be the simpler choice.