boudinfl/pke

A Swiss Army knife for pulling keyphrases out of text

pke bundles a dozen extraction algorithms behind one tidy Python API so you can swap heuristics without rewriting your pipeline.

1.6k stars · Python · Data Tooling · Other AI
Velocity (7d): +0.4 ★/day · Trend: steady

What it does

pke is a Python toolkit for extracting keyphrases from documents. It wraps a range of statistical and graph-based algorithms (TF-IDF, TextRank, YAKE, TopicRank, and others) behind a uniform interface. You pick a model, load a document, run candidate selection and weighting, then ask for the top N phrases. It uses spaCy under the hood for preprocessing and ships with supervised models trained on SemEval-2010 data.
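That workflow can be sketched roughly as follows. The four method names (`load_document`, `candidate_selection`, `candidate_weighting`, `get_n_best`) are the ones used in pke's README example; the helper `extract_keyphrases`, the `demo` function, and the sample text are hypothetical and only there for illustration.

```python
def extract_keyphrases(extractor, text, n=10, language="en"):
    """Run the four-step workflow on a pke-style extractor (hypothetical helper)."""
    extractor.load_document(input=text, language=language)  # preprocessing happens here
    extractor.candidate_selection()   # pick candidate phrases
    extractor.candidate_weighting()   # score candidates with the model's heuristic
    return extractor.get_n_best(n=n)  # (phrase, score) pairs, best first


def demo():
    """Needs pke and a spaCy English model installed; not called automatically."""
    import pke  # imported lazily so the helper above works without pke present

    text = "Keyphrase extraction picks the phrases that best summarize a document."
    extractor = pke.unsupervised.TopicRank()
    for phrase, score in extract_keyphrases(extractor, text, n=5):
        print(f"{score:.3f}  {phrase}")
```

Call `demo()` once pke and a spaCy model are available. The helper only relies on the four methods, so any object exposing them will do.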

The interesting bit

The pipeline is deliberately modular: candidate selection, weighting, and ranking are separate stages you can override or extend. That design makes it practical to prototype a new extraction idea without rebuilding the boring plumbing around tokenization and part-of-speech filtering.
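To make the modular design concrete, here is a toy sketch in plain Python. It is not pke's code: `ToyExtractor`, `EarlyBiasExtractor`, and `run` are made-up names, and the word-frequency scoring is a stand-in heuristic chosen for brevity. Only the three-stage split mirrors the design described above.

```python
import re
from collections import Counter


class ToyExtractor:
    """Toy three-stage pipeline; illustrative only, NOT pke's implementation."""

    def __init__(self, text):
        self.tokens = re.findall(r"[a-z]+", text.lower())
        self.candidates = []
        self.weights = {}

    def candidate_selection(self, min_len=4):
        # Stage 1: keep distinct words of at least min_len letters, in first-seen order.
        self.candidates = list(
            dict.fromkeys(t for t in self.tokens if len(t) >= min_len)
        )

    def candidate_weighting(self):
        # Stage 2: score each candidate by raw frequency.
        counts = Counter(self.tokens)
        self.weights = {c: float(counts[c]) for c in self.candidates}

    def get_n_best(self, n=3):
        # Stage 3: rank by descending score; sorted() is stable, so ties
        # keep first-seen order.
        ranked = sorted(self.weights.items(), key=lambda kv: -kv[1])
        return ranked[:n]


class EarlyBiasExtractor(ToyExtractor):
    """Overrides only the weighting stage: frequency / (1 + first position)."""

    def candidate_weighting(self):
        counts = Counter(self.tokens)
        self.weights = {
            c: counts[c] / (1 + self.tokens.index(c)) for c in self.candidates
        }


def run(extractor_cls, text, n=3):
    """Drive the three stages in order for any extractor class."""
    extractor = extractor_cls(text)
    extractor.candidate_selection()
    extractor.candidate_weighting()
    return extractor.get_n_best(n)
```

The subclass replaces one stage and inherits the other two, which is the kind of prototyping the modular layout is meant to make cheap.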

Key highlights

  • Implements 10+ models spanning unsupervised (statistical and graph-based) and supervised approaches
  • Standardized API: swap TopicRank for YAKE or MultipartiteRank in one line
  • Built-in benchmarking against common datasets with reproducible notebooks
  • Tutorials available as Colab notebooks for hands-on experimentation
  • Maintained by an academic author and described in a COLING 2016 paper
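The "swap in one line" point can be sketched like this. The class names `TopicRank`, `YAKE`, and `MultipartiteRank` are taken from `pke.unsupervised`; the helper `compare_models`, the `demo` function, and the sample text are hypothetical, and the sketch assumes each model's default arguments are acceptable.

```python
def compare_models(text, model_classes, n=5, language="en"):
    """Run one text through several pke-style model classes (hypothetical helper)."""
    results = {}
    for cls in model_classes:
        extractor = cls()  # the only line that differs between models
        extractor.load_document(input=text, language=language)
        extractor.candidate_selection()
        extractor.candidate_weighting()
        # Keep just the phrases, dropping the scores, for easy side-by-side reading.
        results[cls.__name__] = [phrase for phrase, _ in extractor.get_n_best(n=n)]
    return results


def demo():
    """Needs pke and a spaCy English model installed; not called automatically."""
    from pke.unsupervised import TopicRank, YAKE, MultipartiteRank

    text = (
        "Graph-based ranking and statistical features are two common ways "
        "to extract keyphrases from scientific articles."
    )
    models = [TopicRank, YAKE, MultipartiteRank]
    for name, phrases in compare_models(text, models).items():
        print(name, phrases)
```

Scores are dropped here because different models score on different scales, so only the ranked phrases are directly comparable.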

Caveats

  • Requires spaCy >= 3.2.3 and a manually downloaded spaCy language model, so it is not a pure pip install-and-go experience
  • Only one supervised model (Kea) is listed, so the “supervised” shelf is thin compared to the unsupervised collection

Verdict

Researchers and practitioners who need to compare multiple keyphrase extraction methods quickly should grab this. If you already know you want a single specific algorithm and don’t care about benchmarking alternatives, you might just use a standalone implementation of that algorithm directly.
