INESCTEC/yake

Keyword extraction without the training-data treadmill

A statistical keyword extractor that works on single documents with zero training or external corpora.

★ 1.9k stars · Jupyter Notebook · Data Tooling · RAG · Search

What it does

YAKE! pulls keywords from a single document using only statistical features: word frequency, position, casing, and co-occurrence patterns. No neural models, no pre-trained embeddings, no labeled datasets. You feed it text; it returns scored n-grams, where lower scores mean higher relevance.
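To make the "statistics from the document itself" idea concrete, here is a toy Python sketch. The function name toy_keyword_scores, the stopword set, and the scoring rule are hypothetical and only loosely inspired by the features listed above; YAKE's actual formula combines more features and scores multi-word n-grams. What the sketch shares with YAKE is that it uses nothing but the input text and that lower scores mean more relevant.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only (YAKE ships its own lists).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on", "for", "with"}


def toy_keyword_scores(text: str):
    """Score single words using only statistics gathered from `text` itself.

    A simplified toy, NOT the YAKE formula: words that are frequent, appear
    early, or are capitalized mid-sentence get lower (better) scores.
    """
    # Naive sentence split on terminal punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    tf = Counter()           # term frequency per lowercased word
    first_sentence = {}      # index of the sentence where a word first appears
    capitalized = Counter()  # occurrences capitalized somewhere other than sentence start

    for s_idx, sentence in enumerate(sentences):
        # ASCII-only tokenizer, kept short for the sketch.
        words = re.findall(r"[A-Za-z][A-Za-z-]*", sentence)
        for w_idx, word in enumerate(words):
            key = word.lower()
            if key in STOPWORDS:
                continue
            tf[key] += 1
            first_sentence.setdefault(key, s_idx)
            if w_idx > 0 and word[0].isupper():
                capitalized[key] += 1

    scores = {}
    for key, count in tf.items():
        position = math.log(3 + first_sentence[key])  # grows slowly with first-appearance index
        casing = 1 + capitalized[key] / count          # 1.0 if never capitalized mid-sentence, up to 2.0
        scores[key] = position / (count * casing)      # lower means more relevant
    # Ascending order: most relevant words first.
    return sorted(scores.items(), key=lambda kv: kv[1])
```

For instance, toy_keyword_scores("YAKE extracts keywords. Keywords come from one document. The Document alone matters.") ranks "document" first and "keywords" second: both occur twice, and "document" also earns the mid-sentence capitalization bonus.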

The interesting bit

The method is deliberately collection-independent: it derives every feature from the document itself. This makes it portable across languages and domains without retraining, a rarity in an era when most NLP tools ship with gigabyte-sized model weights. It won the Best Short Paper award at ECIR 2018, a sign that the research community found the simplicity defensible.

Key highlights

Unsupervised: no training data, no external dictionaries, no corpus statistics
Single-document focus: each document is self-contained; no batching required
Multilingual: supports multiple languages via a language parameter (Portuguese shown in docs)
Configurable deduplication: Levenshtein, Jaro, or sequence matcher to suppress near-duplicate phrases
Optional lemmatization (v0.6.0+) to collapse morphological variants like “tree/trees”
Includes a TextHighlighter utility for marking keywords in HTML output
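Near-duplicate suppression can be sketched with Python's standard-library difflib.SequenceMatcher, whose ratio() returns a similarity between 0 and 1. The helper name dedupe_keywords and its threshold parameter are hypothetical; the threshold plays a role similar to YAKE's deduplication limit, but this is not YAKE's code, and the Levenshtein or Jaro options would need a different similarity function.

```python
from difflib import SequenceMatcher


def dedupe_keywords(scored, threshold=0.9):
    """Drop candidates that are too similar to an already-kept, better-scored one.

    `scored` is a list of (phrase, score) pairs where lower scores are better.
    Hypothetical sketch of threshold-based deduplication, not YAKE's implementation.
    """
    kept = []
    # Visit candidates best-first so the better-scored phrase survives a collision.
    for phrase, score in sorted(scored, key=lambda kv: kv[1]):
        is_duplicate = any(
            SequenceMatcher(None, phrase.lower(), other.lower()).ratio() > threshold
            for other, _ in kept
        )
        if not is_duplicate:
            kept.append((phrase, score))
    return kept
```

With candidates [("keyword extraction", 0.02), ("keyword extractions", 0.03), ("single document", 0.05)], the plural variant has a similarity of about 0.97 to the singular and is dropped at the default threshold, while raising the threshold to 0.99 keeps all three.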

Caveats

The README doesn’t quantify accuracy or compare against modern embedding-based methods (e.g., KeyBERT); effectiveness on long or highly technical documents is unclear
“Language and domain independent” is claimed but not benchmarked across domains in the provided docs
Command-line help leaves a term untranslated: “deduplication limiar” uses the Portuguese word for “threshold”

Verdict

Useful for quick prototyping, low-resource environments, or cases where you can’t ship a transformer model. Skip it if you need state-of-the-art precision and have the GPU cycles for supervised alternatives.

Frequently asked

What is INESCTEC/yake?: A statistical keyword extractor that works on single documents with zero training or external corpora.
Is yake open source?: Yes — INESCTEC/yake is an open-source project tracked on heatdrop.
What language is yake written in?: GitHub lists Jupyter Notebook as the primary language of INESCTEC/yake; the installable package itself is written in Python.
How popular is yake?: INESCTEC/yake has 1.9k stars on GitHub.
Where can I find yake?: INESCTEC/yake is on GitHub at https://github.com/INESCTEC/yake.