erre-quadro/spikex
SpikeX is a collection of spaCy pipeline components for knowledge extraction tasks like entity linking, abbreviation detection, phrase extraction, and text clustering.

SpikeX extends the spaCy NLP library with ready-to-use pipeline components for structured knowledge extraction. It provides components for linking Wikipedia pages to text chunks, clustering noun phrases using a radial Ball Mapper algorithm, detecting and resolving abbreviations and acronyms, extracting noun and verb phrases, and applying pattern-based labels with overlap resolution. The library also includes a WikiGraph module, which stores the Wikipedia link graph as sparse adjacency matrices for efficient traversal and uses bidirectional dictionaries to reduce memory usage.
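
To make the WikiGraph idea more concrete, here is a minimal, standard-library-only sketch of how a sparse adjacency structure and a bidirectional dictionary can work together. It is not SpikeX's actual code or API: the names `BiDict`, `build_csr`, `neighbors`, and `reachable` are hypothetical, the compressed sparse row (CSR) layout is an assumed representation, and the page titles are toy data. The point is only that the graph stores compact integer ids, while the two-way mapping translates between ids and titles on demand.

```python
from collections import deque


class BiDict:
    """Hypothetical two-way mapping between page titles and integer node ids."""

    def __init__(self):
        self._id_of = {}   # title -> id
        self._titles = []  # id -> title (the list index doubles as the id)

    def add(self, title):
        # Reuse the existing id if the title was already registered.
        if title not in self._id_of:
            self._id_of[title] = len(self._titles)
            self._titles.append(title)
        return self._id_of[title]

    def id_of(self, title):
        return self._id_of[title]

    def title_of(self, node_id):
        return self._titles[node_id]

    def __len__(self):
        return len(self._titles)


def build_csr(num_nodes, edges):
    """Pack directed (src, dst) edges into CSR arrays (indptr, indices)."""
    counts = [0] * num_nodes
    for src, _ in edges:
        counts[src] += 1
    # indptr[i]:indptr[i + 1] is the slice of `indices` holding row i.
    indptr = [0] * (num_nodes + 1)
    for node in range(num_nodes):
        indptr[node + 1] = indptr[node] + counts[node]
    indices = [0] * len(edges)
    cursor = indptr[:-1]  # copy: next free slot for each row
    for src, dst in edges:
        indices[cursor[src]] = dst
        cursor[src] += 1
    return indptr, indices


def neighbors(indptr, indices, node):
    """Return the outgoing neighbors of `node` as a list of ids."""
    return indices[indptr[node]:indptr[node + 1]]


def reachable(indptr, indices, start, max_hops):
    """Breadth-first search: map each node within `max_hops` to its distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in neighbors(indptr, indices, node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Toy link graph (illustrative data only).
pages = BiDict()
links = [
    ("Python (language)", "Programming language"),
    ("Python (language)", "Guido van Rossum"),
    ("Programming language", "Computer science"),
    ("Guido van Rossum", "Netherlands"),
]
edges = [(pages.add(src), pages.add(dst)) for src, dst in links]
indptr, indices = build_csr(len(pages), edges)

# Titles of pages at most one link away from the start page.
start = pages.id_of("Python (language)")
nearby = sorted(pages.title_of(n) for n in reachable(indptr, indices, start, 1))
```

In this sketch every title is stored once, and the adjacency arrays hold only integers, which is one plausible way such a design keeps memory usage low on a graph the size of Wikipedia. A production implementation would more likely rely on an optimized sparse matrix library than on plain Python lists.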