Teaching spaCy to figure out who "she" actually is
A spaCy pipeline extension that resolves pronouns and references using a neural network, so your NLP pipeline knows that "she" means "my sister".

What it does
NeuralCoref plugs into spaCy 2.1+ and resolves coreference: it groups pronouns and repeated noun phrases into clusters with the earlier mentions they refer to. Feed it “My sister has a dog. She loves him.” and it annotates that “She” links to “My sister” and “him” to “a dog”. It exposes results through spaCy’s standard ._. extension attributes on Doc, Span, and Token objects, so it feels native to anyone already in the spaCy ecosystem.
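In practice that looks something like the sketch below. The add_to_pipe call and the ._. attributes (has_coref, coref_clusters, coref_resolved) follow the project README; the model name en_core_web_sm and the helper names load_pipeline, summarize_clusters, and demo are assumptions made for illustration.

```python
def load_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy English model and append NeuralCoref to its pipeline.

    The model name is an assumption; any installed English model should do.
    """
    # Imported lazily so summarize_clusters can be used without the libraries.
    import spacy
    import neuralcoref

    nlp = spacy.load(model_name)
    neuralcoref.add_to_pipe(nlp)
    return nlp


def summarize_clusters(doc):
    """Map each cluster's main mention to the text of every mention in it."""
    if not doc._.has_coref:
        return {}
    return {
        cluster.main.text: [mention.text for mention in cluster.mentions]
        for cluster in doc._.coref_clusters
    }


def demo():
    """Run the example sentence through the pipeline and print the results."""
    nlp = load_pipeline()
    doc = nlp("My sister has a dog. She loves him.")
    print(summarize_clusters(doc))  # clusters keyed by their main mention
    print(doc._.coref_resolved)     # text with mentions replaced by the main mention
```

Calling demo() needs spaCy, an English model, and NeuralCoref installed. For the example sentence it should group “She” with “My sister” and “him” with “a dog”.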
The interesting bit
The project splits the problem into two stages: a rule-based mention detector that leans on spaCy’s existing tagger, parser, and NER, followed by a feed-forward neural net that scores candidate mention pairs. Because the first stage depends on spaCy’s annotations, coreference quality is tightly coupled to whichever spaCy English model you’ve installed: a larger model generally detects mentions more accurately, and that carries through to better coref.
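To make the two-stage shape concrete, here is a deliberately tiny toy, not NeuralCoref’s actual code. Everything in it is an assumption for illustration: the hand-written lexicons stand in for spaCy’s tagger, parser, and NER, and pair_score is a hand-made rule standing in for the neural scorer. Only the structure (detect mentions first, then score each mention against earlier candidates and keep the best) mirrors the description above.

```python
import re
from dataclasses import dataclass

# Toy lexicons standing in for spaCy's tagger, parser, and NER (assumption).
NOUN_PHRASES = {"my sister": "f", "a dog": "m"}
PRONOUNS = {"she": "f", "her": "f", "he": "m", "him": "m"}


@dataclass
class Mention:
    text: str
    start: int
    gender: str
    is_pronoun: bool


def detect_mentions(text):
    """Stage 1: rule-based detection, returning mentions in document order."""
    mentions = []
    for lexicon, is_pronoun in ((NOUN_PHRASES, False), (PRONOUNS, True)):
        for phrase, gender in lexicon.items():
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                mentions.append(
                    Mention(match.group(), match.start(), gender, is_pronoun)
                )
    return sorted(mentions, key=lambda m: m.start)


def pair_score(mention, antecedent):
    """Hand-written stand-in for the neural pair scorer (assumption)."""
    agreement = 1.0 if mention.gender == antecedent.gender else -1.0
    distance_penalty = 0.01 * (mention.start - antecedent.start)
    return agreement - distance_penalty


def resolve_pronouns(mentions, no_antecedent_score=0.0):
    """Stage 2: link each pronoun to its best-scoring earlier mention, if any."""
    links = []
    for i, mention in enumerate(mentions):
        if not mention.is_pronoun:
            continue  # simplification: only pronouns look for an antecedent
        best, best_score = None, no_antecedent_score
        for antecedent in mentions[:i]:
            score = pair_score(mention, antecedent)
            if score > best_score:
                best, best_score = antecedent, score
        if best is not None:
            links.append((mention.text, best.text))
    return links


mentions = detect_mentions("My sister has a dog. She loves him.")
print(resolve_pronouns(mentions))
# [('She', 'My sister'), ('him', 'a dog')]
```

The toy also shows why stage one matters so much: if detect_mentions misses “a dog”, no amount of clever scoring can link “him” to it.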
Key highlights
- Pre-trained English model only; extensible to new training datasets per the README
- Written in Python/Cython and installable with pip install neuralcoref
- Downloads ~40MB model weights on first import to ~/.neuralcoref_cache (override via the NEURALCOREF_CACHE env var)
- Tunable greedyness parameter (0–1) and max_dist for antecedent lookback
- Companion web viz tool NeuralCoref-Viz with live demo
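A rough sketch of how the tuning knobs and the cache override might be wired together. The greedyness and max_dist keyword arguments to add_to_pipe and the NEURALCOREF_CACHE variable come from the README; the helper names, the example values, and the model name are assumptions.

```python
import os


def coref_options(greedyness=0.75, max_dist=50):
    """Validate tuning values and return them as keyword arguments.

    The default values here are illustrative, not recommendations.
    """
    if not 0.0 <= greedyness <= 1.0:
        raise ValueError("greedyness must be between 0 and 1")
    if max_dist < 1:
        raise ValueError("max_dist must be a positive integer")
    return {"greedyness": greedyness, "max_dist": max_dist}


def load_tuned_pipeline(model_name="en_core_web_sm", cache_dir=None, **options):
    """Load spaCy, optionally relocate the weight cache, then add NeuralCoref."""
    if cache_dir is not None:
        # Set before importing neuralcoref: the weights download on first import.
        os.environ["NEURALCOREF_CACHE"] = cache_dir
    import spacy
    import neuralcoref

    nlp = spacy.load(model_name)
    neuralcoref.add_to_pipe(nlp, **coref_options(**options))
    return nlp
```

For example, load_tuned_pipeline(cache_dir="/tmp/coref_cache", greedyness=0.6) would keep the weights out of the home directory and make the model a little less eager to link mentions than the 0.75 used above.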
Caveats
- English-only out of the box; no multilingual support mentioned
- Binary incompatibility with spaCy versions can trigger StringStore size changed errors, requiring a --no-binary reinstall from source
- Performance depends heavily on spaCy’s tagger/parser/NER quality, so skimping on the base model hurts downstream coref
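If you hit the binary incompatibility, the fix is an uninstall followed by a source build. The sketch below drives pip from Python so it targets the same interpreter your pipeline runs in. The helper names are hypothetical; the pip flags themselves (uninstall -y, install --no-binary) are standard.

```python
import subprocess
import sys


def reinstall_commands(package="neuralcoref"):
    """Build the pip commands for an uninstall followed by a source build."""
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["uninstall", "-y", package],
        # --no-binary <package> tells pip to skip wheels and build from source.
        pip + ["install", package, "--no-binary", package],
    ]


def reinstall_from_source(package="neuralcoref"):
    """Run the commands in order, stopping at the first failure."""
    for command in reinstall_commands(package):
        subprocess.run(command, check=True)
```

Nothing runs until reinstall_from_source() is called. Building from source compiles the Cython extension against whichever spaCy is installed at that moment, so install the spaCy version you intend to use first.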
Verdict
Worth a look if you’re already building on spaCy and need coreference resolution without leaving the pipeline. Skip it if you need multilingual support or aren’t prepared to manage the spaCy model dependency chain.