huggingface/neuralcoref

Teaching spaCy to figure out who "she" actually is

A spaCy pipeline extension that resolves pronouns and references using a neural network, so your NLP pipeline knows that "she" means "my sister".

2.9k stars C Other AI

What it does

NeuralCoref plugs into spaCy 2.1+ and resolves coreference clusters: groups of mentions (those pesky pronouns and repeated noun phrases) that all refer to the same entity. Feed it “My sister has a dog. She loves him” and it annotates that “She” links to “My sister” and “him” to “a dog”. Results are exposed through spaCy’s standard ._. extension attributes on Doc, Span, and Token objects, so it feels native to anyone already in the spaCy ecosystem.
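
To make the ._. attributes concrete, here is a minimal sketch of what a call might look like. The helper name resolve_coreferences and the model name en_core_web_sm are assumptions for illustration; add_to_pipe, has_coref, coref_clusters, and coref_resolved are the names the README uses, but check them against the version you install.

```python
def resolve_coreferences(text, model_name="en_core_web_sm"):
    """Run spaCy + NeuralCoref on `text` and return its coreference annotations.

    Hypothetical helper, not part of NeuralCoref itself.
    """
    # Imported lazily so the function can be defined even where the
    # packages are not installed.
    import spacy
    import neuralcoref

    nlp = spacy.load(model_name)   # assumed model name; any English spaCy model
    neuralcoref.add_to_pipe(nlp)   # adds the coref component to the pipeline
    doc = nlp(text)

    # coref_clusters may be unset when nothing was resolved, hence the `or []`.
    clusters = doc._.coref_clusters or []
    return {
        "has_coref": doc._.has_coref,
        # Each cluster exposes a main mention plus every span referring to it.
        "clusters": [(c.main.text, [m.text for m in c.mentions]) for c in clusters],
        "resolved": doc._.coref_resolved,
    }
```

Calling resolve_coreferences("My sister has a dog. She loves him.") should return the clusters described above, assuming spaCy, an English model, and NeuralCoref are all installed and binary-compatible.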

The interesting bit

The project splits the problem into two stages: a rule-based mention detector that leans on spaCy’s existing tagger, parser, and NER, followed by a feed-forward neural net that scores candidate mention pairs. Because the first stage relies on spaCy’s own annotations, coreference quality is tightly coupled to whichever spaCy English model you’ve installed: larger models give better mentions, and better mentions give better coref.
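
The two-stage shape is easier to see in a toy version. The sketch below is not NeuralCoref's code: the tagged input, the gender labels, and the hand-written score_pair function are all made-up stand-ins (the real scorer is a trained neural net, and the real detector reads spaCy's parse). It only shows the flow of "detect mentions, then score each mention against earlier candidates".

```python
from typing import Dict, List, NamedTuple, Tuple


class Mention(NamedTuple):
    text: str
    gender: str        # hand-labelled stand-in for tagger/parser/NER features
    is_pronoun: bool


def detect_mentions(tagged_chunks: List[Tuple[str, str, str]]) -> List[Mention]:
    """Stage 1 (toy): keep noun phrases and pronouns as candidate mentions."""
    return [
        Mention(text, gender, tag == "PRON")
        for text, tag, gender in tagged_chunks
        if tag in ("NP", "PRON")
    ]


def score_pair(mention: Mention, antecedent: Mention, distance: int) -> float:
    """Stage 2 (toy): a hand-written score standing in for the neural net."""
    agreement = 1.0 if mention.gender == antecedent.gender else -1.0
    return agreement - 0.1 * distance   # nearer antecedents score slightly higher


def resolve(mentions: List[Mention]) -> Dict[str, str]:
    """Link each pronoun to its best-scoring earlier mention, if any scores > 0."""
    links = {}
    for i, mention in enumerate(mentions):
        if not mention.is_pronoun or i == 0:
            continue
        scored = [(score_pair(mention, mentions[j], i - j), mentions[j]) for j in range(i)]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score > 0:
            links[mention.text] = best.text
    return links


# Hypothetical pre-tagged input for "My sister has a dog. She loves him."
TAGGED = [
    ("My sister", "NP", "f"), ("has", "VERB", ""), ("a dog", "NP", "m"),
    ("She", "PRON", "f"), ("loves", "VERB", ""), ("him", "PRON", "m"),
]

print(resolve(detect_mentions(TAGGED)))  # {'She': 'My sister', 'him': 'a dog'}
```

If stage 1 mislabels or misses a mention, stage 2 never gets a chance to link it, which is the coupling to the base model described above.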

Key highlights

  • Ships with a pre-trained English model only; per the README, it can be retrained on new datasets
  • Written in Python/Cython; installable with pip install neuralcoref
  • Downloads ~40 MB of model weights on first import to ~/.neuralcoref_cache (override via the NEURALCOREF_CACHE env var)
  • Tunable greedyness parameter (0–1) and max_dist to cap how far back it looks for antecedents
  • Companion web viz tool NeuralCoref-Viz with live demo
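
The cache override and the two tuning knobs from the list above can be sketched together. The helper name load_coref_pipeline, the model name, and the default values shown (0.5 and 50) are assumptions for illustration; confirm the real defaults in the README. The environment variable is set before the import because the cache location appears to be read when the package is first imported, so setting it later may have no effect.

```python
import os


def load_coref_pipeline(cache_dir, greedyness=0.5, max_dist=50,
                        model_name="en_core_web_sm"):
    """Build a spaCy pipeline with NeuralCoref attached (hypothetical helper)."""
    # Validate first, using the 0-1 range stated for greedyness.
    if not 0.0 <= greedyness <= 1.0:
        raise ValueError("greedyness must be between 0 and 1")
    if max_dist < 1:
        raise ValueError("max_dist must be a positive number of mentions")

    # Point the weight download at a custom directory before the first import.
    os.environ["NEURALCOREF_CACHE"] = cache_dir

    import spacy
    import neuralcoref

    nlp = spacy.load(model_name)   # assumed model name
    # Higher greedyness links more aggressively; max_dist caps the lookback.
    neuralcoref.add_to_pipe(nlp, greedyness=greedyness, max_dist=max_dist)
    return nlp
```

A call such as load_coref_pipeline("/data/neuralcoref_cache", greedyness=0.75) would then store the weights under that directory, assuming NeuralCoref has not already been imported in the same process.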

Caveats

  • English-only out of the box; no multilingual support mentioned
  • A binary mismatch with the installed spaCy version can trigger “StringStore size changed” errors; the fix is to reinstall from source using pip’s --no-binary option
  • Performance depends heavily on spaCy’s tagger/parser/NER quality, so skimping on the base model hurts downstream coref
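
For the binary-incompatibility caveat, the reinstall might be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the exact commands should be checked against the README's troubleshooting notes. It defaults to a dry run so nothing is uninstalled by accident.

```python
import subprocess
import sys

PACKAGE = "neuralcoref"


def reinstall_commands(python=sys.executable):
    """Return the pip invocations for a from-source reinstall (hypothetical helper)."""
    pip = [python, "-m", "pip"]
    return [
        pip + ["uninstall", "-y", PACKAGE],
        # --no-binary makes pip build the Cython extension locally, against
        # whichever spaCy version is installed in this environment.
        pip + ["install", PACKAGE, "--no-binary", PACKAGE],
    ]


def reinstall_from_source(dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in reinstall_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Building from source needs a working C compiler, so expect this to be slower than a normal wheel install.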

Verdict

Worth a look if you’re already building on spaCy and need coreference resolution without leaving the pipeline. Skip it if you need multilingual support or aren’t prepared to manage the spaCy model dependency chain.
