ICLRandD/Blackstone

A spaCy model that actually reads the fine print

Open-source NLP trained on 150 years of English case law, warts and all.

689 stars · Python · Domain Apps · Data Tooling
Velocity (7d): +0.3 ★ / day · Trend: steady

What it does

Blackstone is a spaCy pipeline and model for extracting structure from long-form legal text, specifically the common law of England and Wales. It recognises case names, citations, legislation, court references, and judges, and classifies sentences into categories such as “legal test” or “conclusion.” It was built by ICLR&D, the research arm of the Incorporated Council of Law Reporting.
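
As a rough illustration of the entity extraction described above, here is a minimal sketch. It assumes the model is installed and loadable under the name `en_blackstone_proto` (an assumption to verify against the project README); `group_entities` and `extract_entities` are hypothetical helper names, not part of Blackstone. Only standard spaCy calls (`spacy.load`, `doc.ents`, `ent.label_`) are used.

```python
from collections import defaultdict


def group_entities(pairs):
    """Group (text, label) pairs by label, keeping first-seen order and dropping repeats."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


def extract_entities(text, model_name="en_blackstone_proto"):
    """Run a spaCy pipeline over `text` and return its entities grouped by label.

    `model_name` is an assumed default; the Blackstone model must be installed separately.
    """
    # Imported lazily so group_entities() stays usable without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return group_entities((ent.text, ent.label_) for ent in doc.ents)
```

Calling `extract_entities(judgment_text)` would return a dictionary keyed by entity label, so case names and citations can be handled separately downstream.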

The interesting bit

The training data stretches back to the 1860s, which matters because common law never forgets: a Victorian judgment can still bind today. The authors openly state that the NER F1 sits at around 70% and call the model a “prototype,” which is refreshing honesty in a field that usually pretends its models are courtroom-ready.

Key highlights

  • Custom NER for six legal entity types: CASENAME, CITATION, INSTRUMENT, PROVISION, COURT, JUDGE
  • Text categoriser labels sentences as AXIOM, CONCLUSION, ISSUE, LEGAL_TEST, or UNCAT
  • Extra pipeline components: abbreviation resolution, compound case reference detection, legislation linker, sentence segmenter
  • spaCy-native, so it drops into existing Python workflows
  • Generalises “reasonably well” to Australasian, Canadian, and American content
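
The sentence categoriser listed above can be sketched in the same hedged way. spaCy exposes text-classification scores as the `doc.cats` dictionary, so picking a label comes down to taking the highest score. The model name `en_blackstone_proto` is again an assumption, and `top_category` and `categorise_sentences` are hypothetical helpers written for this example.

```python
def top_category(cats):
    """Return the (label, score) pair with the highest score, or None if `cats` is empty.

    `cats` is a dict shaped like spaCy's doc.cats, e.g. {"ISSUE": 0.2, "UNCAT": 0.7}.
    """
    if not cats:
        return None
    label = max(cats, key=cats.get)
    return label, cats[label]


def categorise_sentences(text, model_name="en_blackstone_proto"):
    """Split `text` into sentences and label each with its top-scoring category.

    `model_name` is an assumed default; the Blackstone model must be installed separately.
    """
    # Imported lazily so top_category() stays usable without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)
    results = []
    for sent in nlp(text).sents:
        # Scores in doc.cats apply to a whole Doc, so each sentence is re-processed alone.
        sentence_doc = nlp(sent.text)
        results.append((sent.text, top_category(sentence_doc.cats)))
    return results
```

Processing each sentence as its own document is slower than a single pass, but it keeps the scores tied to one sentence at a time.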

Caveats

  • Training data is proprietary and unreleased—you can’t inspect or extend it
  • Tokenizer, tagger, and parser are borrowed from spaCy’s en_core_web_sm; not retrained for legal syntax
  • Explicitly “not a judge or litigation analytics tool”—don’t expect precedent prediction

Verdict

Worth a look if you’re building legal research tools or citation managers, or just need to pull structure out of judgment text without writing another regex. Skip it if you need production-grade accuracy today or work outside common-law jurisdictions.
