A spaCy model that actually reads the fine print
Open-source NLP trained on 150 years of English case law, warts and all.

What it does
Blackstone is a spaCy pipeline and model for extracting structure from long-form legal text, specifically English and Welsh common law. It spots case names, citations, legislation, court references, and judges, and classifies sentences into categories like “legal test” or “conclusion.” It was built by ICLR&D, the research arm of the Incorporated Council of Law Reporting.
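Because the output is an ordinary spaCy Doc, the entities can be collected with a few lines of Python. The sketch below is only an illustration: `group_entities` is a hypothetical helper, and the stand-in document is hand-written rather than real model output, so it runs without the model installed. The `.ents`, `.text`, and `.label_` attributes are the standard spaCy ones.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_entities(doc):
    """Collect entity texts by label from a spaCy-style Doc (anything exposing .ents)."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Hand-written stand-in for a processed Doc (an assumption for illustration,
# not output from the model). With a loaded pipeline this would be: doc = nlp(text)
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="Donoghue v Stevenson", label_="CASENAME"),
    SimpleNamespace(text="[1932] AC 562", label_="CITATION"),
    SimpleNamespace(text="House of Lords", label_="COURT"),
])

print(group_entities(fake_doc))
```

With the real pipeline loaded, the same helper should work unchanged on whatever `nlp(text)` returns.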
The interesting bit
The training data stretches back to the 1860s, which matters because common law never forgets—a Victorian judgment can still bind today. The authors openly admit the NER F1 sits around 70% and call it a “prototype,” which is refreshing honesty in a field that usually pretends its models are courtroom-ready.
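For readers who don't live in evaluation tables, F1 is the harmonic mean of precision and recall. A minimal sketch of the arithmetic follows; the function name and the counts are made up for illustration and are not Blackstone's actual evaluation figures.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return 0.0  # nothing predicted and nothing to find: treat F1 as 0
    return 2 * true_pos / denominator


# Hypothetical counts: 70 entities found correctly, 30 spurious, 30 missed.
print(f1_score(true_pos=70, false_pos=30, false_neg=30))
```

With those invented counts the score comes out at 0.7, which gives a feel for the number: roughly three in ten predicted entities are wrong, and roughly three in ten real ones are missed.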
Key highlights
- Custom NER for six legal entity types: CASENAME, CITATION, INSTRUMENT, PROVISION, COURT, JUDGE
- Text categoriser labels sentences as AXIOM, CONCLUSION, ISSUE, LEGAL_TEST, or UNCAT
- Extra pipeline components: abbreviation resolution, compound case reference detection, legislation linker, sentence segmenter
- spaCy-native, so it drops into existing Python workflows
- Generalises “reasonably well” to Australasian, Canadian, and American content
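spaCy text categorisers expose their scores as a plain dict on `doc.cats`, mapping each label to a score. A rough sketch of picking the winning label for a sentence is below; `top_category` is a hypothetical helper and the scores are invented for illustration, not real model output.

```python
def top_category(cats):
    """Return the (label, score) pair with the highest score, or None if cats is empty."""
    if not cats:
        return None
    label = max(cats, key=cats.get)
    return label, cats[label]


# Invented scores shaped like doc.cats (an assumption for illustration).
example_scores = {
    "AXIOM": 0.02,
    "CONCLUSION": 0.81,
    "ISSUE": 0.05,
    "LEGAL_TEST": 0.09,
    "UNCAT": 0.03,
}

print(top_category(example_scores))
```

In practice you would probably run this per sentence, passing `doc.cats` from each processed sentence, and perhaps ignore anything whose top label is UNCAT.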
Caveats
- Training data is proprietary and unreleased—you can’t inspect or extend it
- Tokenizer, tagger, and parser are borrowed from spaCy’s en_core_web_sm; not retrained for legal syntax
- Explicitly “not a judge or litigation analytics tool”; don’t expect precedent prediction
Verdict
Worth a look if you’re building legal research tools, citation managers, or just need to parse judgment PDFs without writing another regex. Skip it if you need production-grade accuracy today or work outside common-law jurisdictions.