NLP that knows LLC. isn't the end of a sentence
A legal-text extraction library built by people who have apparently read enough contracts to know that the period in "F.3d" is not a sentence boundary.

What it does
LexNLP parses unstructured legal documents (contracts, policies, procedures) and extracts structured facts: monetary amounts, dates, durations, court citations, and conditional constraints like “less than” or “later than.” It also provides pre-trained word embeddings, topic models, and classifiers for document and clause types. The library is part of the larger ContraxSuite ecosystem but also works standalone.
The interesting bit
The sentence parser is explicitly aware of legal abbreviations (LLC., F.3d) that trip up general-purpose NLP tools. It also ships with pre-trained segmentation models for legal-specific concepts like pages and sections, plus hundreds of unit tests drawn from real documents rather than synthetic legal-ish text.
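To make the abbreviation problem concrete, here is a minimal standard-library sketch that contrasts a naive splitter with one that consults an abbreviation list. It is not LexNLP's implementation or API: the `LEGAL_ABBREVIATIONS` set and the `naive_split` and `split_sentences` helpers are hypothetical names invented for illustration, and a real system would likely rely on a trained model rather than a hand-written list.

```python
import re

# Hypothetical, hand-picked list for illustration only; it is not taken
# from LexNLP. Tokens are lowercase and include their trailing period.
LEGAL_ABBREVIATIONS = {"llc.", "inc.", "corp.", "co.", "ltd.", "v.", "no."}


def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text)


def split_sentences(text):
    """Split only where end punctuation is followed by whitespace and an
    uppercase letter, and the token ending there is not a known abbreviation.

    The period inside "F.3d" is never a candidate, because it is followed
    by a digit rather than whitespace.
    """
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?](?=\s+[A-Z])", text):
        end = match.end()
        # Last whitespace-delimited token before the candidate boundary.
        token = text[start:end].split()[-1].lower()
        if token in LEGAL_ABBREVIATIONS:
            continue  # e.g. "LLC." or "v." does not end the sentence
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = (
        "Acme LLC. Holdings signed the lease. "
        "See Smith v. Jones, 123 F.3d 456. Judgment followed."
    )
    for sentence in split_sentences(sample):
        print(sentence)
```

On the sample text, the naive splitter yields five fragments (it breaks after "LLC." and "v."), while the abbreviation-aware version yields three sentences. The trade-off is that a fixed list will miss a true sentence end that happens to fall on an abbreviation, which is one reason a trained segmenter is attractive.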
Key highlights
- Sentence segmentation that handles legal abbreviations without splitting them into false sentence boundaries
- Extraction of monetary amounts, percentages, ratios, dates, recurring dates, and durations
- Pre-trained classifiers for document type and clause type
- Pre-trained word embedding and topic models, including practice-area-specific variants
- Tools for building custom clustering and classification methods on top of extracted features
- Dual-licensed: AGPLv3 by default, with commercial licensing available on request
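As a rough picture of what "extracting structured facts" means for the items above, the following sketch pulls monetary amounts, durations, and constraint phrases out of a clause using plain regular expressions. It is a toy under stated assumptions, not LexNLP's extractors: the patterns, the `CURRENCY_CODES` table, and the `extract_facts` function are all hypothetical and cover only a few simple surface forms.

```python
import re
from decimal import Decimal

# Hypothetical lookup table for this sketch; real extractors cover far more.
CURRENCY_CODES = {"$": "USD", "€": "EUR", "£": "GBP"}

# Currency symbol, then digits with optional thousands separators and decimals.
MONEY_RE = re.compile(r"(?P<symbol>[$€£])\s?(?P<amount>\d+(?:,\d{3})*(?:\.\d+)?)")
# A whole number followed by a time unit, e.g. "30 days".
DURATION_RE = re.compile(
    r"\b(?P<count>\d+)\s+(?P<unit>day|week|month|year)s?\b", re.IGNORECASE
)
# A handful of comparison phrases of the kind the review mentions.
CONSTRAINT_RE = re.compile(
    r"\b(?:no later than|later than|less than|more than|at least)\b",
    re.IGNORECASE,
)


def extract_facts(text):
    """Return money, duration, and constraint mentions found in text."""
    money = [
        (CURRENCY_CODES[m.group("symbol")],
         Decimal(m.group("amount").replace(",", "")))
        for m in MONEY_RE.finditer(text)
    ]
    durations = [
        (int(m.group("count")), m.group("unit").lower())
        for m in DURATION_RE.finditer(text)
    ]
    constraints = [m.group(0).lower() for m in CONSTRAINT_RE.finditer(text)]
    return {"money": money, "durations": durations, "constraints": constraints}


if __name__ == "__main__":
    clause = (
        "Tenant shall pay $1,250.50 within 30 days, "
        "and in no event less than $500 per month."
    )
    print(extract_facts(clause))
```

Amounts are parsed into `Decimal` rather than `float` so that currency values keep their exact decimal representation. Spelled-out numbers ("thirty days"), currency names ("dollars"), and linking a constraint to the value it governs are all out of scope for this sketch.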
Caveats
- Documentation is marked “in progress” and the ReadTheDocs link points to an older version (docs-0.1.6)
- Requires Python 3.8 and pipenv; the Travis CI and Coveralls badges give no clear signal of whether the project is still actively maintained
- ContraxSuite installations generally require additional trained models or “knowledge sets” from a separate repository
Verdict
Worth a look if you’re building legal-document pipelines and general-purpose NLP keeps mangling your citations. Skip if you need polished docs, modern Python support, or a permissive license without emailing a sales address.