LexPredict/lexpredict-lexnlp

NLP that knows LLC. isn't the end of a sentence

A legal-text extraction library built by people who have apparently read enough contracts to know that "F.3d" is not three separate sentences.

782 stars · Jupyter Notebook · Domain Apps · ML Frameworks · Language Models
Velocity (7d): +0.2 ★ / day · Trend: steady

What it does

LexNLP parses unstructured legal documents (contracts, policies, procedures) and extracts structured facts: monetary amounts, dates, durations, court citations, and conditional constraints like “less than” or “later than.” It also provides pre-trained word embeddings, topic models, and classifiers for document and clause types. The library is part of the larger ContraxSuite ecosystem but also works standalone.
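To make "structured facts" concrete, here is a minimal sketch of the idea using only Python's standard library. It is not LexNLP's code or API: the patterns, the `CONSTRAINTS` table, and the function names `extract_money` and `extract_constraints` are all hypothetical, and the sketch only covers dollar figures and a handful of comparison phrases.

```python
import re
from decimal import Decimal

# Hypothetical pattern (illustration only): a dollar sign, digits with
# optional thousands separators, and optional two-digit cents.
MONEY_RE = re.compile(r"\$\s?(\d+(?:,\d{3})*)(\.\d{2})?")

# Hypothetical mapping from constraint phrases to comparison operators.
CONSTRAINTS = {
    "no more than": "<=",
    "no later than": "<=",
    "less than": "<",
    "more than": ">",
    "later than": ">",
    "at least": ">=",
}

# Longest phrases first so "no more than" is preferred over "more than".
CONSTRAINT_RE = re.compile(
    r"\b(" + "|".join(
        re.escape(phrase)
        for phrase in sorted(CONSTRAINTS, key=len, reverse=True)
    ) + r")\b",
    re.IGNORECASE,
)


def extract_money(text):
    """Return each dollar figure in text as a Decimal."""
    amounts = []
    for match in MONEY_RE.finditer(text):
        whole = match.group(1).replace(",", "")
        cents = match.group(2) or ""
        amounts.append(Decimal(whole + cents))
    return amounts


def extract_constraints(text):
    """Return (phrase, operator) pairs in the order they appear."""
    pairs = []
    for match in CONSTRAINT_RE.finditer(text):
        phrase = match.group(1).lower()
        pairs.append((phrase, CONSTRAINTS[phrase]))
    return pairs


sample = (
    "The Borrower shall pay no more than $1,500,000.00, "
    "and fees shall be less than $2500."
)
print(extract_money(sample))
print(extract_constraints(sample))
```

A production extractor has to handle far more than this (other currencies, written-out numbers, dates, durations, citations), which is the gap a purpose-built library is meant to fill.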

The interesting bit

The sentence parser is explicitly aware of legal abbreviations (LLC., F.3d) that trip up general-purpose NLP tools. It also ships with pre-trained segmentation models for legal-specific concepts like pages and sections, plus hundreds of unit tests drawn from real documents rather than synthetic legal-ish text.
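The abbreviation problem is easy to reproduce. The sketch below contrasts a naive splitter with an abbreviation-aware one, using only the standard library. It is an illustration under assumptions, not LexNLP's method (the source mentions pre-trained segmentation models, which this does not attempt to imitate); the `LEGAL_ABBREVIATIONS` set and both function names are hypothetical.

```python
import re

# Hypothetical abbreviation list for illustration; a real system would need
# a much larger curated or learned set.
LEGAL_ABBREVIATIONS = {"llc.", "inc.", "corp.", "co.", "v.", "no.", "cir."}

# A candidate boundary is end punctuation followed by whitespace. "F.3d"
# never becomes a candidate because its period is followed by a digit.
BOUNDARY_RE = re.compile(r"[.!?]\s+")


def naive_split(text):
    """Split after every period, question mark, or exclamation mark."""
    return re.split(r"(?<=[.!?])\s+", text)


def split_sentences(text):
    """Split on candidate boundaries, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in BOUNDARY_RE.finditer(text):
        # Text from the current sentence start through the punctuation mark.
        candidate = text[start:match.start() + 1]
        last_token = candidate.split()[-1]
        if last_token.lower() in LEGAL_ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, keep going
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = (
    "Acme Holdings LLC. filed suit in 2019. The court cited Smith v. Jones, "
    "123 F.3d 456 (9th Cir. 1997). Damages were awarded."
)
print(len(naive_split(sample)))
for sentence in split_sentences(sample):
    print(sentence)
```

A fixed list like this breaks down when an abbreviation really does end a sentence, which is one reason a trained model can be a better fit than hand-written rules.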

Key highlights

  • Sentence segmentation that handles legal abbreviations without splitting them into false sentence boundaries
  • Extraction of monetary amounts, percentages, ratios, dates, recurring dates, and durations
  • Pre-trained classifiers for document type and clause type
  • Pre-trained word embedding and topic models, including practice-area-specific variants
  • Tools for building custom clustering and classification methods on top of extracted features
  • Dual-licensed: AGPLv3 by default, with commercial licensing available on request

Caveats

  • Documentation is marked “in progress” and the ReadTheDocs link points to an older version (docs-0.1.6)
  • Requires Python 3.8 and pipenv; maintenance status is unclear, judging by the Travis CI and Coveralls badges
  • ContraxSuite installations generally require additional trained models or “knowledge sets” from a separate repository

Verdict

Worth a look if you’re building legal-document pipelines and general-purpose NLP keeps mangling your citations. Skip if you need polished docs, modern Python support, or a permissive license without emailing a sales address.
