Chinese legal NLP: from traffic court text to verdict prediction
A complete pipeline that turns unstructured traffic-accident judgments into structured events, then guesses the sentence.

What it does
This repo processes Chinese traffic-accident court judgments end-to-end: clean the text, segment words with LTP, tag parts of speech, extract named entities, pull out event elements with CRF++, and finally predict verdicts or find similar cases. The code is organized as numbered processing steps rather than as a proper Python package, a sign of its origins as research code.
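To make the hand-off between LTP and CRF++ concrete, here is a minimal sketch of serializing tagged tokens into the column format CRF++ reads: one token per line, whitespace-separated feature columns, the label in the last column, and an empty line after each sentence. The feature columns, the BIO label names (`B-VEHICLE`, `B-CASUALTY`) and the `to_crfpp` helper are assumptions for illustration; the repo's actual feature set and label scheme may differ.

```python
from typing import List, Tuple

# One token = (word, POS tag, NER tag, event-element label).
# Columns and label names are illustrative, not taken from the repo.
Token = Tuple[str, str, str, str]


def to_crfpp(sentences: List[List[Token]]) -> str:
    """Serialize sentences to CRF++ column format.

    Each token becomes one tab-separated line with the label last;
    every sentence is terminated by an empty line.
    """
    parts = []
    for sentence in sentences:
        lines = ["\t".join(token) for token in sentence]
        parts.append("\n".join(lines) + "\n\n")
    return "".join(parts)


# Hypothetical, hand-written example sentences (not real case data).
example = [
    [
        ("被告人", "n", "O", "O"),
        ("驾驶", "v", "O", "O"),
        ("小型", "b", "O", "B-VEHICLE"),
        ("轿车", "n", "O", "I-VEHICLE"),
    ],
    [
        ("致", "v", "O", "O"),
        ("一", "m", "O", "B-CASUALTY"),
        ("人", "n", "O", "I-CASUALTY"),
        ("死亡", "v", "O", "I-CASUALTY"),
    ],
]

print(to_crfpp(example))
```

A file in this shape is what the CRF++ training and test binaries consume, which is why cached CRF outputs can stand in for the binaries on platforms where they are awkward to run.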
The interesting bit
The author has clearly been through the dependency wars. The README spends more space on how to avoid installing things than on the model architecture: pre-computed CRF outputs ship with the repo so macOS users don’t have to wrestle crf_test.exe, and there’s a --skip-train flag specifically to let you see results without touching PyTorch or LTP. It’s a research artifact that knows it’s a research artifact.
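The "run without installing anything" idea can be sketched as a parser plus a cache fallback. The flag names below are the ones the README mentions; their exact semantics, the `build_parser` and `load_or_compute` helpers, and the JSON cache layout are assumptions for illustration, not the repo's implementation.

```python
import argparse
import json
from pathlib import Path
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    # Flag names come from the README; the behavior described is assumed.
    parser = argparse.ArgumentParser(description="pipeline runner (sketch)")
    parser.add_argument("--step", default=None,
                        help="run a single step, e.g. 'check'")
    parser.add_argument("--skip-train", action="store_true",
                        help="reuse shipped outputs instead of training")
    parser.add_argument("--no-cache", action="store_true",
                        help="ignore cached intermediate results")
    return parser


def load_or_compute(cache_file: Path, compute: Callable[[], list],
                    use_cache: bool) -> list:
    """Return the cached JSON result if allowed and present;
    otherwise run `compute`, store its result, and return it."""
    if use_cache and cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    result = compute()
    cache_file.write_text(json.dumps(result, ensure_ascii=False),
                          encoding="utf-8")
    return result


# Parse an explicit argument list so the sketch runs anywhere.
args = build_parser().parse_args(["--step", "check", "--skip-train"])
print(args.step, args.skip_train, args.no_cache)
```

In this shape, a heavy step that needs LTP or CRF++ only runs when no cached file exists or when caching is explicitly disabled.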
Key highlights
- CRF-based event-element extraction on top of LTP segmentation/NER
- Verdict prediction framed as both multi-class classification and regression tasks in PyTorch
- Case-similarity matching using extracted feature patterns
- run_project.py acts as a unified CLI with sensible flags (--step check, --skip-train, --no-cache)
- Ships with cached intermediate results so the pipeline runs without pyltp or CRF++ binaries
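As a rough illustration of case-similarity matching over extracted features, the sketch below ranks cases by Jaccard overlap between sets of extracted elements. The choice of Jaccard similarity, the feature names, and the `most_similar` helper are all assumptions; the repo may score similarity differently.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection size over union size (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query: Set[str], cases: Dict[str, Set[str]],
                 top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every case against the query and return the top_k best."""
    scored = [(case_id, jaccard(query, feats))
              for case_id, feats in cases.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical extracted elements, not real case data.
cases = {
    "case_a": {"drunk_driving", "one_death", "fled_scene"},
    "case_b": {"speeding", "one_injury"},
    "case_c": {"drunk_driving", "one_death", "compensated"},
}
query = {"drunk_driving", "one_death", "fled_scene"}

print(most_similar(query, cases, top_k=2))
# → [('case_a', 1.0), ('case_c', 0.5)]
```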
Caveats
- pyltp is deprecated and increasingly hard to install on modern macOS; the README admits this openly
- The bundled crf_test.exe is Windows-only, so full retraining on Unix requires manual CRF++ setup
- Directory names are in Chinese, which may complicate cross-platform path handling
Verdict
Worth a look if you’re doing Chinese legal NLP or need a concrete CRF+PyTorch baseline for event extraction. Skip it if you want a maintained, pip-installable library: this is a snapshot of a graduate research project, not a product.