A Chinese legal AI that predicts crimes from case descriptions
Trained on 2.88 million court records, it classifies charges, sorts legal questions, and answers them—sometimes with alarming confidence.

What it does
CrimeKgAssitant is a Chinese legal NLP toolkit with three main jobs: predict which of 202 possible crimes matches a case description, classify legal questions into 13 categories (marriage, labor, traffic, etc.), and answer those questions by retrieving similar past responses. It also includes an 856-concept knowledge graph of criminal charges.
The interesting bit
The project reaches ~92% accuracy on charge prediction with nothing fancier than document embeddings plus an SVM: no neural nets, just 2.88 million training examples and 12 hours of training. The QA system, meanwhile, is endearingly blunt: ask about selling contraband and it replies “没什么” (“nothing”); ask about finding a girlfriend and it routes you to the police.
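To make the "document embeddings plus SVM" recipe concrete, here is a minimal sketch in pure Python. It is not the project's code. The toy word vectors, the three charge labels, the averaging scheme, and the hand-rolled trainer are all assumptions made for illustration; a real pipeline would learn embeddings from the corpus and would more likely use a library SVM.

```python
import random

# Hypothetical 3-D word vectors; real ones would be learned from the case corpus.
WORD_VECS = {
    "stole": [1.0, 0.0, 0.0], "wallet": [0.9, 0.1, 0.0], "burglary": [0.9, 0.0, 0.1],
    "punched": [0.0, 1.0, 0.0], "injured": [0.1, 0.9, 0.0], "beat": [0.0, 0.9, 0.1],
    "forged": [0.0, 0.0, 1.0], "scam": [0.1, 0.0, 0.9], "deceived": [0.0, 0.1, 0.9],
}

def doc_embedding(tokens):
    """Average the vectors of known tokens (zero vector if none are known)."""
    vecs = [WORD_VECS[t] for t in tokens if t in WORD_VECS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_binary_svm(X, y, epochs=200, lr=0.1, lam=0.01, seed=0):
    """Linear SVM trained by stochastic subgradient descent on the
    L2-regularized hinge loss. Labels in y are +1 or -1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * (dot(w, X[i]) + b) < 1
            w = [wj * (1 - lr * lam) for wj in w]  # shrink step from the L2 penalty
            if violated:  # hinge-loss subgradient step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def train_one_vs_rest(docs, labels):
    """One binary SVM per charge, each separating that charge from all others."""
    X = [doc_embedding(d) for d in docs]
    models = {}
    for charge in sorted(set(labels)):
        y = [1 if lab == charge else -1 for lab in labels]
        models[charge] = train_binary_svm(X, y)
    return models

def predict(models, tokens):
    """Return the charge whose classifier gives the highest score."""
    x = doc_embedding(tokens)
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])

# Hypothetical, pre-tokenized toy case descriptions.
DOCS = [["stole", "wallet"], ["burglary", "stole"], ["punched", "injured"],
        ["beat", "injured"], ["forged", "scam"], ["deceived", "scam"]]
LABELS = ["theft", "theft", "assault", "assault", "fraud", "fraud"]

models = train_one_vs_rest(DOCS, LABELS)
print(predict(models, ["stole", "phone"]))
```

The one-vs-rest layout is one common way to stretch a binary SVM to many classes; whether the project handles its 202 charges this way is not stated in the source.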
Key highlights
- 2.88M case records for 202-class charge prediction (SVM, ~92% accuracy)
- 200K legal QA pairs for 13-category question classification (CNN hits 95.9% test accuracy; LSTM lags at 71.7%)
- 856-concept crime knowledge graph for structured queries
- Retrieval-based QA that returns actual past answers, not generated text
- All training data and dictionaries included in the repo
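Retrieval-based QA, as listed above, can be sketched in a few lines: find the stored question most similar to the incoming one and return its recorded answer verbatim. The QA pairs and the character-bigram cosine similarity below are assumptions for illustration; the source does not say which similarity measure the project uses.

```python
import math
from collections import Counter

# Hypothetical stored QA pairs; the real project draws on its 200K-pair corpus.
QA_PAIRS = [
    ("How do I file for divorce?", "Submit a petition to the local court."),
    ("My employer has not paid my wages.", "Apply for labor arbitration."),
    ("Who pays after a traffic accident?", "Liability follows the police accident report."),
]

def bigrams(text):
    """Bag of character bigrams; needs no word segmentation, so it also works on Chinese."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def cosine(a, b):
    """Cosine similarity between two bigram count vectors."""
    shared = a.keys() & b.keys()
    num = sum(a[k] * b[k] for k in shared)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def answer(question):
    """Return the stored answer whose question is most similar to the input."""
    q = bigrams(question)
    _, best_answer = max(QA_PAIRS, key=lambda pair: cosine(q, bigrams(pair[0])))
    return best_answer

print(answer("How can I file for a divorce?"))
```

Note that `answer` always returns something, even when the best similarity score is very low. A retrieval system without a minimum-similarity cutoff would behave this way, which is one plausible contributor to off-topic replies.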
Caveats
- The README is entirely in Chinese; code comments and variable names follow suit
- QA quality varies wildly—some answers are detailed legal procedures, others are comically terse or off-topic
- No model weights or pre-trained embeddings are provided; you train from scratch
- The “knowledge graph” appears to be a concept list rather than a queryable graph structure in the released code
Verdict
Worth a look if you’re building Chinese legal NLP or need a baseline for charge classification. Skip it if you need production-ready legal advice or English-language support; the “assistant” part is aspirational.