Undergrad's LSTM QA project: honest, deprecated, oddly refreshing
A Chinese question-answering system whose author tells you not to use it.

What it does
This repo implements a sentence-level answer retrieval system for Chinese text: given a question and multiple candidate sentences, a bidirectional LSTM identifies which sentence contains the answer. It uses jieba for segmentation and pre-trained 50-dimensional word embeddings from Chinese Wikipedia. The author reports MRR above 0.75 on a held-out dev set.
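The shape of that task (segment the text, score each candidate against the question, sort) can be sketched in a few lines. This is a minimal illustration, not the repo's code: `char_tokenize`, `overlap_score`, and `rank_candidates` are hypothetical names, the character-level tokenizer stands in for jieba, and the token-overlap scorer is only a placeholder for the BiLSTM's relevance score. The example question and sentences are invented.

```python
from typing import Callable, List, Tuple

Tokenizer = Callable[[str], List[str]]
Scorer = Callable[[List[str], List[str]], float]


def char_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer: one token per non-space character (the repo uses jieba)."""
    return [ch for ch in text if not ch.isspace()]


def overlap_score(question_tokens: List[str], sentence_tokens: List[str]) -> float:
    """Toy scorer: Jaccard overlap of token sets, a placeholder for the BiLSTM score."""
    q, s = set(question_tokens), set(sentence_tokens)
    if not q or not s:
        return 0.0
    return len(q & s) / len(q | s)


def rank_candidates(
    question: str,
    candidates: List[str],
    tokenize: Tokenizer = char_tokenize,
    score: Scorer = overlap_score,
) -> List[Tuple[float, str]]:
    """Score every candidate sentence against the question, best first."""
    q_tokens = tokenize(question)
    scored = [(score(q_tokens, tokenize(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Invented example: the first sentence shares characters with the question.
ranked = rank_candidates("长城有多长", ["长城全长约两万公里", "北京是中国的首都"])
best_score, best_sentence = ranked[0]
```

Because the tokenizer and scorer are plain callables, a word-level segmenter such as jieba's `lcut` could be passed as `tokenize`, and a trained model's scoring function as `score`, without changing the ranking step.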
The interesting bit
The README opens with “This project is no longer maintained!!!” (“该项目已停止维护!!!”) and describes the code as “basically all written haphazardly” (“基本全是瞎写的”), a level of self-awareness rare in academic GitHub repos. The author admits the model was chosen for convenience, the API usage was clumsy, and the hyperparameter tuning was “very rough” (“很粗糙”). That candor tells a reader more than many polished but unreproducible NLP papers do.
Key highlights
- BiLSTM architecture for sentence ranking in Chinese QA
- Uses jieba + 50-dim Wikipedia-trained word embeddings
- Evaluation via MRR, MAP, and ACC@1 (script credited to a teaching assistant)
- TensorFlow 1.2.1 and Python 3.5.2: a firmly archaeological stack
- Training: ~8 GB RAM, 2 GB VRAM, 12 hours on a GTX 850M
- Results vary ±0.03 MRR across runs with identical parameters; cause unknown
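For readers unfamiliar with the three ranking metrics named in the highlights above, here is a minimal sketch of how they are commonly computed. It is not the teaching assistant's script: `ranking_metrics` and the `(score, label)` input format are hypothetical, and skipping questions that have no answer sentence is an assumption that the original script may or may not share.

```python
from typing import List, Tuple

Candidate = Tuple[float, int]  # (model score, label); label 1 marks an answer sentence


def ranking_metrics(questions: List[List[Candidate]]) -> Tuple[float, float, float]:
    """Return (MRR, MAP, ACC@1) averaged over questions with at least one answer."""
    rr_sum = ap_sum = top1_sum = 0.0
    counted = 0
    for candidates in questions:
        # Sort candidates by descending score and keep only their labels.
        ordered = sorted(candidates, key=lambda c: c[0], reverse=True)
        labels = [label for _, label in ordered]
        if 1 not in labels:
            continue  # assumption: questions without any answer sentence are skipped
        counted += 1
        rr_sum += 1.0 / (labels.index(1) + 1)        # reciprocal rank of first answer
        top1_sum += 1.0 if labels[0] == 1 else 0.0   # is the top-ranked sentence an answer?
        hits = 0
        precision_sum = 0.0
        for rank, label in enumerate(labels, start=1):
            if label == 1:
                hits += 1
                precision_sum += hits / rank         # precision at each answer's rank
        ap_sum += precision_sum / hits               # average precision for this question
    if counted == 0:
        return 0.0, 0.0, 0.0
    return rr_sum / counted, ap_sum / counted, top1_sum / counted


# Invented scores for two questions.
dev = [
    [(0.2, 0), (0.9, 1), (0.95, 0)],  # the answer is ranked 2nd
    [(0.8, 1), (0.1, 0), (0.5, 1)],   # answers are ranked 1st and 2nd
]
mrr, mean_ap, acc1 = ranking_metrics(dev)
print(mrr, mean_ap, acc1)  # 0.75 0.75 0.5
```

On this scale, the run-to-run spread of ±0.03 MRR is roughly what moving the first correct sentence from rank 1 to rank 2 in about 6% of questions would produce, since each such move costs 0.5 in reciprocal rank.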
Caveats
- Explicitly abandoned by the author with no maintenance planned
- Dataset cannot be shared due to licensing; you’ll need your own training.data and develop.data
- The author’s own assessment: “not much reference value at either the code level or the academic level” (“代码层面还是学术层面都没有太大参考价值”)
- Hardware requirements and TF 1.x dependencies make reproduction a deliberate exercise in retrocomputing
Verdict
Worth a skim if you’re studying how not to structure a deep learning project, or if you need a baseline BiLSTM implementation you can freely criticize. Anyone seeking a production Chinese QA system should look elsewhere, and the author would agree.