A 1,731-star NLP repo that teaches Python 2.7 setup
Chinese NLP experiments on the Sougou dataset, wrapped in a README that walks you through installing pip from a zip file.

What it does
TextInfoExp is a collection of classical NLP experiments (TF-IDF, text classification, clustering, word vectors, sentiment analysis, and relation extraction) run against the Sougou Chinese news corpus. The code is Python, written back when 2.7 was the recommended version.
The interesting bit
The README is essentially a time capsule: it instructs you to download pip 9.0.1 as a tarball, install from setup.py, and configure PyCharm’s VCS menu to clone from GitHub. For a repo with more than 1,700 stars, the documentation prioritizes IDE setup over explaining what any of the algorithms actually do.
Key highlights
- Covers bread-and-butter NLP tasks: TF-IDF, classification, clustering, word embeddings, sentiment, relation extraction
- Targets the Sougou Chinese text dataset (a now-venerable news corpus)
- Uses jieba for Chinese segmentation
- README includes pip troubleshooting with Alibaba’s PyPI mirror for Chinese users
- Python 2.7 specified throughout; no mention of 3.x compatibility
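
For readers unfamiliar with the first task in that list, TF-IDF weighting can be sketched in a few lines of Python 3. This is a minimal illustration, not the repository's implementation: the function name tf_idf, the sample documents, and the formula variant (term frequency normalized by document length, idf = log(N / df)) are all assumptions. The tokens are written out by hand here; in practice they would come from a segmenter such as jieba (for example jieba.lcut).

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (text that has already been segmented).
    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical pre-segmented documents, invented for illustration.
docs = [
    ["股市", "上涨", "股市"],
    ["球队", "比赛", "上涨"],
    ["球队", "夺冠"],
]
weights = tf_idf(docs)

# The highest-weighted term of the first document.
top = max(weights[0], key=weights[0].get)
print(top)
```

A term that is frequent in one document and absent from the others gets the largest weight, while a term shared across documents is discounted by the idf factor.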
Caveats
- README is entirely environment-setup instructions; zero detail on model architecture, results, or how to run experiments
- Python 2.7 reached end-of-life in January 2020; dependency versions unspecified
- No screenshots or example outputs visible in the source
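
To give a sense of what the Python 2.7 caveat means in practice, one common friction point when moving text-processing code to 3.x is that files must be decoded explicitly into str. The sketch below is an illustration under assumptions, not a description of this repository: the helper name read_text is hypothetical, and the gb18030 encoding is a guess at what a legacy Chinese corpus might use, since the source does not state the dataset's encoding.

```python
import os
import tempfile


def read_text(path, encoding="gb18030"):
    """Read a file as str in Python 3.

    The default encoding is an assumption; undecodable bytes are replaced
    rather than raising UnicodeDecodeError.
    """
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        return handle.read()


# Round-trip demo with a throwaway file so the example is self-contained.
sample = "搜狗新闻语料"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as handle:
        handle.write(sample.encode("gb18030"))
    text = read_text(path)

print(text == sample)
```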
Verdict
Worth a look if you’re specifically hunting for classical Chinese NLP implementations and don’t mind archaeology. Skip it if you need runnable, documented code or modern Python.