81K stars for a repo that is mostly links to other repos
A curated Chinese-language directory of NLP tools, datasets, and models—part bookmark dump, part field guide.

What it does
funNLP is a massive, manually curated index of Chinese and multilingual NLP resources. The README sorts hundreds of projects into tables by task—ChatGPT clones, corpus collections, named-entity recognition, knowledge graphs, speech recognition, even a “Wang Feng lyrics generator.” Each entry gets a one-line description and an external link. Think of it as a well-organized del.icio.us for Chinese NLP practitioners.
The interesting bit
The curation is stubbornly practical and culturally specific. You will find phone-number regexes for Chinese carriers, a “crimes and legal terms classification model,” and tools for converting Arabic numerals to Chinese numerals. The author clearly assembled this while solving real problems, not while writing a literature review.
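To give a feel for the kind of small, practical utility the list points to, here is a minimal sketch in Python. Nothing in it is taken from funNLP or from any project it links to: the function names are hypothetical, the phone pattern is a loose, carrier-agnostic assumption (11 digits, starting with 1, second digit 3 to 9), and the numeral conversion works digit by digit rather than producing positional forms such as 二千零二十四.

```python
import re

# Hypothetical helpers for illustration only; not code from funNLP
# or from any repository it indexes.

# Assumption: a mainland China mobile number is 11 digits, starts with 1,
# and has a second digit between 3 and 9. Real carrier-specific patterns
# are narrower than this.
MOBILE_PATTERN = re.compile(r"1[3-9][0-9]{9}")

# Chinese numerals for the digits 0 through 9, indexed by digit value.
CHINESE_DIGITS = "零一二三四五六七八九"


def is_mainland_mobile(number: str) -> bool:
    """Return True if the whole string matches the loose mobile pattern."""
    return MOBILE_PATTERN.fullmatch(number) is not None


def digits_to_chinese(text: str) -> str:
    """Replace each ASCII digit with its Chinese numeral, digit by digit."""
    return "".join(
        CHINESE_DIGITS[int(ch)] if ch in "0123456789" else ch
        for ch in text
    )


if __name__ == "__main__":
    print(is_mainland_mobile("13812345678"))
    print(digits_to_chinese("2024年5月"))
```

The tools the directory links to handle far more cases than this; the sketch only shows why such narrowly scoped helpers are worth bookmarking.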
Key highlights
- Covers the full pipeline: tokenization, pre-trained models (BERT, ERNIE, GPT-2), text generation, summarization, OCR, ASR, and knowledge-graph construction
- Heavy emphasis on Chinese-language resources, including domain-specific corpora for finance, law, medicine, and military applications
- Recently expanded to track LLM evaluation benchmarks (C-Eval, OpenCompass) and “ChatGPT-like” frameworks
- Includes oddities: a laughter detector, a couplet-generating CNN, a tool that removes text from manga panels for translation
- 81K GitHub stars suggest it fills a real discovery gap for Chinese-speaking developers
Caveats
- This is a link list, not a framework; there is no installable package or unified API
- Descriptions vary in depth—some are detailed, others are a single sentence copied from the upstream repo
- The repo's stated policy of “long-term irregular updates” means freshness is not guaranteed; check the linked projects for recent activity before relying on them
Verdict
Useful if you are starting a Chinese NLP project and need to know what already exists. Skip it if you want a single dependency to pip install; this is a map, not a vehicle.