A chatbot field guide from the seq2seq era
A curated index of chatbot repos, corpora, and papers, with a notable Chinese-language section.

What it does
This repository is a hand-maintained directory of chatbot resources circa 2016–2017. It links out to roughly a dozen open-source chatbot implementations, five dialogue datasets, two foundational papers, and a handful of tutorials. A separate section highlights Chinese-language chatbots and corpora, which were harder to find in Western-centric lists at the time.
The interesting bit
The value is in the curation, not the code. The repo acts as a time capsule of the seq2seq boom: TensorFlow and Torch implementations dominate, attention mechanisms are still novel enough to mention explicitly, and “trained on Reddit data” is listed as a feature. The Chinese section includes insurance Q&A corpora and vector-matching bots that don’t appear in most English-only roundups.
Key highlights
- Links to ParlAI, ChatterBot, DeepQA, and other frameworks still referenced today
- Chinese-specific resources: Seq2Seq_Chatbot_QA, dgk_lost_conv corpus, insuranceqa-corpus-zh
- Dialogue datasets: Cornell Movie-Dialogs, OpenSubtitles, Dialog_Corpus
- Foundational papers: Sutskever et al.’s “Sequence to Sequence Learning with Neural Networks” and Vinyals and Le’s “A Neural Conversational Model” (Google)
- Tutorials from WildML and the Google Research Blog covering retrieval-based and generative approaches
Caveats
- Last updated roughly 2017; many links may have rotted or now point to archived projects
- No code or original analysis in the repo itself; it is purely a link list
- “More” section redirects to a generic TensorFlow news site of unclear provenance
Verdict
Worth a quick scan if you’re researching chatbot history or need Chinese-language dialogue corpora. Skip it if you want runnable code or modern transformer-based implementations; this is pre-BERT archaeology.