Is chatterbot-corpus open source?

Yes — gunthercox/chatterbot-corpus is open source, released under the BSD-3-Clause license.

What language is chatterbot-corpus written in?

gunthercox/chatterbot-corpus is primarily written in Python.

How popular is chatterbot-corpus?

gunthercox/chatterbot-corpus has 1.4k stars on GitHub.

Where can I find chatterbot-corpus?

gunthercox/chatterbot-corpus is on GitHub at https://github.com/gunthercox/chatterbot-corpus.

← all repositories

gunthercox/chatterbot-corpus

Chatbot training data: crowdsourced, YAML-flavored, occasionally wrong

A community-contributed multilingual corpus for bootstrapping ChatterBot when you have nothing else to say.

★1.4k stars Python Data Tooling Chat Assistants

View on GitHub ↗ Homepage ↗

Not currently ranked — collecting fresh signals.

star history

What it does

ChatterBot Corpus is a collection of user-contributed conversation datasets in YAML format, designed to prime fresh ChatterBot installations with basic dialog across multiple languages. You drop these files into chatterbot_corpus/data/, point your bot at them, and get something that can respond to “Hello” without embarrassing silence.

The interesting bit

The project treats training data as plain-text infrastructure — no databases, no proprietary formats, just categorized YAML files anyone can edit. The README includes a slightly overwrought Daniel Read quote about unit testing, which feels like a quiet apology for the fact that community-contributed content may contain “occasional mistakes or inaccuracies.”

Key highlights

Multilingual coverage, though specific language counts aren’t listed
Simple YAML schema: categories header plus paired conversation lines
Extensible: add new languages by creating directories and pull requests
Distributed via PyPI as a companion package to ChatterBot
Includes basic unittest suite (python -Wonce -m unittest discover)

Caveats

Content quality varies; the maintainers explicitly warn of potential errors in user submissions
Documentation link (http://corpus.chatterbot.us/) is referenced but not described in detail
No visible versioning or quality metrics for individual language datasets

Verdict

Useful if you’re building with ChatterBot and need starter data faster than you can write it. Skip it if you need guaranteed-accurate, professionally curated dialog — or if you’ve already moved on to retrieval-augmented generation and wonder why you’re reading about YAML chatbot training in 2024.

Frequently asked

What is gunthercox/chatterbot-corpus?: A community-contributed multilingual corpus for bootstrapping ChatterBot when you have nothing else to say.
Is chatterbot-corpus open source?: Yes — gunthercox/chatterbot-corpus is open source, released under the BSD-3-Clause license.
What language is chatterbot-corpus written in?: gunthercox/chatterbot-corpus is primarily written in Python.
How popular is chatterbot-corpus?: gunthercox/chatterbot-corpus has 1.4k stars on GitHub.
Where can I find chatterbot-corpus?: gunthercox/chatterbot-corpus is on GitHub at https://github.com/gunthercox/chatterbot-corpus.