rodrigopivi/Chatito

A DSL for faking conversations with your future chatbot

Chatito generates training datasets for NLP models from a compact domain-specific language, complete with an online IDE and adapters for Rasa, LUIS, Flair, and Snips.

888 stars · TypeScript · Data Tooling · Chat Assistants
Velocity (7d): +0.3 ★ / day · Trend: steady

What it does

Chatito is a text generator dressed up as a chatbot tool. You write sentence templates in a small DSL—defining intents, slots, aliases, and optional fragments—and it expands those templates into training/testing datasets for NLU models. Think of it as mad-libs for machine learning: you describe the shape of possible user utterances, and Chatito fills in the combinations.
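
The mad-libs idea can be sketched in a few lines of TypeScript. This is a simplified, hypothetical model of template expansion, not Chatito's actual API or grammar: the `Part` and `Definitions` types and the `expand` function are names invented for illustration, and the sketch enumerates every combination exhaustively, whereas a real generator may sample and cap the output.

```typescript
// Hypothetical model: a template is a sequence of literal text and
// references to named lists of alternatives (aliases or slots).
type Part =
  | { kind: "text"; value: string }
  | { kind: "ref"; name: string; optional?: boolean };

type Definitions = Record<string, string[]>;

// Build every utterance by taking the cartesian product of the choices
// available at each position in the template.
function expand(template: Part[], defs: Definitions): string[] {
  let results: string[] = [""];
  for (const part of template) {
    let choices: string[];
    if (part.kind === "text") {
      choices = [part.value];
    } else {
      const values = defs[part.name];
      if (values === undefined) {
        throw new Error(`Undefined reference: ${part.name}`);
      }
      // An optional reference may also expand to nothing.
      choices = part.optional ? ["", ...values] : values;
    }
    const next: string[] = [];
    for (const prefix of results) {
      for (const choice of choices) {
        next.push(prefix + choice);
      }
    }
    results = next;
  }
  // Collapse whitespace left behind by skipped optional parts.
  return results.map((s) => s.replace(/\s+/g, " ").trim());
}

const defs: Definitions = {
  hi: ["hi", "hey"],
  name: ["Janis", "Bob"],
};

const greet: Part[] = [
  { kind: "ref", name: "hi" },
  { kind: "text", value: " " },
  { kind: "ref", name: "name", optional: true },
];

const utterances = expand(greet, defs);
console.log(utterances);
```

Two greetings combined with an optional name drawn from two values yield six utterances, which shows how quickly a compact template fans out into a dataset.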

The project ships as both an online IDE (handy for tinkering) and an npm CLI package for batch generation. Output formats cover Rasa, Microsoft LUIS, Flair (FastText + BIO tagging), and Snips NLU, plus a catch-all default format for custom pipelines.

The interesting bit

The DSL lets you annotate slots with custom key-value arguments—('entity': 'snips/datetime'), ('synonym': 'true'), whatever you need—and those annotations propagate through to the generated output. This means the same template file can drive training data, entity typing, and even downstream dialog logic without forking your source of truth. The Rasa adapter’s synonym mapping and the Snips entity-type passthrough are concrete examples of this flexibility in action.
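
To make the propagation concrete, here is a minimal sketch of how slot annotations might ride along with generated tokens. The token and slot shapes below (`OutputToken`, `SlotDef`, `makeSlotToken`) are assumptions made for illustration and are not Chatito's documented output format; only the `'entity': 'snips/datetime'` argument comes from the text above.

```typescript
// Hypothetical shapes for illustration; Chatito's real output may differ.
type SlotArgs = Record<string, string | undefined>;

interface TextToken { type: "Text"; value: string }
interface SlotToken { type: "Slot"; slot: string; value: string; args: SlotArgs }
type OutputToken = TextToken | SlotToken;

interface SlotDef { values: string[]; args: SlotArgs }

// Copy the definition's arguments onto the token so every generated
// sample carries the annotations declared once in the template.
function makeSlotToken(slot: string, def: SlotDef, value: string): SlotToken {
  return { type: "Slot", slot, value, args: { ...def.args } };
}

function isSlot(token: OutputToken): token is SlotToken {
  return token.type === "Slot";
}

const dateTime: SlotDef = {
  values: ["tomorrow", "next friday"],
  args: { entity: "snips/datetime" },
};

const sentence: OutputToken[] = [
  { type: "Text", value: "book a table for " },
  makeSlotToken("date_time", dateTime, "tomorrow"), // one of dateTime.values
];

// A downstream consumer reads the annotation, falling back to the slot
// name when no explicit entity type was declared.
const entities = sentence.filter(isSlot).map((t) => ({
  value: t.value,
  entity: t.args["entity"] ?? t.slot,
}));
console.log(entities);
```

Because the annotation travels with the token, an adapter can decide what to do with it (entity typing, synonym handling, or nothing) without the template changing.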

Key highlights

  • PegJS parser + TypeScript generator: The DSL is formally specified with a pegjs grammar and the generator is implemented in TypeScript.
  • Online IDE: Browser-based editor with syntax highlighting and live generation; no install required.
  • Adapter ecosystem: Native support for Rasa, LUIS, Flair, Snips NLU, plus a default format for custom adapters.
  • Distribution control: Per-entity frequency distributions (regular or even) to bias or flatten sample generation.
  • VS Code extension: Third-party syntax highlighting available via the VS Code Marketplace.
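
The distribution setting can be illustrated with a small calculation. The sketch below assumes that "regular" weights each sentence by its number of possible combinations while "even" gives every sentence equal weight; that reading should be checked against the README, and `sentenceProbabilities` is a hypothetical helper, not part of Chatito.

```typescript
type Distribution = "regular" | "even";

// Given how many combinations each sentence template can produce, return
// the probability of picking each sentence under the chosen strategy.
function sentenceProbabilities(
  cardinalities: number[],
  distribution: Distribution
): number[] {
  // "even": every sentence counts once. "regular" (assumed semantics):
  // a sentence counts once per possible combination.
  const weights =
    distribution === "even" ? cardinalities.map(() => 1) : cardinalities;
  const total = weights.reduce((sum, w) => sum + w, 0);
  return weights.map((w) => w / total);
}

// Two sentences: one with 1,000 possible expansions, one with 100,000.
const cardinalities = [1000, 100000];
const regular = sentenceProbabilities(cardinalities, "regular");
const even = sentenceProbabilities(cardinalities, "even");
console.log({ regular, even });
```

Under these assumptions the larger sentence dominates the regular distribution at roughly 99%, while the even distribution splits the picks 50/50, which is the bias-versus-flatten trade-off the highlight describes.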

Caveats

  • The Flair adapter is CLI-only; it won’t work in the online IDE.
  • Tokenization for Flair NER uses a “simple tokenizer”—the README doesn’t specify which algorithm, so token boundary behavior may need verification for your corpus.
  • Samples are not shuffled across intents by design; the tool expects you to split intents into separate files for review and maintenance.
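
To see why the tokenizer caveat matters, consider a sketch of BIO tagging over generated segments. It assumes a plain whitespace tokenizer, which is only a stand-in since the README does not name the algorithm; `Segment` and `bioTag` are hypothetical names for illustration.

```typescript
// A generated sentence as a list of segments; slot segments carry a label.
type Segment = { text: string; slot?: string };

// Tag each token: "O" outside any slot, "B-<slot>" for the first token of
// a slot value, "I-<slot>" for the remaining tokens of that value.
function bioTag(segments: Segment[]): Array<[string, string]> {
  const tagged: Array<[string, string]> = [];
  for (const seg of segments) {
    // Assumption: split on whitespace only; punctuation stays attached.
    const tokens = seg.text.split(/\s+/).filter((t) => t.length > 0);
    tokens.forEach((tok, i) => {
      const tag =
        seg.slot === undefined ? "O" : (i === 0 ? "B-" : "I-") + seg.slot;
      tagged.push([tok, tag]);
    });
  }
  return tagged;
}

const tagged = bioTag([
  { text: "fly to " },
  { text: "New York", slot: "city" },
  { text: " please" },
]);
for (const [token, tag] of tagged) {
  console.log(`${token} ${tag}`);
}
```

With a whitespace split, a slot value followed directly by punctuation (for example "New York!") would produce different token boundaries than a tokenizer that separates punctuation, so it is worth checking the generated tags against the tokenizer your model uses.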

Verdict

Worth a look if you’re maintaining chatbot training data by hand and drowning in copy-paste. Skip it if you already have a data pipeline that handles augmentation and format conversion, or if you need advanced linguistic variation beyond template expansion.
