← all repositories
toon-format/toon

JSON for LLMs that actually fits in the context window

A new serialization format that trades braces for whitespace and turns uniform arrays into schema-aware tables, cutting token counts by ~40% without losing the JSON data model.

24.5k stars TypeScript LLMOps · EvalData Tooling
toon
Velocity · 7d
+107
★ / day
Trend
steady
star history

What it does

TOON is a lossless encoding of the JSON data model optimized for LLM prompts. It keeps objects, arrays, and primitives intact but replaces JSON’s brace-heavy syntax with YAML-like indentation for nested structures and CSV-style tables for uniform arrays of objects. You use JSON in your code; you ship TOON to the model.

The interesting bit

The format embeds guardrails directly in the syntax: array headers declare exact lengths with [N] and field names with {fields}, giving models an explicit schema to validate against. The benchmarks are refreshingly honest — they show TOON losing to XML on Gemini and note that deeply nested data is often still better served by compact JSON.

Key highlights

  • ~40% fewer tokens than standard JSON in mixed-structure benchmarks across 4 models (2,759 vs 4,587 tokens)
  • 76.4% retrieval accuracy vs JSON’s 75.0% on 209 test questions — slightly better comprehension with significantly less verbosity
  • Deterministic round-trips: encode JSON → TOON → JSON without data loss
  • Multi-language ecosystem with spec-driven implementations (TypeScript, Python, Go, Rust, .NET)
  • Provisional text/toon media type and .toon file extension

Caveats

  • Deeply nested or non-uniform structures see diminishing returns; JSON-compact can be more token-efficient
  • Some local/quantized models (e.g., Ollama) may process compact JSON faster despite TOON’s lower token count — the README explicitly advises benchmarking TTFT and tokens/sec for your specific deployment
  • Format is stable but explicitly labeled “an idea in progress”; spec is open to revision

Verdict

Worth a look if you’re routinely stuffing large tabular datasets into prompts and watching your token budget evaporate. Skip it if your data is deeply nested, your pipeline is JSON-locked with no translation layer, or you’re running latency-sensitive local inference where token count doesn’t map cleanly to wall-clock time.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.