A workflow engine that admits research is messy
AI2 Tango caches experiment steps so you don't re-run what hasn't changed, without pretending your code is stable enough for production DAG tools.

What it does
Tango is a Python experiment runner from AI2 that breaks research code into functions decorated with @step() and caches their outputs, keyed by a hash of the inputs plus a manually bumped VERSION string. You define steps in Python, wire them together in Jsonnet config files, and run them from the CLI. A second run of an unchanged step pulls its result from the cache instead of re-executing.
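To make the caching idea concrete, here is a minimal sketch in plain Python. It is not Tango's implementation or API: the `cached_step` decorator, the in-memory `_CACHE` dict, and the `tokenize` step are all hypothetical names invented for illustration, and a real runner would presumably persist results somewhere more durable than a dict.

```python
import hashlib
import json
from functools import wraps

_CACHE = {}  # hypothetical in-memory stand-in for a persistent step cache


def cached_step(version):
    """Hypothetical decorator: cache results by (step name, version, inputs)."""

    def decorator(func):
        @wraps(func)
        def wrapper(**kwargs):
            # The key covers the inputs and the version string,
            # but never the function's source code.
            payload = json.dumps(
                {"step": func.__name__, "version": version, "inputs": kwargs},
                sort_keys=True,
            )
            key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
            if key not in _CACHE:
                _CACHE[key] = func(**kwargs)  # cache miss: actually run the step
            return _CACHE[key]

        return wrapper

    return decorator


CALLS = []  # records real executions so cache hits are visible


@cached_step(version="001")
def tokenize(text):
    CALLS.append(text)
    return text.lower().split()


first = tokenize(text="Research Is Messy")
second = tokenize(text="Research Is Messy")  # same inputs: served from cache
third = tokenize(text="Another input")       # new inputs: executes again
```

After these three calls, `CALLS` holds two entries rather than three, because the repeated call never reached the function body.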
The interesting bit
The design explicitly rejects source-code hashing as too brittle for research code that changes constantly. Instead, you manually increment a VERSION class variable when a step’s logic actually changes. It’s a deliberate trade-off: less magic, more transparency, and a tacit admission that your preprocessing code will be rewritten seventeen times before publication.
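The brittleness argument is easy to see in a toy comparison. The sketch below is an illustration under stated assumptions, not Tango code: `source_key` and `version_key` are hypothetical helpers, and the two source strings stand in for a step before and after a purely cosmetic edit.

```python
import hashlib


def source_key(source: str) -> str:
    """Hypothetical cache key derived from the raw source text of a step."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def version_key(step_name: str, version: str) -> str:
    """Hypothetical cache key derived from a name and a hand-managed version."""
    return hashlib.sha256(f"{step_name}:{version}".encode("utf-8")).hexdigest()


# The same step before and after adding a comment; behavior is identical.
ORIGINAL = "def clean(rows):\n    return [r.strip() for r in rows]\n"
COSMETIC = (
    "def clean(rows):\n"
    "    # strip whitespace\n"
    "    return [r.strip() for r in rows]\n"
)

# Source hashing treats the comment as a change and would force a re-run.
source_invalidated = source_key(ORIGINAL) != source_key(COSMETIC)

# A version-based key ignores the edit until someone bumps the version.
version_invalidated = version_key("clean", "001") != version_key("clean", "001")
bumped_invalidated = version_key("clean", "001") != version_key("clean", "002")
```

The flip side of this design is that the cache only knows what you tell it: change a step's logic without bumping the version and a scheme like this will keep serving the old result.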
Key highlights
- Caching is deterministic based on step inputs and a user-managed VERSION string, not source-code bytes
- Steps are plain Python functions with a decorator; configs are Jsonnet, not YAML soup
- Integrations ship separately (torch, wandb, datasets) so you install only what you need
- Prebuilt Docker images with CUDA variants for GPU workflows
- CLI includes a tango info diagnostic and plays nicely with pdb
Caveats
- The README’s quick-start example is trivial; real-world step composition and dependency handling are only covered in external docs
- Jsonnet as the config format adds a learning curve if your team is all-in on YAML or Python-native configs
- 571 stars suggest modest adoption; how its ecosystem maturity compares with Metaflow or Airflow is unclear
Verdict
Worth a look if you’re doing collaborative ML research where code churns daily and production workflow engines feel like overkill. Skip it if you need production monitoring, dynamic task scheduling, or already have a caching layer you trust.