airbnb/chronon

Airbnb's answer to the feature store headache

Chronon tries to close the gap between batch backfills and real-time serving so ML engineers stop building bespoke data pipelines.

1k stars · Scala · Data Tooling · LLMOps · Eval
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

Chronon is Airbnb’s open-source feature platform. You write Python definitions that declare how raw data (batch tables, event streams, or entity snapshots) should be transformed into features. Chronon then handles batch and streaming computation, backfills, and low-latency serving, and it monitors data freshness and the consistency of feature values between training and production.

The interesting bit

The core abstraction is a Join that isn’t just a database join. It defines the exact timestamps and keys for point-in-time backfills, guaranteeing that training data matches what the model will see at inference time. This is the subtle part that usually breaks silently in homegrown pipelines.
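To make the point-in-time idea concrete, here is a minimal sketch in plain Python. It does not use Chronon's API: the function name `point_in_time_join`, the tuple layouts, and the "latest event at or before the query timestamp" rule are all assumptions chosen for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(queries, events):
    """Attach to each (key, ts) query the latest event value with event_ts <= ts.

    Hypothetical helper, not part of Chronon.
    queries: iterable of (key, ts)
    events:  iterable of (key, event_ts, value)
    Returns a list of (key, ts, value_or_None) in query order.
    """
    # Index events per key, sorted by timestamp, so each lookup is a binary search.
    by_key = defaultdict(list)
    for key, event_ts, value in events:
        by_key[key].append((event_ts, value))
    for rows in by_key.values():
        rows.sort(key=lambda row: row[0])

    joined = []
    for key, ts in queries:
        rows = by_key.get(key, [])
        timestamps = [row[0] for row in rows]
        # Number of events with event_ts <= ts; later events are never visible.
        visible = bisect_right(timestamps, ts)
        value = rows[visible - 1][1] if visible > 0 else None
        joined.append((key, ts, value))
    return joined


events = [
    ("user_1", 10, 1),
    ("user_1", 20, 2),
    ("user_1", 30, 3),
    ("user_2", 15, 7),
]
queries = [("user_1", 25), ("user_1", 5), ("user_2", 15), ("user_3", 40)]

result = point_in_time_join(queries, events)
for row in result:
    print(row)
```

The query for `user_1` at timestamp 25 never sees the event at 30. A naive join on key alone would leak that future value into the training row, which is the kind of silent breakage described above.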

Key highlights

  • Declarative Python API: GroupBy for aggregations, Join for combining feature sets into training datasets
  • Point-in-time accurate backfills with guaranteed consistency against online serving
  • Managed pipelines for both batch and streaming computation
  • Built-in observability: data freshness, online/offline consistency checks
  • Docker quickstart with a fabricated retail-fraud dataset to test the full flow
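As a rough illustration of what an online/offline consistency check measures, the sketch below compares feature values logged at serving time against the values a backfill produced for the same rows. It is plain Python, not Chronon's implementation; `consistency_report`, the feature names, and the dictionary layout are hypothetical.

```python
def values_match(online_value, offline_value, tolerance):
    """Treat two values as equal if both are None or they differ by <= tolerance."""
    if online_value is None or offline_value is None:
        return online_value is offline_value
    return abs(online_value - offline_value) <= tolerance


def consistency_report(online_rows, offline_rows, tolerance=1e-9):
    """Compare features for rows present on both sides.

    Hypothetical helper, not part of Chronon.
    online_rows / offline_rows: dict mapping (key, ts) -> {feature_name: value}
    """
    compared = 0
    mismatches = []
    for row_id, online_features in online_rows.items():
        offline_features = offline_rows.get(row_id)
        if offline_features is None:
            continue  # nothing to compare against for this row
        for name, online_value in online_features.items():
            if name not in offline_features:
                continue
            compared += 1
            offline_value = offline_features[name]
            if not values_match(online_value, offline_value, tolerance):
                mismatches.append((row_id, name, online_value, offline_value))
    rate = len(mismatches) / compared if compared else 0.0
    return {"compared": compared, "mismatches": mismatches, "mismatch_rate": rate}


# Made-up values: what was served online vs. what the backfill computed.
online = {
    ("user_1", 25): {"txn_count_7d": 2, "avg_amount_7d": 10.0},
    ("user_2", 15): {"txn_count_7d": 7, "avg_amount_7d": 3.5},
}
offline = {
    ("user_1", 25): {"txn_count_7d": 2, "avg_amount_7d": 10.0},
    ("user_2", 15): {"txn_count_7d": 6, "avg_amount_7d": 3.5},
}

report = consistency_report(online, offline)
print(report["mismatch_rate"], report["mismatches"])
```

A nonzero mismatch rate here is the training-serving skew signal: the model was trained on one value and served another for the same key and timestamp.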

Caveats

  • README is truncated mid-sentence in the online serving section, so some operational details are unclear
  • Quickstart explicitly excludes streaming jobs and model training steps
  • Scala project with Python API surface; the Spark/Thrift compilation step adds a layer of indirection

Verdict

Worth evaluating if you’re currently maintaining separate systems for feature backfills and real-time serving, or if you’ve been bitten by training-serving skew. Probably overkill for teams with simple, low-velocity feature sets or those already happy with a commercial feature store.
