tensorflow/tfx

Google's ML pipeline glue, now with extra dependencies

TFX wires TensorFlow training into production pipelines, then hands you a compatibility matrix that reads like a software bill of materials.

2.2k stars Python LLMOps · EvalML Frameworks
Velocity (7d): +0.8 ★/day · Trend: steady

What it does

TFX is Google’s open-source framework for building production ML pipelines on top of TensorFlow. It provides pre-built components for the usual suspects (data validation, transformation, training, evaluation), then orchestrates them through Apache Airflow or Kubeflow Pipelines. Every component run is logged to an ML Metadata backend, so you can trace lineage, resume from checkpoints, or figure out which data version broke your model last Tuesday.
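To make the "components plus orchestrator" idea concrete, here is a minimal sketch in plain Python. It does not use the TFX API. The stage names and the `run_order` helper are hypothetical, chosen only to mirror the stages listed above. It shows the general technique an orchestrator relies on: derive the run order from each stage's declared inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names (not TFX identifiers). Each stage maps to the
# upstream stages whose outputs it consumes.
PIPELINE = {
    "ingest": set(),
    "validate": {"ingest"},
    "transform": {"ingest", "validate"},
    "train": {"transform"},
    "evaluate": {"ingest", "train"},
}


def run_order(dag):
    """Return an execution order in which every stage follows its inputs."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(PIPELINE)
print(order)
```

In this toy graph the dependencies form a chain, so only one order is valid. A real orchestrator would also schedule independent stages in parallel, which this sketch leaves out.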

The interesting bit

The metadata layer is the quiet workhorse. Most pipeline tools treat provenance as an afterthought; TFX bakes it in from the start, using ML Metadata to track artifacts, executions, and configurations across runs. That makes warm-starting and experiment tracking structural features, not bolt-ons.
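As a rough illustration of why tracking artifacts and executions buys you lineage, here is a toy in-memory store in plain Python. It is not the ML Metadata API. The `MetadataStore` class, its methods, and the artifact names such as `model:v12` are all assumptions made up for this sketch. The point is only that once each run records its inputs and outputs, "which data fed this model?" becomes a graph walk.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Toy stand-in for a lineage store (hypothetical, not ML Metadata)."""

    produced_by: dict = field(default_factory=dict)  # artifact -> execution
    inputs_of: dict = field(default_factory=dict)    # execution -> input artifacts

    def record(self, execution, inputs, outputs):
        """Log one component run with the artifacts it read and wrote."""
        self.inputs_of[execution] = list(inputs)
        for artifact in outputs:
            self.produced_by[artifact] = execution

    def lineage(self, artifact):
        """Return every upstream artifact that contributed to `artifact`."""
        seen, stack = [], [artifact]
        while stack:
            execution = self.produced_by.get(stack.pop())
            for parent in self.inputs_of.get(execution, []):
                if parent not in seen:
                    seen.append(parent)
                    stack.append(parent)
        return seen


store = MetadataStore()
store.record("transform#1", ["raw_data:v7"], ["features:v7"])
store.record("train#1", ["features:v7"], ["model:v12"])
upstream = store.lineage("model:v12")
print(upstream)
```

The walk starts at the model, finds the execution that produced it, and follows that execution's inputs back to the raw data version. An artifact with no recorded producer, such as the raw data here, simply has an empty lineage.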

Key highlights

  • Component-based pipeline framework with configurable stages for data ingestion through model serving
  • Native orchestration support for Apache Airflow and Kubeflow Pipelines
  • Built-in ML Metadata integration for artifact lineage and run history
  • Extensible: both components and orchestrator integrations can be customized
  • Includes a worked example (Chicago Taxi pipeline) for getting oriented

Caveats

  • Dependency matrix is extensive and tightly pinned; the README compatibility table spans 11 packages with version lockstep requirements
  • Python support currently limited to 3.9–3.10 for recent releases
  • The description claims “Google-production-scale”, but the actual resource requirements for that scale are unspecified

Verdict

Worth evaluating if you’re already committed to TensorFlow and need auditability across pipeline runs. Probably overkill if your model retrains once a month and your current cron job isn’t broken.
