IBM/FfDL

IBM's abandoned Kubernetes ML platform: a fossil from 2017

FfDL was IBM's attempt to run TensorFlow and PyTorch as a service on Kubernetes—now frozen in read-only mode.

Star velocity (7d): +0.2 ★ / day · Trend: steady

What it does

FfDL (Fabric for Deep Learning, pronounced “fiddle”) deploys deep learning training jobs written in TensorFlow, Caffe, PyTorch, or Keras onto Kubernetes clusters via Helm charts. It handles the plumbing: S3-backed storage, GPU scheduling, Grafana dashboards, and a CLI for job submission. Think of it as an early, on-prem alternative to SageMaker or Google AI Platform.
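
To make the job-submission idea concrete, here is a minimal Python sketch of the kind of information such a training job has to carry: a framework, a command, a GPU count, and S3 buckets for input and output. The class and field names (`TrainingJob`, `data_bucket`, `results_bucket`, and so on) are hypothetical and chosen for illustration only; they are not FfDL's actual manifest schema or CLI.

```python
from dataclasses import dataclass
from typing import List

# Frameworks named in the description above; the set itself is illustrative.
SUPPORTED_FRAMEWORKS = {"tensorflow", "caffe", "pytorch", "keras"}


@dataclass(frozen=True)
class TrainingJob:
    """Hypothetical job description; not FfDL's real manifest format."""
    name: str
    framework: str
    command: str
    gpus: int = 0
    data_bucket: str = ""      # assumed S3 bucket holding training data
    results_bucket: str = ""   # assumed S3 bucket receiving trained models

    def problems(self) -> List[str]:
        """Return human-readable validation errors (empty list if none)."""
        found = []
        if self.framework.lower() not in SUPPORTED_FRAMEWORKS:
            found.append(f"unsupported framework: {self.framework}")
        if self.gpus < 0:
            found.append("gpus must be zero or more")
        if not self.data_bucket or not self.results_bucket:
            found.append("both data_bucket and results_bucket are required")
        return found


job = TrainingJob(
    name="mnist-demo",
    framework="tensorflow",
    command="python train.py",
    gpus=1,
    data_bucket="training-data",
    results_bucket="trained-models",
)
print(job.problems())
```

Validating the description before submission is a design choice of this sketch, not something the source attributes to FfDL; it simply shows why missing storage settings are worth catching early.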

The interesting bit

The project is explicitly dead: “This repository will not be updated.” What’s left is a time capsule of 2017-era ML infrastructure—complete with Kubeadm-DIND support for local testing and a migration path to IBM’s Watson Studio cloud service. The architecture diagram shows a classic microservices maze: REST API, lifecycle manager, learner pods, and object storage plugins all chatting through Kubernetes.

Key highlights

  • Framework-agnostic training with GPU support and Jupyter notebook integration
  • Helm-based deployment with explicit IBM Cloud storage class dependencies (ibmc-file-gold)
  • Built-in adversarial robustness testing via IBM’s ART toolkit
  • Model serving through Seldon integration
  • Minimum footprint: 4 GB RAM and 3 CPUs, modest by today’s standards

Caveats

  • Read-only archive; no patches, no security updates
  • Only tested on macOS and Linux; Windows is unsupported
  • Storage plugin headaches are documented in the troubleshooting section—jobs get “stuck in pending” without correct S3 credentials
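
When a job hangs like this, a reasonable first step is to see which pods never left the Pending phase. The sketch below parses JSON in the shape printed by `kubectl get pods -o json` and lists those pods. It assumes that a stuck FfDL job shows up as a pod whose `status.phase` is `Pending`, which the troubleshooting notes summarized here do not spell out; the sample data and pod names are made up.

```python
import json
from typing import List, Tuple


def pending_pods(pod_list_json: str) -> List[Tuple[str, str]]:
    """Return (namespace, name) pairs for pods whose phase is Pending."""
    pod_list = json.loads(pod_list_json)
    result = []
    for pod in pod_list.get("items", []):
        if pod.get("status", {}).get("phase") == "Pending":
            meta = pod.get("metadata", {})
            result.append((meta.get("namespace", "default"), meta.get("name", "")))
    return result


# Made-up sample in the shape of `kubectl get pods -o json` output.
SAMPLE = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "learner-1", "namespace": "default"},
         "status": {"phase": "Pending"}},
        {"metadata": {"name": "restapi-0", "namespace": "default"},
         "status": {"phase": "Running"}},
    ],
})

print(pending_pods(SAMPLE))
```

On a live cluster the input could come from `kubectl get pods --all-namespaces -o json`, and `kubectl describe pod <name>` would then show the events explaining why a given pod is stuck.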

Verdict

Worth a look if you’re studying how early Kubernetes ML platforms were architected, or maintaining legacy IBM infrastructure. Everyone else should use Kubeflow, Ray, or a managed cloud service instead.
