IBM's abandoned Kubernetes ML platform: a fossil from 2017
FfDL was IBM's attempt to run TensorFlow and PyTorch as a service on Kubernetes—now frozen in read-only mode.

What it does
FfDL (Fabric for Deep Learning, pronounced “fiddle”) runs deep learning training jobs for TensorFlow, Caffe, PyTorch, and Keras on Kubernetes clusters; the platform itself is installed via Helm charts. It handles the plumbing: S3-backed storage, GPU scheduling, Grafana dashboards, and a CLI for job submission. Think of it as an early, on-prem alternative to SageMaker or Google AI Platform.
The interesting bit
The project is explicitly dead: “This repository will not be updated.” What’s left is a time capsule of 2017-era ML infrastructure—complete with Kubeadm-DIND support for local testing and a migration path to IBM’s Watson Studio cloud service. The architecture diagram shows a classic microservices maze: REST API, lifecycle manager, learner pods, and object storage plugins all chatting through Kubernetes.
Key highlights
- Framework-agnostic training with GPU support and Jupyter notebook integration
- Helm-based deployment with explicit IBM Cloud storage class dependencies (ibmc-file-gold)
- Built-in adversarial robustness testing via IBM’s ART toolkit
- Model serving through Seldon integration
- Minimum footprint: 4 GB RAM and 3 CPUs, modest by today’s standards
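To make the footprint figure concrete, here is a minimal Python sketch that checks a cluster against it. It is not part of FfDL. It assumes the stated minimum of 4 GB of RAM and 3 CPUs can be read as 4 GiB and 3 cores summed across all nodes, and that the input is the parsed output of `kubectl get nodes -o json`. The helper names (`parse_cpu`, `parse_memory`, `meets_minimum`) are hypothetical, and only the common Kubernetes quantity suffixes are handled.

```python
import re

# Common binary and decimal suffixes used by Kubernetes memory quantities.
_MEM_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "": 1,
}

# Assumption: "4 GB RAM, 3 CPUs" is interpreted as 4 GiB and 3 cores,
# counted as cluster-wide totals rather than per node.
MIN_MEMORY_BYTES = 4 * 2**30
MIN_CPUS = 3.0


def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity ("3", "2500m") to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a Kubernetes memory quantity ("4Gi", "4096Mi") to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity)
    if match is None or match.group(2) not in _MEM_UNITS:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(float(match.group(1)) * _MEM_UNITS[match.group(2)])


def meets_minimum(node_list: dict) -> bool:
    """Sum allocatable CPU and memory over all nodes and compare to the minimum."""
    total_cpu = 0.0
    total_mem = 0
    for node in node_list.get("items", []):
        alloc = node["status"]["allocatable"]
        total_cpu += parse_cpu(alloc["cpu"])
        total_mem += parse_memory(alloc["memory"])
    return total_cpu >= MIN_CPUS and total_mem >= MIN_MEMORY_BYTES
```

A typical use would be to load the `kubectl` output with `json.loads` and pass the result to `meets_minimum`. A single-node test cluster with 2 cores fails the check no matter how much memory it has.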
Caveats
- Read-only archive; no patches, no security updates
- Only tested on macOS and Linux; Windows is unsupported
- Storage plugin headaches are documented in the troubleshooting section—jobs get “stuck in pending” without correct S3 credentials
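A quick way to spot the “stuck in pending” symptom is to list pods whose phase is `Pending`. The sketch below is not an FfDL tool. It assumes the input is the parsed output of `kubectl get pods -o json`, where each item carries `metadata.name` and `status.phase`. The function name and the sample pod names are hypothetical. A `Pending` phase can have causes other than bad S3 credentials, so treat the result as a starting point for the troubleshooting section rather than a diagnosis.

```python
def pending_pods(pod_list: dict) -> list[str]:
    """Return the names of pods whose status.phase is "Pending"."""
    names = []
    for pod in pod_list.get("items", []):
        if pod.get("status", {}).get("phase") == "Pending":
            names.append(pod["metadata"]["name"])
    return names


# Hypothetical sample shaped like `kubectl get pods -o json` output.
sample = {
    "items": [
        {"metadata": {"name": "api-example"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "learner-example-0"}, "status": {"phase": "Pending"}},
    ]
}

stuck = pending_pods(sample)
```

For each name returned, `kubectl describe pod <name>` shows the events that explain why the pod has not been scheduled or started.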
Verdict
Worth a look if you’re studying how early Kubernetes ML platforms were architected, or maintaining legacy IBM infrastructure. Everyone else should use Kubeflow, Ray, or a managed cloud service instead.