flink-extended/dl-on-flink

Flink learns to tensor: a JVM babysitter for your GPU jobs

It wraps TensorFlow and PyTorch inside Flink operators so a Java cluster can manage Python deep-learning tasks without pretending to be a scheduler itself.

dl-on-flink · Velocity (7d): +0.3 ★ / day · Trend: steady

What it does

Deep Learning on Flink stuffs TensorFlow or PyTorch training jobs inside a Flink operator. Flink handles the boring parts (cluster setup, resource bookkeeping, data connectors, and failure recovery) while the Python frameworks do the matrix math. You build Java, you build Python, you initialize submodules, and you hope both runtimes agree on the day of the week.

The interesting bit

The project treats Flink as a distributed babysitter rather than a compute engine. It does not rewrite SGD in Java; it simply keeps the Python processes alive, fed, and checkpointed. That is a pragmatic admission that Flink’s value is plumbing, not gradients.
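
To make the babysitter idea concrete, here is a minimal sketch of the general pattern in plain Python: a supervisor launches a child process, feeds it records, remembers the last acknowledged one, and restarts from there when the child dies. It is not dl-on-flink's API and does not model Flink's real checkpointing; `supervise`, `WorkerDied`, and `FLAKY_WORKER` are hypothetical names invented for illustration, and the "checkpoint" is simply a list index.

```python
import subprocess
import sys

# Hypothetical stand-in for a training process: it upper-cases each input line
# and deliberately crashes on the third line it sees, so restarts are exercised.
FLAKY_WORKER = """
import sys
for i, line in enumerate(sys.stdin):
    if i == 2:
        sys.exit(1)
    print(line.strip().upper(), flush=True)
"""


class WorkerDied(Exception):
    """The child process exited before acknowledging a record."""


def supervise(worker_cmd, records, max_restarts=3):
    """Feed records to a child process, restarting it from the last acknowledged one."""
    checkpoint = 0  # index of the first record not yet acknowledged
    restarts = 0
    outputs = []
    while checkpoint < len(records):
        try:
            # Leaving the with-block closes the pipes and waits for the child.
            with subprocess.Popen(worker_cmd, stdin=subprocess.PIPE,
                                  stdout=subprocess.PIPE, text=True) as proc:
                while checkpoint < len(records):
                    proc.stdin.write(records[checkpoint] + "\n")
                    proc.stdin.flush()
                    reply = proc.stdout.readline()
                    if not reply:  # EOF: the child exited without answering
                        raise WorkerDied()
                    outputs.append(reply.strip())
                    checkpoint += 1  # advance only after an acknowledgement
        except (WorkerDied, OSError):
            restarts += 1
            if restarts > max_restarts:
                raise RuntimeError("worker kept failing; giving up")
    return outputs, restarts


if __name__ == "__main__":
    print(supervise([sys.executable, "-c", FLAKY_WORKER], ["a", "b", "c", "d", "e"]))
```

With five records and a worker that dies on every third line, the supervisor needs two restarts and still delivers every record exactly once. The design choice worth noticing is that the supervisor never touches the computation; it only tracks progress and restarts the process, which is the division of labor the review describes.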

Key highlights

  • Supports TensorFlow 1.15.x / 2.4.x and PyTorch 1.11.x against Flink 1.14.x
  • Tested only on Ubuntu 18.04 and macOS 10.15 (64-bit)
  • Requires Python 3.7, Java 8, Maven ≥3.3.0, and cmake ≥3.6
  • Build involves both mvn install and pip install with submodule initialization
  • Separate Python packages for TF 1.x and 2.x; you pick one, not both
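
Since the version pins above are strict, a preflight check before building may save some grief. The sketch below only compares version strings against the listed requirements; `parse_version`, `java_major`, and `satisfies` are hypothetical helpers, not part of the project, and it assumes "Python 3.7" and "Java 8" mean exactly those releases. Feed it the text printed by each tool's own version command.

```python
import re


def parse_version(text):
    """Return the first dotted number in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def java_major(version):
    """Java 8 and earlier report themselves as 1.x (for example 1.8.0_292)."""
    if version[0] == 1 and len(version) > 1:
        return version[1]
    return version[0]


def satisfies(tool, version_text):
    """Check one tool's reported version against the requirements listed above."""
    version = parse_version(version_text)
    if tool == "python":
        return version[:2] == (3, 7)   # assumption: exactly 3.7.x
    if tool == "java":
        return java_major(version) == 8  # assumption: exactly Java 8
    if tool == "maven":
        return version >= (3, 3)       # Maven 3.3.0 or newer
    if tool == "cmake":
        return version >= (3, 6)       # cmake 3.6 or newer
    raise KeyError(f"no requirement recorded for {tool!r}")


if __name__ == "__main__":
    # Sample strings for illustration; real ones come from the tools themselves.
    for tool, text in [("python", "Python 3.7.16"),
                       ("java", 'openjdk version "1.8.0_292"'),
                       ("maven", "Apache Maven 3.8.6"),
                       ("cmake", "cmake version 3.5.1")]:
        print(tool, "ok" if satisfies(tool, text) else "does not match")
```

Tuple comparison does the ordering work here, so `(3, 5, 1)` correctly sorts below `(3, 6)` without any string tricks.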

Caveats

  • The README headline claims it “supports TensorFlow, PyTorch, etc.”, but the fine print says it “currently supports TensorFlow”; PyTorch docs exist but get less emphasis
  • OS support list is narrow and aging; no mention of newer Ubuntu LTS or Apple Silicon
  • Build process is fiddly: submodules, dual-language compilation, and mutually exclusive TensorFlow wheel choices

Verdict

Worth a look if you are already committed to Flink and need to bolt on distributed deep learning without adding Yet Another Cluster Manager. Skip it if you want a polished, framework-agnostic ML platform or if your stack is Kubernetes-native already.
