sql-machine-learning/elasticdl
A distributed deep learning training framework that runs on Kubernetes with support for TensorFlow and PyTorch.

Velocity (7d): +0.3 ★/day · Trend: steady
ElasticDL is a Kubernetes-native deep learning framework for distributed training across clusters. It provides fault tolerance and elastic scheduling by building on Kubernetes’ scheduling and preemption features. The framework supports TensorFlow Estimator, TensorFlow Keras, and PyTorch, so users can define models with standard APIs and launch distributed training from the command line.
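
To make the idea of fault tolerance under preemption concrete, here is a toy, pure-Python simulation. It is a sketch under stated assumptions, not ElasticDL's implementation: it assumes work is split into tasks held in a shared queue, and that a task held by a preempted worker is put back on the queue so a surviving worker can pick it up. The function `run_elastic_job` and its parameters are hypothetical names chosen for illustration.

```python
from collections import deque


def run_elastic_job(num_tasks, num_workers, preempt_at):
    """Toy simulation of a job that survives worker preemption.

    preempt_at maps a step number to the id of the worker preempted
    during that step (hypothetical schedule, for illustration only).
    Returns the sorted list of completed task ids.
    """
    pending = deque(range(num_tasks))   # tasks waiting to be processed
    workers = set(range(num_workers))   # ids of workers still alive
    done = []
    step = 0
    while pending:
        if not workers:
            raise RuntimeError("no workers left to finish the job")
        # Each live worker pulls at most one task from the shared queue.
        in_flight = {}
        for worker in sorted(workers):
            if pending:
                in_flight[worker] = pending.popleft()
        # If a worker is preempted this step, its task goes back on the queue.
        victim = preempt_at.get(step)
        if victim in workers:
            workers.remove(victim)
            if victim in in_flight:
                pending.append(in_flight.pop(victim))
        # Tasks held by surviving workers complete.
        done.extend(in_flight.values())
        step += 1
    return sorted(done)


if __name__ == "__main__":
    # Worker 1 is preempted at step 0 and worker 0 at step 2.
    print(run_elastic_job(10, 3, {0: 1, 2: 0}))
```

In this sketch the job slows down as workers disappear but every task is still completed exactly once, which is the property an elastic scheduler aims for.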