tony-framework/TonY
TonY enables distributed deep learning training jobs to run as native Hadoop/YARN applications with GPU support.

TonY is a framework that natively integrates deep learning frameworks with Apache Hadoop, allowing TensorFlow, PyTorch, MXNet, and Horovod training jobs to run on Hadoop clusters. It supports both single-node and distributed training configurations, can use either zipped Python virtual environments or Docker containers for execution, and provides GPU isolation capabilities through Hadoop. It serves as a connector between ML training workloads and Hadoop’s resource management infrastructure.