FedML-AI/FedML
A unified and scalable machine learning library for distributed training, model serving, and federated learning across GPU clouds and edge devices.

FEDML provides an open-source library for running machine learning workloads at scale across decentralized GPUs, multiple clouds, edge servers, and smartphones. It supports large-scale distributed training, on-device federated learning, and model deployment workflows. TensorOpera AI, the commercial platform built on FEDML, adds an MLOps layer and a job scheduler for launching and orchestrating complex AI jobs, including the training of LLMs and other generative AI models.