open-gigaai/giga-train
A distributed training framework for AI models, supporting DeepSpeed, FSDP, DDP, and mixed-precision training.

GigaTrain is an efficient, scalable framework for training large AI models. It provides a unified interface for distributed training on multi-GPU and multi-node setups, with support for DeepSpeed ZeRO, FSDP, FSDP2, and DDP. The framework also includes mixed-precision training (FP16/BF16/FP8), gradient accumulation, gradient checkpointing, EMA (exponential moving average) of model weights, built-in monitoring, and robust model checkpointing so that long runs can be resumed. Its modular design is registry-driven, with pluggable components for optimizers, schedulers, and transforms.
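As an illustration of what a "registry-driven" design usually means, the sketch below maps a string name to a component class and builds an instance from a config dict whose `type` key selects the entry. This is a generic pattern, not GigaTrain's actual API. The names `Registry`, `register`, `build`, `SCHEDULERS`, `ConstantLR`, and `LinearWarmup` are hypothetical and were chosen only for this example.

```python
from typing import Any, Callable, Dict, Optional


class Registry:
    """Minimal name -> factory registry (hypothetical; not GigaTrain's API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._entries: Dict[str, Callable[..., Any]] = {}

    def register(self, key: Optional[str] = None) -> Callable[[Any], Any]:
        """Decorator that stores a class/function under `key` (default: its __name__)."""
        def decorator(obj: Any) -> Any:
            k = key or obj.__name__
            if k in self._entries:
                raise KeyError(f"{k!r} is already registered in {self.name!r}")
            self._entries[k] = obj
            return obj
        return decorator

    def build(self, cfg: Dict[str, Any]) -> Any:
        """Instantiate the entry named by cfg['type'], passing the other keys as kwargs."""
        cfg = dict(cfg)  # copy so the caller's config dict is not mutated
        key = cfg.pop("type")
        if key not in self._entries:
            raise KeyError(f"{key!r} is not registered in {self.name!r}")
        return self._entries[key](**cfg)


# Hypothetical registry for learning-rate schedulers.
SCHEDULERS = Registry("schedulers")


@SCHEDULERS.register()  # registered under its class name, "ConstantLR"
class ConstantLR:
    def __init__(self, lr: float) -> None:
        self.lr = lr

    def get_lr(self, step: int) -> float:
        return self.lr


@SCHEDULERS.register("linear_warmup")  # registered under an explicit key
class LinearWarmup:
    def __init__(self, lr: float, warmup_steps: int) -> None:
        self.lr = lr
        self.warmup_steps = warmup_steps

    def get_lr(self, step: int) -> float:
        # Ramp linearly up to `lr` over the first `warmup_steps` steps, then hold.
        if step >= self.warmup_steps:
            return self.lr
        return self.lr * (step + 1) / self.warmup_steps


if __name__ == "__main__":
    sched = SCHEDULERS.build({"type": "linear_warmup", "lr": 1e-3, "warmup_steps": 10})
    print(sched.get_lr(4), sched.get_lr(20))
```

With this design, swapping a component becomes a config change (for example `{"type": "ConstantLR", "lr": 0.1}`) and needs no edits to the training loop. A new component becomes available as soon as the module that registers it is imported.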