radixark/miles
Enterprise reinforcement learning framework for LLM and VLM post-training.

Miles is an enterprise-grade reinforcement learning framework for post-training large language models (LLMs) and vision-language models (VLMs). It provides high-performance rollouts, supports Megatron and FSDP as training backends, and integrates with SGLang for optimized inference. Advanced features include INT4 quantization-aware training, which helps fit large models into limited VRAM, and a unified multi-turn training pipeline shared by LLMs and VLMs.
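
To make the INT4 quantization-aware training idea more concrete, here is a minimal sketch of the "fake quantization" step that such training typically applies to weights in the forward pass. It is not taken from Miles: the function name `fake_quantize_int4` is hypothetical, and the symmetric per-tensor scale is an assumption chosen for brevity (practical schemes often use per-group scales and pass gradients through the rounding step with a straight-through estimator).

```python
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # representable range of a signed 4-bit integer


def fake_quantize_int4(weights: List[float]) -> Tuple[List[int], List[float], float]:
    """Hypothetical helper: symmetric per-tensor INT4 fake quantization.

    Returns the integer codes, the dequantized weights the forward pass
    would see, and the scale. Forward pass only; no gradients here.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to code 7; use 1.0 for an empty or all-zero tensor.
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    # Round to the nearest code and clamp into the INT4 range.
    codes = [min(INT4_MAX, max(INT4_MIN, round(w / scale))) for w in weights]
    # Dequantize so the rest of the model observes the quantization error.
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale


if __name__ == "__main__":
    codes, dequantized, scale = fake_quantize_int4([0.7, -0.3, 0.0, 0.12])
    print(codes, scale)
```

Each weight then costs 4 bits plus a shared scale rather than 16 or 32 bits, which is where the VRAM saving comes from. Training against the dequantized values lets the model adapt to the rounding error.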