Is miles open source?

Yes — radixark/miles is open source, released under the Apache-2.0 license.

What language is miles written in?

radixark/miles is primarily written in Python.

How popular is miles?

radixark/miles has 1.8k stars on GitHub. Its star growth is currently slowing ("cooling"), at about +5.0 stars per day over the last 7 days.

Where can I find miles?

radixark/miles is on GitHub at https://github.com/radixark/miles.

radixark/miles

The RL framework that treats train-inference mismatch as a bug, not a feature

A production-hardened fork of slime that keeps massive MoE models from collapsing by obsessing over bit-wise alignment between rollout and training.

★ 1.8k stars | Python | ML Frameworks | Inference · Serving | LLMOps · Eval

Velocity (7d): +5.0 ★/day | Trend: ↘ cooling

What it does

Miles is a reinforcement learning framework for post-training large language and vision-language models at enterprise scale. It wraps SGLang for high-throughput inference and Megatron-LM for distributed training, with explicit support for multi-turn conversations, multi-agent co-evolution, and reasoning/coding tasks across DeepSeek, Qwen, Llama, and other major model families.

The interesting bit

The core obsession is eliminating “train-inference mismatch”: the subtle divergence between the token probabilities the inference engine produces during rollout and the probabilities the training engine recomputes for those same tokens when it calculates gradients. Miles attacks this at multiple levels: kernel-level determinism via FlashAttention-3 and DeepGEMM, algorithmic corrections (Truncated/Masked Importance Sampling), and for MoE models, a technique called Rollout Routing Replay that records which expert handled which token during inference and replays those decisions during training. The INT4 QAT pipeline also lets 1TB models squeeze onto single H200 machines, which is less about frugality and more about eliminating cross-node communication as a bottleneck.
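
To make the algorithmic corrections concrete, here is a minimal sketch of how truncated and masked importance sampling weights could be computed per token. It assumes the common formulation in which the ratio is exp(training log-prob minus rollout log-prob), truncation clamps that ratio at a cap, and masking zeroes out tokens whose ratio exceeds the cap. The function names, the cap of 2.0, and the toy log-probs are illustrative assumptions and are not taken from the Miles codebase.

```python
import math

def importance_ratios(train_logps, rollout_logps):
    """Per-token ratio pi_train / pi_rollout, computed from log-probs."""
    return [math.exp(t - r) for t, r in zip(train_logps, rollout_logps)]

def truncated_is_weights(train_logps, rollout_logps, cap=2.0):
    """Truncated IS (assumed form): clamp each ratio at `cap`."""
    return [min(rho, cap) for rho in importance_ratios(train_logps, rollout_logps)]

def masked_is_weights(train_logps, rollout_logps, cap=2.0):
    """Masked IS (assumed form): drop tokens whose ratio exceeds `cap`."""
    return [rho if rho <= cap else 0.0
            for rho in importance_ratios(train_logps, rollout_logps)]

# Hypothetical log-probs of the same three sampled tokens, as scored by
# the training engine and by the rollout (inference) engine.
train_logps = [-1.0, -0.5, -2.0]
rollout_logps = [-1.0, -1.5, -2.1]

tis = truncated_is_weights(train_logps, rollout_logps)
mis = masked_is_weights(train_logps, rollout_logps)
print("truncated:", tis)
print("masked:   ", mis)
```

In this toy input the second token has a ratio of e (about 2.718), so truncation clamps its weight to 2.0 while masking removes it from the loss entirely. The other two tokens keep their raw ratios under both schemes.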

Key highlights

End-to-end FP8 sampling and training with unified quantization logic across rollout and training
R3 (Rollout Routing Replay) for bit-wise expert alignment in MoE models like DeepSeek-V3 and Qwen3
INT4 W4A16 QAT that claims BF16-equivalent accuracy at roughly half the memory footprint
Speculative decoding with an online SFT draft model that gets updated during RL, not frozen
Zero-copy weight sync via CUDA IPC, cutting sync time by ~50% versus HTTP/RPC
Forked from slime; co-evolving with it but explicitly targeting production stability over research flexibility
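
The Rollout Routing Replay highlight can be illustrated with a toy top-k router. This is a sketch under stated assumptions: the inference side records the chosen expert indices per token, and the training side reuses those indices while still computing gate weights from its own router logits. All function names and the logits are hypothetical and do not reflect the Miles, SGLang, or Megatron-LM APIs.

```python
import math

def top_k_experts(router_logits, k):
    """Indices of the k largest logits (ties go to the lower index)."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: (-router_logits[i], i))
    return sorted(order[:k])

def gate_weights(router_logits, expert_ids):
    """Softmax restricted to the selected experts' logits."""
    m = max(router_logits[i] for i in expert_ids)
    exps = {i: math.exp(router_logits[i] - m) for i in expert_ids}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def rollout_route(per_token_logits, k):
    """Inference side: route each token and record the chosen experts."""
    return [top_k_experts(logits, k) for logits in per_token_logits]

def train_route(per_token_logits, k, replay=None):
    """Training side: reuse recorded experts if given, else re-route."""
    routes = []
    for t, logits in enumerate(per_token_logits):
        experts = replay[t] if replay is not None else top_k_experts(logits, k)
        routes.append(gate_weights(logits, experts))
    return routes

# Hypothetical router logits for two tokens and four experts. The training
# engine's logits for token 0 differ by tiny numerical noise, which is
# enough to flip the second-ranked expert.
rollout_logits = [[2.00, 1.00, 0.99, -1.0], [0.5, 0.4, 3.0, 0.1]]
train_logits = [[2.00, 0.99, 1.00, -1.0], [0.5, 0.4, 3.0, 0.1]]

recorded = rollout_route(rollout_logits, k=2)
without_replay = train_route(train_logits, k=2)
with_replay = train_route(train_logits, k=2, replay=recorded)

print("recorded experts:     ", recorded)
print("train, no replay:     ", [sorted(r) for r in without_replay])
print("train, with replay:   ", [sorted(r) for r in with_replay])
```

Without replay, token 0 is sent to experts 0 and 2 during training even though experts 0 and 1 produced it during rollout. With replay, the expert choice matches the rollout, and the gate weights are still derived from the training-side logits.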

Caveats

Several headline features are marked completed but the roadmap still lists “Zero mismatch for MoE RL” and “Aligning SGLang with Megatron in MoE Models” as in-progress, suggesting the MoE story is partially aspirational
The “25%+ rollout speedup” claim for speculative RL lacks independent verification in the README
Multi-agent support via MrlX is technically an external framework that Miles “supports,” not native functionality

Verdict

Worth a look if you’re running RL post-training on 100B+ parameter models and have already experienced the particular joy of training collapse at 3 AM. Probably overkill if you’re fine-tuning 7B models on a single A100 or treating RL as a quick recipe swap.
