modelscope/ms-swift

A Swiss Army knife for fine-tuning 1,000+ models without losing your mind

ModelScope's ms-swift wraps the entire LLM/Multimodal training pipeline—from LoRA to Megatron GRPO—into a single Python framework with Day-0 model support.

14.4k stars · Python · ML Frameworks · LLMOps · Eval
Velocity (7d): +14 ★/day · Trend: steady

What it does

ms-swift is a training and deployment framework that covers the full lifecycle of large language and multimodal models: pre-training, supervised fine-tuning, RLHF (DPO, GRPO, KTO, etc.), inference, evaluation, quantization, and deployment. It supports 600+ text-only models and 400+ multimodal models including Qwen3, DeepSeek-R1, Llama4, and Qwen3-VL. The project also provides a Web UI for click-through training and integrates vLLM, SGLang, and LMDeploy for inference acceleration.

The interesting bit

The breadth is almost comical. ms-swift doesn’t stop at LoRA and QLoRA: it also wraps Megatron-scale distributed training (tensor, pipeline, context, and expert parallelism, i.e. TP, PP, CP, EP) and a whole zoo of GRPO-family algorithms (DAPO, GSPO, SAPO, CHORD, Reinforce++). The “Mcore-Bridge” abstraction tries to make Megatron training feel as simple as HuggingFace transformers, an ambitious bit of plumbing that could save weeks of config archaeology.
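To give a feel for the "config archaeology" involved, here is a minimal sketch of the bookkeeping behind TP, PP, and CP. It relies on the general Megatron-style convention that the GPU count is the product of the tensor, pipeline, context, and data-parallel degrees. `ParallelLayout` is a hypothetical helper written for this illustration and is not part of ms-swift or Megatron. Expert parallelism is left out on purpose, because how EP relates to the other degrees depends on the framework version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper for illustration; not an ms-swift or Megatron API."""

    tensor: int = 1    # TP: splits individual weight matrices across GPUs
    pipeline: int = 1  # PP: splits the layer stack into sequential stages
    context: int = 1   # CP: splits the sequence dimension across GPUs

    def model_parallel_size(self) -> int:
        # Number of GPUs needed to hold and run one model replica.
        return self.tensor * self.pipeline * self.context

    def data_parallel_size(self, world_size: int) -> int:
        # Whatever is left after model parallelism becomes data parallelism.
        mp = self.model_parallel_size()
        if world_size % mp != 0:
            raise ValueError(
                f"world_size={world_size} is not divisible by TP*PP*CP={mp}"
            )
        return world_size // mp


layout = ParallelLayout(tensor=4, pipeline=2, context=2)
dp = layout.data_parallel_size(world_size=64)
print(f"GPUs per replica: {layout.model_parallel_size()}, replicas: {dp}")
```

With 64 GPUs and TP=4, PP=2, CP=2, each replica occupies 16 GPUs and 4 replicas train in parallel. A mismatch such as 10 GPUs with the same layout raises an error up front, which is the kind of mistake a higher-level wrapper is meant to catch for you.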

Key highlights

  • 600+ text models and 400+ multimodal models with Day-0 support
  • Lightweight methods: LoRA, QLoRA, DoRA, ReFT, Adapter, LISA, and more
  • Quantized training on BNB/AWQ/GPTQ models (the project claims 9 GB of VRAM is enough for 7B models)
  • Megatron parallelism for MoE models, plus Ulysses/Ring-Attention sequence parallelism for long contexts
  • 150+ built-in datasets and support for mixed-modality training (text, image, video, audio)
  • Full pipeline: training → evaluation (EvalScope) → quantization (GPTQ/AWQ/FP8) → deployment
  • Paper published at AAAI 2025
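The low-VRAM claim for quantized 7B training is easier to reason about with some back-of-the-envelope arithmetic. The sketch below is not ms-swift code and does not reproduce the project's 9 GB figure. It only estimates weight storage at different precisions and the trainable parameter count of one LoRA adapter. The 7e9 parameter count, the 4096x4096 layer shape, and rank 8 are illustrative assumptions, and activations, optimizer state, and framework overhead are ignored.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Storage for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


NUM_PARAMS = 7e9  # assumed size of a "7B" model

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)
print(f"16-bit weights: {fp16_gb:.1f} GB, 4-bit weights: {int4_gb:.1f} GB")

# One adapter on an assumed 4096 x 4096 projection, compared with the full matrix.
adapter = lora_param_count(4096, 4096, rank=8)
full = 4096 * 4096
print(f"LoRA params: {adapter}, full matrix: {full}, ratio: {full // adapter}x")
```

Under these assumptions the weights shrink from 14 GB at 16 bits to 3.5 GB at 4 bits, and a rank-8 adapter trains 256 times fewer parameters than the matrix it modifies. That leaves headroom for activations and optimizer state on a small GPU, though the real footprint depends on sequence length, batch size, and which layers carry adapters.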

Caveats

  • The README is a feature list firehose; actual API ergonomics are hard to judge without using it
  • “2026.03.03” release date in the News section appears to be a typo (likely meant 2025)
  • With so many integrations, version compatibility across PyTorch, CUDA, Megatron, and various inference engines is a maintenance burden you’ll own

Verdict

Worth a look if you’re doing production fine-tuning across many model architectures and need one toolchain instead of five. Probably overkill if you just need to LoRA-tune a single Llama model on a single GPU; use axolotl or unsloth instead.
