oumi-ai/oumi

One CLI to fine-tune, evaluate, and deploy LLMs from laptop to cloud

Oumi wraps the messy lifecycle of foundation models—training, inference, evaluation, and deployment—into a single config-driven CLI that works on your laptop or a GPU cluster.

Velocity (7d): +12 ★/day · Trend: steady

What it does

Oumi is an open-source Python platform that tries to own the entire foundation-model lifecycle. You write YAML configs, then run oumi train, oumi evaluate, or oumi infer, or use oumi launch to push jobs to AWS, Azure, GCP, or Lambda. It supports models from 10M to 405B parameters, in text and vision-language variants, and plugs into vLLM, SGLang, and commercial APIs such as OpenAI or Anthropic.
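To make the config-driven workflow concrete, here is a minimal Python sketch that assembles the command line for the three local subcommands named above. The helper build_oumi_command, the file name my_config.yaml, and the -c flag for passing a config are assumptions for illustration; check oumi --help for the real options. oumi launch is left out because its invocation shape is not described here.

```python
import shlex

# Local subcommands named in the description above.
LOCAL_SUBCOMMANDS = ("train", "evaluate", "infer")


def build_oumi_command(subcommand, config_path):
    """Return an argv list for one Oumi subcommand and one YAML config.

    Assumption: the config is passed with a '-c' flag. Verify this
    against `oumi <subcommand> --help` before relying on it.
    """
    if subcommand not in LOCAL_SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    return ["oumi", subcommand, "-c", config_path]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch
    # works even without Oumi installed.
    for sub in LOCAL_SUBCOMMANDS:
        print(shlex.join(build_oumi_command(sub, "my_config.yaml")))
```

The list returned by the helper could be handed to subprocess.run once the flags are confirmed. The point is only that each lifecycle stage takes the same shape: one subcommand plus one config file.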

The interesting bit

The project keeps up with model releases almost weekly—Gemma 4, Qwen3.5, gpt-oss, Llama 4, DeepSeek-R1 all have pre-baked “recipes” in configs/recipes/. That is the real value: not the training code itself, which mostly delegates to Transformers, TRL, and vLLM, but the curated configs and dependency management that let you swap from a 135M SmolLM test run to a 400B Llama 4 Maverick job without rewriting boilerplate.
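The "swap models without rewriting boilerplate" idea can be sketched as a base config plus a small set of overrides. This is a generic illustration in Python, not Oumi's own mechanism: merge_overrides, the key names (model, training, name, method, batch_size), and the model identifiers are all hypothetical and do not reflect Oumi's real config schema.

```python
import copy


def merge_overrides(base, overrides):
    """Recursively merge `overrides` into a deep copy of `base`."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_overrides(merged[key], value)
        else:
            # Otherwise the override replaces the base value.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical small test-run recipe (illustrative keys only).
small_recipe = {
    "model": {"name": "example/small-135m"},
    "training": {"method": "lora", "batch_size": 8},
}

# Only the fields that differ for the large job are spelled out.
large_recipe = merge_overrides(
    small_recipe,
    {"model": {"name": "example/large-400b"}, "training": {"batch_size": 1}},
)
```

Everything not overridden, such as the training method here, carries over unchanged. That is the kind of boilerplate a curated recipe saves you from retyping.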

Key highlights

  • CLI covers SFT, LoRA, QLoRA, GRPO, distillation, LLM-as-a-Judge data curation, and now oumi deploy for dedicated inference endpoints
  • Ships with Colab notebooks for every major workflow, plus Docker images and an experimental curl-to-install script
  • MCP server (oumi-mcp) for Claude/Cursor integration as of v0.8
  • Batch API support across Anthropic, Fireworks, and Together
  • Active release cadence: v0.8 dropped in May 2026 with Transformers v5, TRL v0.30, and vLLM v0.19 upgrades

Caveats

  • The README is heavy on feature lists and light on architecture details; it is unclear how much Oumi abstracts versus merely orchestrates
  • “Production-grade reliability” is claimed but not substantiated with benchmarks or case studies in the provided text
  • Cloud launch configs and remote job management likely require non-trivial IAM and quota setup that the quickstart glosses over

Verdict

Worth a look if you are tired of stitching together Hugging Face, TRL, and cloud SDKs by hand and want opinionated defaults. Skip it if you already have a mature internal training platform or need deep visibility into the training loop itself.
