EvolvingLMMs-Lab/lmms-engine

One training rig for vision, audio, and video diffusion

It unifies vision, language, audio, and generative model training behind a single, hackable PyTorch engine.

★ 807 stars · Python · ML Frameworks · Language Models

What it does

lmms-engine is a PyTorch training framework built to handle the full multimodal stack—vision-language models like Qwen3-VL, audio-language hybrids, diffusion language models, and video generators such as WanVideo—without splintering into separate codebases. It wraps distributed training, memory optimization, and data streaming into one configurable engine. The goal is to let researchers iterate on unified multimodal architectures at scale without drowning in boilerplate.

The interesting bit

The framework treats “unified” seriously: it covers understanding and generation, supports Mixture-of-Experts with Expert Parallelism, and offers a monkey-patching system for injecting kernels like Flash Attention or Native Sparse Attention without touching model code. That combination of breadth and surgical tweakability is unusual for a stack that also claims to be lean.
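
To make the monkey-patching idea concrete, here is a minimal pure-Python sketch of the general technique: replacing a method on a class at runtime so that the class definition itself is never edited. Everything in it (`TinyAttention`, `stable_softmax_forward`, `apply_patch`) is hypothetical and invented for illustration; it is not lmms-engine's patching API, and a numerically stable softmax stands in for a real kernel such as Flash Attention.

```python
import math


class TinyAttention:
    """Hypothetical stand-in for a model's attention module (not lmms-engine code)."""

    def forward(self, scores):
        # Reference path: naive softmax, which overflows for large scores.
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


def stable_softmax_forward(self, scores):
    """Hypothetical replacement 'kernel': softmax with the max subtracted first."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_patch(cls, method_name, replacement):
    """Swap cls.<method_name> for replacement; return the original for restoring."""
    original = getattr(cls, method_name)
    setattr(cls, method_name, replacement)
    return original


# Patch the class once; every instance now uses the replacement.
original_forward = apply_patch(TinyAttention, "forward", stable_softmax_forward)

attn = TinyAttention()
probs = attn.forward([1000.0, 1000.0])  # the naive path would overflow here
```

The model class is untouched; only the attribute lookup changes. Keeping the returned `original_forward` lets the patch be reverted, which helps when comparing a swapped-in kernel against the reference path.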

Key highlights

Supports 20+ architectures, from Qwen2.5-Omni and BAGEL to SiT and linear attention models, via an extensible @register_model() decorator.
Production-grade optimizations built in: FSDP2, Ulysses sequence parallelism, Liger fused kernels (claimed ~30% memory reduction), and the Muon optimizer, which orthogonalizes updates with Newton-Schulz iterations.
Sequence packing with full unpadding pushes Model FLOPs Utilization (MFU) toward 35–40% on some vision-language finetuning tasks, versus 20–25% without packing.
Streaming datasets and multi-dimensional parallelism (tensor × sequence × data, or TP × SP × DP) target trillion-token pretraining scale.
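
The sequence-packing highlight above is easier to picture with a small example. The sketch below, in plain Python, contrasts a padded batch with a packed one and shows how much of the padded batch is wasted on pad tokens. The helper names are hypothetical and the code is not taken from lmms-engine; the `cu_seqlens` name follows the cumulative-sequence-length convention used by variable-length attention kernels.

```python
def pad_batch(seqs, pad_id=0):
    """Pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]


def pack_batch(seqs):
    """Concatenate sequences into one flat list and record their boundaries.

    cu_seqlens[i] and cu_seqlens[i + 1] delimit sequence i inside `flat`.
    """
    flat = []
    cu_seqlens = [0]
    for s in seqs:
        flat.extend(s)
        cu_seqlens.append(len(flat))
    return flat, cu_seqlens


def padding_fraction(seqs):
    """Share of slots in the padded batch that hold pad tokens."""
    max_len = max(len(s) for s in seqs)
    total_slots = max_len * len(seqs)
    real_tokens = sum(len(s) for s in seqs)
    return 1 - real_tokens / total_slots


# Three toy sequences of very different lengths (token ids are arbitrary).
seqs = [[11, 12, 13, 14, 15, 16, 17, 18], [21, 22], [31, 32, 33]]

padded = pad_batch(seqs)             # 3 rows of 8 slots = 24 slots
flat, cu_seqlens = pack_batch(seqs)  # 13 real tokens, no padding
wasted = padding_fraction(seqs)      # 11 of 24 slots are padding
```

In this toy batch almost half of the padded slots carry no real tokens, so compute spent on them does nothing useful. Packing removes those slots, and the boundaries in `cu_seqlens` let an attention kernel keep the sequences from attending to one another. The MFU figures quoted above are the project's own and are not reproduced by this sketch.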

Verdict

Researchers building unified multimodal models—or anyone tired of gluing together separate LLM and diffusion training scripts—should look here. If you only train plain text LLMs and never touch a vision tower, it is probably overkill.

Frequently asked

What is EvolvingLMMs-Lab/lmms-engine? It unifies vision, language, audio, and generative model training behind a single, hackable PyTorch engine.
Is lmms-engine open source? Yes. EvolvingLMMs-Lab/lmms-engine is an open-source project tracked on heatdrop.
What language is lmms-engine written in? EvolvingLMMs-Lab/lmms-engine is primarily written in Python.
How popular is lmms-engine? EvolvingLMMs-Lab/lmms-engine has 807 stars on GitHub.
Where can I find lmms-engine? EvolvingLMMs-Lab/lmms-engine is on GitHub at https://github.com/EvolvingLMMs-Lab/lmms-engine.