One toolbox to fine-tune them all: FLUX, video, even audio
A single Python project that lets you train LoRAs on consumer GPUs across image, video, and audio diffusion models without swapping toolchains.

What it does
AI Toolkit is a unified training suite for diffusion models. You copy a YAML config, point it at a folder of images and captions, and run python run.py. It handles LoRA and full-model fine-tuning for a sprawling list of architectures—FLUX.1/2, SDXL, Wan video, LTX, even Ace Step audio—on a single consumer-grade Nvidia GPU. A web UI (Node.js, port 8675) and a Gradio wrapper offer point-and-click alternatives to the CLI.
The interesting bit
The breadth is the story. Most training repos pick one model family and optimize the life out of it; this one treats model support as a feature in itself. It also auto-buckets varying aspect ratios without forcing you to crop or resize images, which is the kind of boring convenience that saves hours of preprocessing.
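For readers who have not met aspect-ratio bucketing before, here is a minimal sketch of the general idea: each image is assigned to the training resolution whose shape is closest to its own, so batches can share a size without manual cropping. This is not the toolkit's actual implementation; the bucket list and the function names `nearest_bucket` and `group_by_bucket` are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical (width, height) buckets; a real trainer may use a different set.
BUCKETS = [(512, 1024), (768, 1024), (1024, 1024), (1024, 768), (1024, 512)]


def nearest_bucket(width, height, buckets=BUCKETS):
    """Return the bucket whose aspect ratio is closest to the image's.

    Ratios are compared in log space so that 2:1 and 1:2 are treated
    as equally far from square.
    """
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


def group_by_bucket(sizes, buckets=BUCKETS):
    """Map each bucket to the list of image names assigned to it.

    `sizes` is a dict of image name -> (width, height).
    """
    groups = defaultdict(list)
    for name, (width, height) in sizes.items():
        groups[nearest_bucket(width, height, buckets)].append(name)
    return dict(groups)


if __name__ == "__main__":
    sizes = {"a.jpg": (1000, 1000), "b.png": (4000, 3000), "c.jpg": (1080, 1920)}
    for bucket, names in group_by_bucket(sizes).items():
        print(bucket, names)
```

In a real pipeline each group would then be resized to its bucket's resolution and batched together; the sketch stops at the grouping step because that is the part the convenience hides from you.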
Key highlights
- Supports 20+ model families spanning image (FLUX, SDXL, HiDream, Qwen-Image), video (Wan, LTX), audio (Ace Step), and instruction/editing variants
- Runs on “consumer grade hardware”—the author targets 24 GB VRAM configs in examples
- Resume training from the last checkpoint after ctrl+c (with a warning not to interrupt during saves)
- Native Modal and RunPod cloud templates for training without owning the GPU
- Experimental macOS Silicon support via a convenience script
Caveats
- WebP “currently has issues” for datasets; stick to jpg/png
- macOS support is explicitly experimental and undertested due to RAM constraints
- UI auth is a single env-var token—“mostly safe” is the author’s phrasing, not a security audit
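Given the WebP caveat, a quick pre-flight check on the dataset folder can save a failed run. The sketch below is a standalone helper, not part of AI Toolkit: it lists WebP files and flags jpg/png images that have no caption. It assumes, purely for illustration, that each caption lives in a `.txt` file with the same base name as its image; the function name `audit_dataset` is hypothetical.

```python
from pathlib import Path

SAFE_EXTS = {".jpg", ".jpeg", ".png"}  # formats the caveat says to stick to
RISKY_EXTS = {".webp"}                 # format reported to have issues


def audit_dataset(folder):
    """Return WebP files and uncaptioned images found directly in `folder`.

    Assumption: a caption is a .txt file sharing the image's base name,
    e.g. cat_01.jpg pairs with cat_01.txt.
    """
    report = {"webp": [], "missing_caption": []}
    for path in sorted(Path(folder).iterdir()):
        ext = path.suffix.lower()
        if ext in RISKY_EXTS:
            report["webp"].append(path.name)
        elif ext in SAFE_EXTS and not path.with_suffix(".txt").exists():
            report["missing_caption"].append(path.name)
    return report


if __name__ == "__main__":
    import sys

    result = audit_dataset(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("WebP files to convert:", result["webp"])
    print("Images without captions:", result["missing_caption"])
```

Converting the flagged files to jpg or png is left to whatever image tool you already use.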
Verdict
Grab this if you want to fine-tune multiple diffusion modalities without maintaining a separate repo for each. Skip it if you need battle-tested, model-specific optimizations or are allergic to YAML configs.