mattmireles/gemma-tuner-multimodal

Fine-tune Gemma on your MacBook, no H100 required

A multimodal LoRA trainer that runs on Apple Silicon and streams training data from the cloud so your SSD doesn't drown.

1.5k stars · Python · Language Models · ML Frameworks
Velocity (7d): +24 ★/day · Trend: steady

What it does

This is a PyTorch-based fine-tuning pipeline for Google’s Gemma 4 and Gemma 3n models, built specifically for Apple Silicon. It handles text, image, and audio modalities through LoRA adapters, using Metal Performance Shaders instead of CUDA. The dataloader can stream shards from GCS or BigQuery, which means you can train on terabyte-scale datasets without copying everything to a laptop SSD.
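To make the streaming idea concrete, here is a minimal sketch of a shard-by-shard CSV reader. It is not the project's dataloader: the `stream_rows` helper, the in-memory `FAKE_BUCKET`, and the `audio_path`/`text` column names are all hypothetical stand-ins. The point it illustrates is that only one shard is open at a time, so disk and memory use stay flat no matter how large the full dataset is.

```python
import csv
import io
from typing import IO, Callable, Dict, Iterable, Iterator


def stream_rows(
    shard_names: Iterable[str],
    open_shard: Callable[[str], IO[str]],
) -> Iterator[Dict[str, str]]:
    """Yield CSV rows one at a time, opening each shard only when it is reached."""
    for name in shard_names:
        # Each shard is opened lazily and closed before the next one starts.
        with open_shard(name) as handle:
            yield from csv.DictReader(handle)


# Hypothetical in-memory "bucket" standing in for remote object storage.
FAKE_BUCKET = {
    "shard-000.csv": "audio_path,text\na.wav,hello\nb.wav,world\n",
    "shard-001.csv": "audio_path,text\nc.wav,again\n",
}


def open_fake_shard(name: str) -> io.StringIO:
    """Return a file-like object for one shard (assumption: shards are CSV text)."""
    return io.StringIO(FAKE_BUCKET[name])


rows = list(stream_rows(sorted(FAKE_BUCKET), open_fake_shard))
```

In a real setup, `open_shard` would wrap whatever client fetches a remote object; the generator itself would not need to change.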

The interesting bit

The built-in training visualizer runs in your browser and shows loss curves, attention heatmaps, gradient signal strength, memory pressure, and token predictions, all updating live, with no TensorBoard or notebook required. The README claims setup takes 30 seconds. The wizard CLI walks through model selection, dataset pairing, and hyperparameters, then spawns training with a single command.

Key highlights

  • Supports text-only, image+text, and audio+text fine-tuning via CSV datasets (local for image/text; streaming available for all modalities)
  • Targets Gemma 3n E2B/E4B and Gemma 4 E2B/E4B checkpoints through Hugging Face + PEFT LoRA
  • MPS-native with fallback to CUDA or CPU; explicitly designed for Macs without NVIDIA GPUs
  • Hierarchical INI configuration with profiles, plus a system-check command to surface environment issues before training fails
  • Ships a 16-row sample dataset for sub-minute end-to-end pipeline verification
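For a feel of what "hierarchical INI configuration with profiles" can look like, here is a small sketch using Python's standard `configparser`. The section names (`profile:quick-check`, `profile:audio-long`) and keys (`learning_rate`, `lora_rank`, `max_steps`, `device`) are invented for illustration and are not taken from the project's config files. The pattern shown is the general one: a `[DEFAULT]` section supplies baseline values, and each profile overrides only what it needs.

```python
import configparser

# Hypothetical config: the keys and section names are assumptions for illustration.
SAMPLE_INI = """
[DEFAULT]
learning_rate = 2e-4
lora_rank = 16
device = mps

[profile:quick-check]
max_steps = 10

[profile:audio-long]
lora_rank = 32
max_steps = 2000
"""


def load_profile(ini_text: str, profile: str) -> dict:
    """Return one profile's settings, with [DEFAULT] values filled in underneath."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Section lookups fall back to [DEFAULT] for any key the profile omits.
    return dict(parser[f"profile:{profile}"])


quick = load_profile(SAMPLE_INI, "quick-check")
audio = load_profile(SAMPLE_INI, "audio-long")
```

Note that `configparser` returns every value as a string, so a real loader would still need to convert numbers such as `learning_rate` before use.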

Caveats

  • Gemma 4’s larger weights (26B/31B class) use a different Transformers architecture and are not yet supported for training
  • Some non-training commands (gemma_generate, ASR eval, multimodal probing) still reject Gemma 4 IDs pending code updates
  • Image fine-tuning is local CSV only in v1; audio+text is the standout modality versus competitors

Verdict

Worth a look if you’re on Apple Silicon and need to fine-tune Gemma on proprietary audio or vision data without renting cloud GPUs. Skip it if you’re on Linux/NVIDIA (Unsloth or axolotl will be faster) or if you need the larger Gemma 4 variants that aren’t wired up yet.
