huggingface/peft

Fine-tune 12B models on a single GPU without crying

Hugging Face's PEFT library makes parameter-efficient fine-tuning feel like cheating: train a fraction of a percent of the weights and keep most of the performance.

Velocity (7d): +16 ★/day · Trend: steady

What it does

PEFT wraps giant pretrained models so you only fine-tune a tiny sliver of parameters: LoRA adapters, soft prompts, IA³, and friends. The base model stays frozen. You save GPU memory, disk space, and your sanity. It plugs straight into Transformers, Diffusers, Accelerate, and TRL, so the “integration” is mostly get_peft_model(model, config) and you’re off.

The interesting bit

The README includes hard memory numbers that actually mean something. Full fine-tuning of a 12B-parameter model runs out of memory on an 80GB A100; with LoRA it fits in 56GB, and adding DeepSpeed CPU offloading brings GPU usage down to 22GB. A 3B model drops from 47GB to 14GB. The tradeoff? The LoRA-tuned model scores 0.863 accuracy on a downstream task versus Flan-T5’s 0.892: not identical, but close enough that your wallet won’t care. Because only the adapter weights are saved, checkpoints shrink from 11GB to 19MB. That’s the whole pitch, and it’s a good one.
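A back-of-the-envelope sketch of why the savings are so large: for one weight matrix of shape d_out × d_in, LoRA trains two factors of rank r with r · (d_in + d_out) parameters in total instead of d_in · d_out. The layer size and rank below are hypothetical, chosen only to make the arithmetic easy; the checkpoint sizes are the ones quoted above, read as decimal GB and MB.

```python
def full_param_count(d_in, d_out):
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_param_count(d_in, d_out, rank):
    """Parameters in the LoRA factors: A is (rank, d_in), B is (d_out, rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer size and rank, for illustration only.
d_in = d_out = 4096
rank = 8

full = full_param_count(d_in, d_out)
lora = lora_param_count(d_in, d_out, rank)
fraction = lora / full

print(f"full: {full:,} params, LoRA: {lora:,} params ({fraction:.4%} of full)")

# Checkpoint sizes quoted in the text: 11 GB full vs 19 MB adapter.
checkpoint_ratio = (11 * 1000) / 19
print(f"adapter checkpoint is roughly {checkpoint_ratio:.0f}x smaller")
```

The trainable fraction for a whole model depends on how many layers get adapters and at what rank, so treat the per-layer figure as an order-of-magnitude guide.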

Key highlights

  • Supports LoRA, adapters, soft prompts, IA³, and other PEFT methods with a unified API
  • get_peft_model() wraps any compatible model; print_trainable_parameters() shows exactly how little you’re training
  • Switch between multiple adapters at runtime with set_adapter() in Transformers
  • Combines with quantization (4-bit QLoRA or 8-bit loading) to squeeze even larger models onto consumer GPUs
  • Works with diffusion models too—Stable Diffusion LoRA checkpoints clock in at 8.8MB

Caveats

  • The Transformers integration doesn’t include adapter merging; you need PEFT directly for that
  • “State-of-the-art” in the tagline is doing some heavy lifting: performance is comparable to full fine-tuning, not always a match for it
  • Model support is broad but not universal; custom architectures need manual configuration, such as naming the layers to adapt yourself
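To make the first caveat concrete: merging a LoRA adapter folds the low-rank update into the frozen weight, W' = W + (alpha / r) · B A, so inference needs no extra matrix multiplies. The pure-Python sketch below checks that equivalence on tiny made-up matrices. It illustrates the arithmetic only and is not PEFT's implementation; in PEFT itself the operation is exposed on LoRA models as `merge_and_unload()`.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


# Made-up values: W is (d_out=2, d_in=3), A is (r=1, d_in=3), B is (d_out=2, r=1).
W = [[1, 2, 3], [4, 5, 6]]
A = [[1, 0, -1]]
B = [[2], [3]]
alpha, r = 2, 1
scaling = alpha / r

x = [1, 1, 2]

# Unmerged path: base output plus the scaled low-rank correction.
base_out = matvec(W, x)
lora_out = matvec(B, matvec(A, x))
y_unmerged = [b + scaling * l for b, l in zip(base_out, lora_out)]

# Merged path: fold scaling * (B @ A) into W once, then do a single matvec.
BA = matmul(B, A)
W_merged = [
    [w + scaling * d for w, d in zip(w_row, d_row)]
    for w_row, d_row in zip(W, BA)
]
y_merged = matvec(W_merged, x)

print(y_unmerged, y_merged)
```

Once merged, the adapter can no longer be swapped out without the original base weights, which is the usual tradeoff of merging for deployment.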

Verdict

If you’re fine-tuning LLMs or diffusion models and not using PEFT, you’re probably burning money. Skip it only if you genuinely need every last drop of accuracy and have the hardware budget to match.
