tloen/alpaca-lora

Fine-tune a ChatGPT rival on hardware that fits under your desk

Alpaca-LoRA reproduces Stanford's instruction-following Alpaca model using parameter-efficient training cheap enough to run on a single RTX 4090, and the resulting model can run inference on a Raspberry Pi.

18.9k stars · Jupyter Notebook · Language Models · ML Frameworks
Velocity (7d): +16 ★ / day · Trend: steady

What it does

This repo provides training and inference scripts that turn Meta’s LLaMA into an instruction-following assistant comparable to text-davinci-003, without the data-center hardware that full fine-tuning typically demands. It uses LoRA (low-rank adaptation) to freeze the base model and train only tiny adapter matrices, and loads the base weights in 8-bit via bitsandbytes to cut memory further. The result: fine-tuning completes in hours on one consumer GPU, and the trained model can run inference on a Raspberry Pi.
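To put a number on "tiny", here is a back-of-the-envelope sketch of the adapter size and the weight memory. The layer count and hidden size are LLaMA-7B's; the rank of 8, the choice of two adapted projections per layer, and the rounded 6.74 billion base parameter count are assumptions for illustration, not values taken from this page. The helper names are hypothetical.

```python
def lora_trainable_params(hidden_size, num_layers, rank, targets_per_layer):
    """Count the parameters in LoRA adapters on square projection matrices.

    Each adapted hidden_size x hidden_size matrix gets two small factors:
    A (rank x hidden_size) and B (hidden_size x rank).
    """
    per_matrix = rank * hidden_size + hidden_size * rank
    return num_layers * targets_per_layer * per_matrix


def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory for the weights alone (no activations, no optimizer state)."""
    return num_params * bytes_per_param / 2**30


# LLaMA-7B shape: 32 transformer layers, hidden size 4096.
# ASSUMPTION: rank 8, adapters on two attention projections per layer.
adapter_params = lora_trainable_params(
    hidden_size=4096, num_layers=32, rank=8, targets_per_layer=2
)

# ASSUMPTION: base model size rounded to 6.74 billion parameters.
base_params = 6.74e9
trainable_fraction = adapter_params / base_params

fp16_gib = weight_memory_gib(base_params, bytes_per_param=2)  # 16-bit weights
int8_gib = weight_memory_gib(base_params, bytes_per_param=1)  # 8-bit weights

print(f"adapter parameters: {adapter_params:,}")
print(f"trainable fraction: {trainable_fraction:.4%}")
print(f"base weights: {fp16_gib:.1f} GiB at 16-bit, {int8_gib:.1f} GiB at 8-bit")
```

Under these assumptions the adapters come to about four million parameters, well under a tenth of a percent of the base model, and 8-bit loading halves the memory the frozen weights occupy. Activations, gradients for the adapters, and optimizer state come on top of that.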

The interesting bit

The cleverness is all in what you don’t train. LoRA adds small learnable matrices to the attention layers while keeping the multi-billion-parameter base model untouched; at inference you can either merge the adapters into the base weights or keep them separate. The README calls the code a “straightforward application” of Hugging Face’s PEFT library, so this is a well-engineered recipe rather than novel research. The community has run with it all the same, producing adapters for 13 languages and model sizes up to 65B.
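The "merge or keep separate" choice works because the adapter is just an additive low-rank update. The following is a minimal pure-Python sketch of that idea on a toy 6 x 6 weight, not the repo's or PEFT's actual code. It assumes the usual LoRA formulation, y = W x + (alpha / r) B A x; the function names and dimensions are hypothetical.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(w, a, b, x, scale):
    """Adapter kept separate: y = W x + scale * B (A x)."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [p + scale * q for p, q in zip(base, delta)]


def merge_lora(w, a, b, scale):
    """Fold the adapter into the frozen weight: W' = W + scale * B A."""
    ba = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, ba)
    ]


rng = random.Random(0)
d, r, alpha = 6, 2, 4  # toy sizes; real models use d in the thousands
scale = alpha / r

w = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]  # frozen, d x d
a = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(r)]  # trainable, r x d
# In real training B starts at zero; random values here stand in for a trained adapter.
b = [[rng.uniform(-1, 1) for _ in range(r)] for _ in range(d)]  # trainable, d x r
x = [rng.uniform(-1, 1) for _ in range(d)]

separate = lora_forward(w, a, b, x, scale)
merged = matvec(merge_lora(w, a, b, scale), x)
max_diff = max(abs(p - q) for p, q in zip(separate, merged))
print(f"largest difference between the two paths: {max_diff:.2e}")
```

Both paths give the same output up to floating-point rounding. Keeping the adapter separate lets you swap adapters over one shared base model; merging produces a single ordinary weight matrix, which is what an export to another runtime needs.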

Key highlights

  • Training code runs on a single RTX 4090; inference demo runs on a Raspberry Pi (per the README’s linked tweet)
  • Published LoRA weights for 7B, plus community adapters for 13B, 30B, and 65B models
  • Includes Docker and docker-compose setup for one-command deployment
  • Checkpoint export scripts to merge adapters for llama.cpp / alpaca.cpp compatibility
  • Gradio inference interface included; weights hosted on Hugging Face Hub

Caveats

  • The README says that “without hyperparameter tuning” the model merely matches Stanford Alpaca and that “further tuning might be able to achieve better performance”, so out-of-the-box results are a baseline, not an optimum
  • bitsandbytes can be finicky; Windows users need workarounds and the repo points to GitHub issues for help
  • Multi-GPU support exists but requires manual configuration (see issue #8)
  • The authors note they are “continually fixing bugs and conducting training runs,” and weights are updated without versioning guarantees

Verdict

Grab this if you want to experiment with instruction-tuned LLMs on hardware you already own, or if you need a cheap base for domain-specific fine-tuning. Skip it if you’re looking for a polished, production-ready API or if you need guaranteed reproducibility from pinned model versions.
