okuvshynov/slowllama
A tool for fine-tuning Llama2 and CodeLlama models, including the Llama2 70B and CodeLlama 34B variants, on a MacBook Air or consumer GPUs without quantization.

slowllama enables fine-tuning of large language models on memory-constrained devices by offloading parts of the model to SSD or main memory during both the forward and backward passes, so the whole model never has to be resident on the accelerator at once. It uses LoRA (Low-Rank Adaptation) to restrict updates to a small set of low-rank adapter weights while the base weights stay frozen, which keeps training feasible on limited hardware. The project supports Llama2 and CodeLlama variants up to 70B parameters on Apple M1/M2 devices and consumer NVIDIA GPUs.
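
To make the two ideas above concrete, here is a minimal pure-Python sketch of a forward pass that loads one frozen block at a time from disk and applies a LoRA update on top of it. It is not slowllama's code: the project works with PyTorch tensors, and every name here (`lora_forward`, `save_block`, `load_block`, `offloaded_forward`, `lora_param_counts`) is hypothetical, as is the choice of pickle files for storage. The backward pass is left out.

```python
import os
import pickle
import tempfile


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, scale=1.0):
    """Compute y = W x + scale * B (A x).

    W is the frozen base weight (d_out x d_in); A (rank x d_in) and
    B (d_out x rank) are the small adapter matrices that would be trained.
    """
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [p + scale * q for p, q in zip(base, delta)]


def save_block(directory, index, weight):
    """Write one frozen block to disk (hypothetical layout) and return its path."""
    path = os.path.join(directory, f"block_{index}.pkl")
    with open(path, "wb") as f:
        pickle.dump(weight, f)
    return path


def load_block(path):
    """Read one frozen block back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def offloaded_forward(block_paths, adapters, x):
    """Run blocks in order, holding only one frozen weight in memory at a time."""
    for path, (a, b) in zip(block_paths, adapters):
        w = load_block(path)          # bring this block in from storage
        x = lora_forward(w, a, b, x)  # frozen weight plus low-rank update
        del w                         # drop it before loading the next block
    return x


def lora_param_counts(d_out, d_in, rank):
    """Return (full weight parameter count, LoRA adapter parameter count)."""
    return d_out * d_in, rank * (d_in + d_out)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    a = [[1.0, 1.0]]    # rank 1 adapter, shape 1 x 2
    b = [[0.0], [0.0]]  # zero-initialised, so the adapter starts as a no-op
    with tempfile.TemporaryDirectory() as tmp:
        paths = [save_block(tmp, i, identity) for i in range(3)]
        print(offloaded_forward(paths, [(a, b)] * 3, [2.0, 3.0]))
    print(lora_param_counts(4096, 4096, 8))
```

For a single 4096 x 4096 weight with rank 8, `lora_param_counts` gives 16,777,216 frozen parameters against 65,536 trainable ones, a 256x reduction in what the optimizer has to track. The adapters are small enough to stay in memory permanently, while the frozen blocks are the part worth streaming from storage.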