henrywoo/pyllama
A Python library for running LLaMA foundation models on a single 4GB GPU with optimized inference.

Velocity (7d): +2.3 ★/day · Trend: steady
pyllama is a modified implementation of Meta’s (formerly Facebook’s) LLaMA models, designed to run efficiently on consumer-grade GPUs with as little as 4GB of VRAM. It provides a Hugging Face compatibility layer via pyllama.hf, along with utilities for downloading model weights and tokenizers for all LLaMA sizes (7B to 65B). The library aims to make large language model inference accessible without enterprise hardware.
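
To put the 4GB figure in perspective, the rough calculation below estimates how much memory the weights of a 7B-parameter model occupy at different precisions. This is only a back-of-the-envelope sketch: the description above does not say how pyllama reaches 4GB, so the use of reduced-precision (quantized) weights is an assumption here, as are the helper name weight_memory_gib and the nominal count of 7e9 parameters. It counts weights only and ignores activations, the attention cache, and framework overhead.

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed for model weights alone, in GiB.

    Hypothetical helper for illustration; not part of pyllama.
    """
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    # Assumption: "7B" treated as exactly 7e9 parameters.
    for bits in (16, 8, 4):
        gib = weight_memory_gib(7e9, bits)
        print(f"7B weights at {bits}-bit: {gib:.1f} GiB")
```

Under these assumptions, the 7B weights only drop below 4 GiB somewhere between 8-bit and 4-bit precision, which is why a 4GB card is plausible for the smallest model but not for the larger ones without further techniques.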