henrywoo/pyllama

A Python library for running LLaMA foundation models on a single 4GB GPU with optimized inference.

Velocity (7d): +2.3 ★/day · Trend: steady

pyllama is a modified implementation of Facebook’s LLaMA models designed to run efficiently on consumer-grade GPUs with as little as 4GB VRAM. It provides a Hugging Face compatibility layer via pyllama.hf and utilities for downloading model weights and tokenizers for all LLaMA sizes (7B to 65B). The library aims to make large language model inference accessible without requiring enterprise hardware.
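
The description does not say how the 4GB figure is reached. As a rough sanity check, the sketch below estimates the memory needed for the weights alone at a few precisions. It assumes weights dominate memory use (activations, cache, and framework overhead are ignored) and uses nominal parameter counts taken from the size labels. The helper name `weight_gib` and the chosen bit widths are illustrative assumptions, not part of pyllama's API.

```python
# Back-of-the-envelope estimate of weight memory per model size.
# Assumption: nominal parameter counts from the size labels, not exact figures.
PARAM_COUNTS = {"7B": 7e9, "13B": 13e9, "30B": 30e9, "65B": 65e9}


def weight_gib(n_params: float, bits_per_param: int) -> float:
    """Return the memory in GiB needed to store n_params weights."""
    return n_params * bits_per_param / 8 / 2**30


# Print one row per model size, with one column per assumed precision.
for name, n_params in PARAM_COUNTS.items():
    row = ", ".join(
        f"{bits}-bit: {weight_gib(n_params, bits):6.1f} GiB" for bits in (16, 8, 4)
    )
    print(f"{name:>3}  {row}")
```

Under these assumptions, only the 7B model at 4 bits per weight comes in under 4 GiB, so the 4GB claim most plausibly refers to the smallest model with reduced-precision weights.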
