princeton-nlp/LLM-Shearing
A research project that creates efficient smaller language models by structured pruning of larger LLaMA models.

Velocity (7d): +0.7 ★/day · Trend: steady
Sheared-LLaMA uses structured pruning to accelerate language model pre-training. It prunes a large model (e.g., Llama-2-7B) down to smaller models (1.3B and 2.7B parameters) and then continues pre-training them. The resulting models are competitive with similarly sized open models at a fraction of the cost of training them from scratch. The codebase provides the pruning and continued pre-training algorithms, and both the pruned base models and their instruction-tuned variants are released on HuggingFace.
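The core idea of structured pruning is to remove whole units so that the pruned model is still an ordinary, smaller dense model. It can be sketched on a single feed-forward block. This is a minimal illustration under stated assumptions, not the repository's method: Sheared-LLaMA learns which structures (layers, heads, hidden and intermediate dimensions) to remove during training, while this sketch substitutes a simple weight-norm score. The toy ReLU FFN (used in place of LLaMA's gated SwiGLU block) and the helper names `prune_ffn_neurons` and `ffn_forward` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def prune_ffn_neurons(
    w_up: Matrix, w_down: Matrix, keep: int
) -> Tuple[Matrix, Matrix, List[int]]:
    """Keep the `keep` intermediate neurons with the largest combined L2 norm.

    Hypothetical scoring rule for illustration only (not Sheared-LLaMA's
    learned masks).
    w_up:   intermediate_dim x hidden_dim  (row i produces neuron i)
    w_down: hidden_dim x intermediate_dim  (column i consumes neuron i)
    """
    inter = len(w_up)
    if not 0 < keep <= inter:
        raise ValueError("keep must be in (0, intermediate_dim]")

    def score(i: int) -> float:
        # Norm of everything attached to neuron i: its row in w_up
        # plus its column in w_down.
        up_sq = sum(x * x for x in w_up[i])
        down_sq = sum(row[i] * row[i] for row in w_down)
        return math.sqrt(up_sq + down_sq)

    # Take the top-`keep` neurons by score, then restore their original order.
    kept = sorted(sorted(range(inter), key=score, reverse=True)[:keep])

    # Drop whole rows of w_up and the matching columns of w_down,
    # so both matrices shrink consistently and stay dense.
    new_up = [list(w_up[i]) for i in kept]
    new_down = [[row[i] for i in kept] for row in w_down]
    return new_up, new_down, kept


def ffn_forward(w_up: Matrix, w_down: Matrix, x: List[float]) -> List[float]:
    """Toy FFN: y = W_down · relu(W_up · x), with no gate and no bias."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_up]
    return [sum(w * hi for w, hi in zip(row, h)) for row in w_down]


if __name__ == "__main__":
    w_up = [[1.0, 0.0], [0.01, 0.01], [0.0, 2.0]]  # 3 neurons, hidden_dim 2
    w_down = [[1.0, 0.1, 0.0], [0.0, 0.1, 1.0]]    # hidden_dim 2 x 3 neurons
    new_up, new_down, kept = prune_ffn_neurons(w_up, w_down, keep=2)
    print(kept)  # [0, 2]: the low-norm middle neuron is removed
    print(ffn_forward(w_up, w_down, [1.0, 1.0]))
    print(ffn_forward(new_up, new_down, [1.0, 1.0]))
```

Each intermediate neuron owns one row of `w_up` and one column of `w_down`, so dropping both together keeps the shapes consistent. The pruned layer then runs as a plain dense layer with no sparse kernels, which is why structured pruning turns into real speed and memory savings.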