TencentARC/LLaMA-Pro
A research project introducing block expansion to progressively extend LLaMA models, published at ACL 2024.

Velocity (7d): +0.6 ★/day · Trend: → steady
The method, block expansion, grows a LLaMA model progressively by inserting new transformer blocks and training them, which adds capacity without retraining the full model. Model weights are released on HuggingFace, and the project reports improvements on code and math benchmarks. Extensions to Mistral models are also provided.
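The idea of inserting blocks without disturbing the existing model can be illustrated with a minimal pure-Python sketch. It is not the repository's code: `ToyBlock`, `expand_blocks`, `forward`, and the `group_size` parameter are hypothetical names, and the toy residual block stands in for a real transformer block. The sketch assumes each new block is a copy of its predecessor with the output projection zeroed, so the block starts as an identity mapping, and that the original blocks are frozen while only the new ones are trained.

```python
import copy
import random


class ToyBlock:
    """Toy residual block (hypothetical stand-in): y = x + W_out @ relu(W_in @ x)."""

    def __init__(self, dim, rng):
        self.w_in = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
        self.w_out = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
        self.trainable = True

    def __call__(self, x):
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.w_in]
        update = [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]
        # Residual connection: the block only adds an update to its input.
        return [xi + ui for xi, ui in zip(x, update)]


def expand_blocks(blocks, group_size):
    """Insert one identity-initialized block after every `group_size` blocks.

    Assumption: the new block copies the preceding block and zeroes its
    output projection. Original blocks are marked frozen in place.
    """
    expanded = []
    for index, block in enumerate(blocks, start=1):
        block.trainable = False  # freeze the pretrained block
        expanded.append(block)
        if index % group_size == 0:
            new_block = copy.deepcopy(block)
            # Zero output projection => update is 0 => block returns its input.
            new_block.w_out = [[0.0] * len(row) for row in new_block.w_out]
            new_block.trainable = True  # only new blocks would be trained
            expanded.append(new_block)
    return expanded


def forward(blocks, x):
    """Run the input through the blocks in order."""
    for block in blocks:
        x = block(x)
    return x


rng = random.Random(0)
original = [ToyBlock(4, rng) for _ in range(4)]
x = [0.5, -1.0, 2.0, 0.1]

before = forward(original, x)
expanded = expand_blocks(original, group_size=2)
after = forward(expanded, x)

print(len(original), "->", len(expanded), "blocks")
print("output unchanged at initialization:", before == after)
```

Because each inserted block starts as an identity mapping, the expanded model initially computes the same function as the original one; training then only needs to update the new blocks.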