locuslab/wanda
A PyTorch implementation of Wanda, a pruning method for large language models that removes weights per-output using the product of weight magnitudes and input activation norms.

Wanda is a simple yet effective pruning approach for LLMs. It scores each weight by the product of its magnitude and the norm of the corresponding input activation, then removes the lowest-scoring weights on a per-output basis, comparing weights only against others that feed the same output rather than across the whole layer. The repository supports LLaMA, LLaMA-2, and OPT models, and includes zero-shot evaluation and LoRA fine-tuning integration.
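
The scoring rule described above can be sketched in a few lines. This is a minimal, dependency-free illustration in plain Python rather than the repository's PyTorch code. The function name `wanda_prune`, its parameters, and the toy numbers are all hypothetical, and the sketch assumes a single linear layer whose weight matrix has one row per output and one column per input.

```python
import math

def wanda_prune(weights, activations, sparsity=0.5):
    """Return a pruned copy of `weights` (rows = outputs, columns = inputs).

    `activations` is a list of calibration input vectors, each with one
    entry per input feature. Hypothetical helper, for illustration only.
    """
    n_inputs = len(weights[0])
    # L2 norm of each input feature, taken across the calibration samples.
    norms = [math.sqrt(sum(x[j] ** 2 for x in activations))
             for j in range(n_inputs)]
    n_prune = int(n_inputs * sparsity)  # weights to drop in every row
    pruned = []
    for row in weights:
        # Score = |weight| * norm of the matching input activation.
        scores = [abs(w) * norms[j] for j, w in enumerate(row)]
        # Rank within this output row only, then zero the lowest scores.
        order = sorted(range(n_inputs), key=lambda j: scores[j])
        drop = set(order[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned

# Toy data: input 0 has tiny activations, input 1 has large ones.
weights = [[4.0, 1.0, -3.0, 2.0],
           [0.5, -2.0, 1.0, 0.1]]
activations = [[0.1, 8.0, 1.0, 1.0],
               [0.1, 6.0, 1.0, 1.0]]

pruned = wanda_prune(weights, activations, sparsity=0.5)
print(pruned)  # [[0.0, 1.0, -3.0, 0.0], [0.0, -2.0, 1.0, 0.0]]
```

In the first row, the largest weight (4.0) is removed because its input activation is tiny, while the small weight 1.0 survives because its input is large. Plain magnitude pruning would have made the opposite choice for those two weights.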