princeton-nlp/LESS

A data selection framework that identifies influential training examples to improve specific LLM capabilities through targeted instruction tuning.

528 stars · Jupyter Notebook · Data Tooling · LLMOps · Eval
Star velocity (7d): +0.6 ★ / day · Trend: steady

LESS selects the most impactful training data for LLM instruction tuning: it builds a gradient datastore and scores each candidate example by its influence on a target capability. The pipeline has three stages (warmup training, gradient collection, and influence-based selection) and is applied across datasets such as Flan v2, CoT, Dolly, and Open Assistant. The selected data is then used to fine-tune models such as Llama and Mistral toward the targeted capability.
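
The selection stage can be illustrated with a minimal sketch. It assumes each training example already has a small gradient feature vector in the datastore, and it approximates influence as cosine similarity to gradients from the target task. The function names, example ids, and toy values are hypothetical and are not the repository's API; the scoring LESS actually uses may differ in detail.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_top_fraction(
    datastore: Dict[str, Sequence[float]],
    target_grads: List[Sequence[float]],
    fraction: float,
) -> List[str]:
    """Rank training examples by their best similarity to any target
    gradient and keep the top `fraction` of them (at least one)."""
    scores = {
        example_id: max(cosine(feat, g) for g in target_grads)
        for example_id, feat in datastore.items()
    }
    keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=lambda k: scores[k], reverse=True)
    return ranked[:keep]


# Toy, made-up gradient features (hypothetical ids and values).
datastore = {
    "flan_0": [1.0, 0.0],
    "dolly_0": [0.0, 1.0],
    "cot_0": [0.9, 0.1],
    "oasst_0": [-1.0, 0.0],
}
target_grads = [[1.0, 0.0]]  # assumed gradient for the target capability
selected = select_top_fraction(datastore, target_grads, fraction=0.5)
print(selected)
```

In this toy setup the two examples whose features point in the same direction as the target gradient are kept, and the orthogonal and opposing ones are dropped. A real datastore would hold far higher-dimensional features collected during the gradient collection stage.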
