yizhongw/self-instruct
A framework that helps language models improve instruction-following by generating their own instructional training data through bootstrapping.

Self-Instruct is an iterative bootstrapping algorithm that starts with seed instructions and prompts a language model to generate new instructions and input-output pairs. Generated data is filtered for quality and diversity, then added back to the task pool to create a large instructional dataset. This dataset is used to fine-tune the language model, improving its ability to follow natural language instructions without requiring extensive manual annotation.
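
The bootstrapping loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual code: the names `bootstrap`, `token_overlap`, and `toy_propose` are hypothetical, the language-model call is replaced by a deterministic stand-in function, and the diversity filter uses a simple word-overlap score with an assumed threshold of 0.7. Input-output pair generation and the fine-tuning step are omitted.

```python
import random
from typing import Callable, List, Optional


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets (an assumed stand-in diversity measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def bootstrap(
    seeds: List[str],
    propose: Callable[[List[str]], List[str]],
    target_size: int,
    max_overlap: float = 0.7,  # assumed threshold, chosen for illustration
    num_examples: int = 3,
    max_rounds: int = 100,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Grow a pool of instructions from seeds by repeated propose-and-filter rounds."""
    rng = rng or random.Random(0)
    pool = list(seeds)
    for _ in range(max_rounds):
        if len(pool) >= target_size:
            break
        # Sample a few in-pool instructions to serve as prompt examples.
        examples = rng.sample(pool, min(num_examples, len(pool)))
        for candidate in propose(examples):
            candidate = candidate.strip()
            if not candidate:
                continue  # crude quality filter: drop empty generations
            # Diversity filter: keep only candidates unlike everything in the pool.
            if all(token_overlap(candidate, kept) <= max_overlap for kept in pool):
                pool.append(candidate)
                if len(pool) >= target_size:
                    return pool
    return pool


def toy_propose(examples: List[str]) -> List[str]:
    """Hypothetical stand-in for a language-model call; returns variations of the examples."""
    return [f"{ex} in French" for ex in examples] + [""]


if __name__ == "__main__":
    seeds = ["Summarize the article", "Translate the sentence"]
    for instruction in bootstrap(seeds, toy_propose, target_size=4):
        print(instruction)
```

In a real setup, `propose` would wrap a prompt to the language model built from the sampled examples, and the accepted instructions would feed the next round's prompts, which is what makes the process self-bootstrapping.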