OpenDataBox/awesome-data-llm
Academic survey paper and curated collection of research on data-centric techniques for training and preparing LLMs.
This repository hosts the official materials for a comprehensive survey on LLMs and data-centric methods. It collects and categorizes papers across topics including data acquisition, deduplication, filtering, synthesis, and selection for LLM training. The collection also covers related work on vision-language models and data analytics with LLMs.