databricks/lilac
A dataset exploration and curation tool for improving LLM training and fine-tuning data quality.

Velocity (7d): +0.9 ★/day · Trend: steady
Lilac provides visualization and quality control capabilities for LLM training datasets. It helps teams explore, quantify, and improve pre-training and fine-tuning data through a Python API and on-device UI. The tool integrates open-source LLMs to run computations locally, supporting data filtering, clustering, and semantic search across large datasets.