
databricks/lilac

A dataset exploration and curation tool for improving LLM training and fine-tuning data quality.

1.1k stars · Python · Data Tooling · LLMOps · Eval
Velocity (7d): +0.9 ★/day · Trend: steady

Lilac provides visualization and quality control capabilities for LLM training datasets. It helps teams explore, quantify, and improve pre-training and fine-tuning data through a Python API and on-device UI. The tool integrates open-source LLMs to run computations locally, supporting data filtering, clustering, and semantic search across large datasets.
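To make the semantic search and filtering idea concrete, here is a minimal sketch in plain Python. It does not use Lilac's API: the `embed`, `cosine`, and `search` functions are hypothetical names for illustration, and a bag-of-words count stands in for the embeddings that a locally run model would produce.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, rows: list[str], min_score: float = 0.0, top_k: int = 3):
    """Score every row against the query, drop rows at or below min_score,
    and return the top_k (score, row) pairs, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(row)), row) for row in rows]
    kept = [pair for pair in scored if pair[0] > min_score]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)[:top_k]


rows = [
    "how to sort a list in python",
    "best pizza in town",
    "python list comprehension tutorial",
]
results = search("python list", rows)
for score, row in results:
    print(f"{score:.2f}  {row}")
```

In a real pipeline the embedding step would be a learned model and the rows would come from a dataset; the ranking and threshold filtering would follow the same shape.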
