
Docta-ai/docta

A Python tool that diagnoses and cures data quality issues for ML models, including fixing label errors in LLM alignment datasets.

3.5k stars · Python · Data Tooling · LLMOps · Eval
Velocity (7d): +3.1 ★ / day · Trend: steady

Docta is a data-centric AI platform that detects and rectifies issues in training data. It supports tabular, text, and image data as well as pre-trained model embeddings. The open-source version offers training-free data diagnosis, curation, and nutrition services. One key demo shows how to fix human annotation errors in LLM responses from Anthropic’s red teaming dataset (hh-rlhf), which makes it relevant to RLHF pipelines that depend on human-labeled responses.
