marcotcr/checklist

A behavioral testing framework for evaluating NLP models with templates, perturbation functions, and test suites.

2k stars · Jupyter Notebook · LLMOps · Eval
Velocity (7d): +0.9 ★/day · Trend: steady

CheckList provides a methodology and tooling for behavioral testing of NLP models beyond standard accuracy metrics. It includes templates for generating test cases, perturbation functions for creating adversarial examples, and integration with HuggingFace transformer pipelines. The framework helps practitioners systematically test model capabilities and vulnerabilities across different linguistic phenomena.
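The two ideas named above, template-generated test cases and perturbation functions, can be illustrated with a small standard-library sketch. This is not CheckList's own API: `expand_template`, `swap_adjacent_chars`, `invariance_failures`, and the keyword-based `toy_sentiment` "model" are all hypothetical names invented for illustration, and the typo position is chosen arbitrarily.

```python
import re
from itertools import product


def expand_template(template, **fillers):
    """Fill each {placeholder} with every combination of its candidate values."""
    # Collect placeholder names in order of first appearance, without duplicates.
    keys = list(dict.fromkeys(re.findall(r"\{(\w+)\}", template)))
    combos = product(*(fillers[k] for k in keys))
    return [template.format(**dict(zip(keys, combo))) for combo in combos]


def swap_adjacent_chars(text, index):
    """Perturbation: swap the characters at index and index + 1 (a simple typo)."""
    if index < 0 or index + 1 >= len(text):
        return text
    chars = list(text)
    chars[index], chars[index + 1] = chars[index + 1], chars[index]
    return "".join(chars)


def invariance_failures(predict, pairs):
    """Return the (original, perturbed) pairs whose predictions differ."""
    return [(orig, pert) for orig, pert in pairs if predict(orig) != predict(pert)]


def toy_sentiment(text):
    """Hypothetical stand-in for a real model: a brittle keyword lookup."""
    return "positive" if "great" in text else "negative"


# Generate test cases from one template (2 nouns x 2 adjectives = 4 sentences).
cases = expand_template(
    "The {noun} was {adj}.",
    noun=["food", "service"],
    adj=["great", "awful"],
)

# Introduce a typo in the last word (the two characters before the period).
pairs = [(c, swap_adjacent_chars(c, len(c) - 3)) for c in cases]

# An invariance check: a small typo should not change the predicted label.
failures = invariance_failures(toy_sentiment, pairs)
for original, perturbed in failures:
    print(f"label changed: {original!r} -> {perturbed!r}")
```

The toy model fails on the sentences containing "great" because the typo breaks its keyword match, which is the kind of brittleness a behavioral test is meant to surface and an aggregate accuracy number can hide.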
