
hamelsmu/evals-skills

A plugin of AI evaluation skills that guides AI coding agents to audit, diagnose, and improve LLM evaluation pipelines.

Velocity (7d): +14 ★ / day · Trend: steady

This repository provides a collection of skills that guide AI coding agents in building and auditing LLM evaluation pipelines. The skills include eval-audit, which surfaces common problems in evaluations, and error-analysis, which helps categorize failures found in traces. They are distributed both as a plugin for Claude Code and as a standalone CLI tool, so developers can integrate evaluation guidance directly into their AI assistant workflows.
