hamelsmu/evals-skills
A plugin of AI evaluation skills that guides AI coding agents to audit, diagnose, and improve LLM evaluation pipelines.

This repository provides a collection of skills designed to guide AI coding agents in building and auditing LLM evaluation pipelines. It includes skills like eval-audit to surface common problems in evaluations and error-analysis to help categorize failures from traces. The skills are distributed as a plugin for Claude Code and as a standalone CLI tool, allowing developers to integrate evaluation guidance directly into their AI assistant workflows.