rlancemartin/auto-evaluator

An evaluation tool for measuring the quality of LLM-based question-answering chains built with LangChain.

1.1k stars · Python · LLMOps · Eval
Velocity (7d): +0.9 ★/day · Trend: steady

Auto-evaluator uses LLMs to generate question-answer pairs from documents, runs the questions through configurable QA chains built with LangChain, and uses LLMs again to score the responses against the generated answers. Users can vary the text splitting, embedding method, retrieval strategy, and grading prompt to compare chain performance across configurations. It runs as a Streamlit app and supports models from OpenAI, Anthropic, and Hugging Face.
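
The generate, answer, and grade loop described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's code: it calls no LangChain or model API, and every name in it (`QAPair`, `split_text`, `build_keyword_chain`, `contains_grader`, `evaluate`, `compare_configs`) is hypothetical. A keyword-overlap lookup stands in for the QA chain, a substring check stands in for the LLM grader, and the question-answer pairs are written by hand rather than generated.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class QAPair:
    """One evaluation item: a question and its reference answer."""
    question: str
    answer: str


# Hypothetical type aliases for the two pluggable pieces.
Chain = Callable[[str], str]             # question -> predicted answer
Grader = Callable[[QAPair, str], float]  # (pair, prediction) -> score in [0, 1]


def split_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into fixed-size character chunks that overlap by `overlap`."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def build_keyword_chain(chunks: Sequence[str]) -> Chain:
    """Stand-in for a QA chain: return the chunk sharing the most words with the question."""
    def chain(question: str) -> str:
        q_words = set(question.lower().split())
        return max(chunks, key=lambda c: len(q_words & set(c.lower().split())))
    return chain


def contains_grader(pair: QAPair, prediction: str) -> float:
    """Stand-in for an LLM grader: 1.0 if the reference answer appears in the prediction."""
    return 1.0 if pair.answer.lower() in prediction.lower() else 0.0


def evaluate(chain: Chain, grader: Grader, eval_set: Sequence[QAPair]) -> float:
    """Run every question through the chain and return the mean grade."""
    grades = [grader(pair, chain(pair.question)) for pair in eval_set]
    return sum(grades) / len(grades)


def compare_configs(
    text: str,
    eval_set: Sequence[QAPair],
    chunk_sizes: Sequence[int],
    overlaps: Sequence[int],
) -> Dict[Tuple[int, int], float]:
    """Score one chain per (chunk_size, overlap) configuration."""
    scores = {}
    for size, overlap in product(chunk_sizes, overlaps):
        chain = build_keyword_chain(split_text(text, size, overlap))
        scores[(size, overlap)] = evaluate(chain, contains_grader, eval_set)
    return scores


TEXT = "Paris is the capital of France. Berlin is the capital of Germany."
EVAL_SET = [
    QAPair("What is the capital of France?", "Paris"),
    QAPair("What is the capital of Germany?", "Berlin"),
]

scores = compare_configs(TEXT, EVAL_SET, chunk_sizes=[32, 1000], overlaps=[0])
for config, score in sorted(scores.items()):
    print(config, score)
```

Keeping the chain and the grader as plain callables mirrors the idea that each stage can be swapped independently, so a different splitter, retriever, or grading prompt changes only one argument while the scoring loop stays the same.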