rlancemartin/auto-evaluator
An evaluation tool for measuring the quality of LLM-based question-answering chains built with LangChain.

Auto-evaluator uses an LLM to generate question-answer pairs from documents, runs the questions through configurable QA chains built with LangChain, and uses an LLM to grade the responses. Users can experiment with text splitting, embedding methods, retrieval strategies, and grading prompts to compare chain performance across configurations. It runs as a Streamlit app and supports models from OpenAI, Anthropic, and Hugging Face.
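The generate, answer, and grade loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not auto-evaluator's code: every name here (`split_text`, `make_keyword_chain`, `contains_grader`, `evaluate_chain`) is hypothetical, a keyword-overlap lookup stands in for the LangChain retriever plus answering LLM, a substring check stands in for the LLM grader, and the QA pairs are written by hand instead of being generated by an LLM.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class QAPair:
    question: str
    answer: str  # reference answer (auto-evaluator generates these with an LLM)


def tokens(text: str) -> set[str]:
    # Lowercase word tokens, punctuation dropped.
    return set(re.findall(r"\w+", text.lower()))


def split_text(text: str, chunk_size: int) -> list[str]:
    # Hypothetical splitter: fixed-size chunks of `chunk_size` words, no overlap.
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def make_keyword_chain(chunks: list[str]) -> Callable[[str], str]:
    # Stand-in for retriever + LLM: return the chunk sharing the most words
    # with the question (max() keeps the first chunk on ties).
    def chain(question: str) -> str:
        q = tokens(question)
        return max(chunks, key=lambda c: len(q & tokens(c)))
    return chain


def contains_grader(question: str, expected: str, predicted: str) -> bool:
    # Stand-in for an LLM grader: correct if the reference answer appears.
    return expected.lower() in predicted.lower()


def evaluate_chain(
    qa_pairs: list[QAPair],
    chain: Callable[[str], str],
    grader: Callable[[str, str, str], bool],
) -> float:
    # Run every question through the chain, grade it, and return accuracy.
    if not qa_pairs:
        return 0.0
    graded = [grader(p.question, p.answer, chain(p.question)) for p in qa_pairs]
    return sum(graded) / len(graded)


doc = "The capital of France is Paris. The capital of Japan is Tokyo."
qa_pairs = [
    QAPair("What is the capital of France?", "Paris"),
    QAPair("What is the capital of Japan?", "Tokyo"),
]

# Compare two text-splitting configurations on the same QA set.
scores = {}
for chunk_size in (3, 6):
    chain = make_keyword_chain(split_text(doc, chunk_size))
    scores[chunk_size] = evaluate_chain(qa_pairs, chain, contains_grader)
print(scores)  # {3: 0.0, 6: 1.0}
```

With 3-word chunks the lookup returns the fragment "The capital of", which lacks the answer, while 6-word chunks keep each fact intact. The same QA set thus scores differently under two splitting settings, which is the kind of comparison across configurations the tool is meant to support.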