modelscope/evalscope
A one-stop evaluation framework for benchmarking LLMs, VLMs, embedding models, and AIGC systems with built-in benchmarks.

Velocity (7d): +3.2 ★/day · Trend: steady
EvalScope is a framework for evaluating large language models, vision-language models, embedding models, and generative AI systems. It ships with built-in benchmarks, including MMLU, C-Eval, and GSM8K, and also covers inference performance stress testing and result visualization. An evaluation can be launched with a single command, and the framework integrates with the ModelScope ecosystem.
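To make the single-command workflow concrete, here is a minimal sketch that only assembles such a command and prints it, without running anything. The `evalscope eval` subcommand and the `--model`, `--datasets`, and `--limit` flags are assumptions based on the project's quick-start usage and should be checked against the installed version. The model ID and the lowercase dataset names are illustrative placeholders, and `build_eval_command` is a hypothetical helper, not part of EvalScope.

```python
import shlex


def build_eval_command(model, datasets, limit=None):
    """Assemble argv for one evaluation run; the command is not executed.

    Flag names are assumed from the project's quick start, not verified here.
    """
    if not datasets:
        raise ValueError("at least one dataset name is required")
    argv = ["evalscope", "eval", "--model", model, "--datasets", *datasets]
    if limit is not None:
        # Cap the number of samples per dataset for a quick smoke test.
        argv += ["--limit", str(limit)]
    return argv


# Placeholder model ID and dataset names, chosen only for illustration.
cmd = build_eval_command("Qwen/Qwen2.5-0.5B-Instruct", ["gsm8k", "mmlu"], limit=5)
print(shlex.join(cmd))
```

Building the argument list separately from execution keeps the sketch safe to run anywhere. Passing `cmd` to `subprocess.run` would launch the evaluation, provided the package is installed and the flags match.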