modelscope/evalscope

A one-stop evaluation framework for benchmarking LLMs, VLMs, embedding models, and AIGC systems with built-in benchmarks.

2.9k stars · Python · LLMOps · Eval
Velocity (7d): +3.2 ★/day · Trend: steady

EvalScope is a framework for evaluating large language models, vision-language models, and generative AI systems. It supports multiple evaluation benchmarks, including MMLU, C-Eval, and GSM8K, along with inference performance stress testing and result visualization. The framework is designed for single-command evaluation workflows and integrates with the ModelScope ecosystem.
