
open-compass/VLMEvalKit

An open-source evaluation toolkit for large vision-language models supporting one-command benchmarking across 220+ models and 80+ benchmarks.

4.2k stars · Python · LLMOps · Eval · Language Models
Star velocity (7d): +4.6 ★/day · Trend: steady

VLMEvalKit is a Python evaluation framework for large vision-language models (LVLMs) that lets users benchmark a model without preparing data manually. It supports 220+ models and 80+ benchmarks. Evaluation is generation-based for every supported model: the model produces a free-form response, and the answer is then extracted from that response either by exact matching or by an LLM-based extractor.
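The two extraction strategies mentioned above can be illustrated with a minimal sketch. This is not VLMEvalKit's own code: the function names `exact_match` and `extract_answer`, the matching rules, and the `judge` callable (a stand-in for a call to an LLM) are all hypothetical and chosen only to show the idea of trying rule-based matching first and falling back to an LLM-based extractor.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical type: a judge takes the raw response and the options and
# returns an option letter (or None). In practice this would wrap an LLM call.
Judge = Callable[[str, Dict[str, str]], Optional[str]]


def exact_match(prediction: str, choices: Dict[str, str]) -> Optional[str]:
    """Rule-based extraction: return an option letter, or None if unsure."""
    text = prediction.strip()

    # Case 1: the whole response is a bare option letter, e.g. "B", "(b)", "B."
    m = re.fullmatch(r"\(?([A-Za-z])\)?[.:]?", text)
    if m and m.group(1).upper() in choices:
        return m.group(1).upper()

    # Case 2: exactly one option's text appears in the response.
    hits = [k for k, v in choices.items() if v.lower() in text.lower()]
    if len(hits) == 1:
        return hits[0]

    # Ambiguous or no match: leave the decision to a fallback.
    return None


def extract_answer(
    prediction: str,
    choices: Dict[str, str],
    judge: Optional[Judge] = None,
) -> Optional[str]:
    """Try exact matching first; fall back to the (assumed) LLM judge."""
    answer = exact_match(prediction, choices)
    if answer is None and judge is not None:
        answer = judge(prediction, choices)
    return answer


if __name__ == "__main__":
    options = {"A": "dog", "B": "cat"}
    print(extract_answer("(B)", options))
    print(extract_answer("It looks like a cat.", options))
    # A trivial stand-in judge; a real one would query a language model.
    print(extract_answer("Hard to say", options, judge=lambda p, c: "A"))
```

Running the rule-based step first keeps scoring cheap and deterministic for well-formed responses, and reserves the slower, non-deterministic LLM call for responses the rules cannot resolve.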
