open-compass/VLMEvalKit
An open-source evaluation toolkit for large vision-language models supporting one-command benchmarking across 220+ models and 80+ benchmarks.

Velocity (7d): +4.6 ★ / day
Trend: steady
VLMEvalKit is a Python evaluation framework for large vision-language models (LMMs) that benchmarks a model without manual data preparation. It supports 220+ LMMs and 80+ benchmarks. Evaluation is generation-based for all supported models: the model generates a response, and the answer is extracted from that response either by exact matching or by an LLM-based extractor before scoring.
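To make the two answer-extraction modes concrete, here is a minimal sketch of the general idea for a multiple-choice question: try exact matching first, then fall back to an LLM-based extractor. This is not VLMEvalKit's actual code or API; the function names `extract_choice_exact` and `extract_choice`, the `llm_judge` callable, and the matching rules are all hypothetical and chosen only for illustration.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical type: an LLM-based extractor that maps (response, choices) to an option letter.
LLMJudge = Callable[[str, Dict[str, str]], Optional[str]]


def extract_choice_exact(prediction: str, choices: Dict[str, str]) -> Optional[str]:
    """Return an option letter if the response names exactly one option, else None."""
    text = prediction.strip()

    # Case 1: the whole response is a bare option letter, e.g. "B", "(B)" or "B."
    match = re.fullmatch(r"\(?([A-Z])\)?[.:]?", text)
    if match and match.group(1) in choices:
        return match.group(1)

    # Case 2: the text of exactly one option appears verbatim in the response.
    lowered = text.lower()
    hits = [letter for letter, option in choices.items() if option.lower() in lowered]
    if len(hits) == 1:
        return hits[0]

    # Ambiguous or unrecognised response: exact matching gives up.
    return None


def extract_choice(
    prediction: str,
    choices: Dict[str, str],
    llm_judge: Optional[LLMJudge] = None,
) -> Optional[str]:
    """Try exact matching first; fall back to the (assumed) LLM extractor if provided."""
    answer = extract_choice_exact(prediction, choices)
    if answer is None and llm_judge is not None:
        answer = llm_judge(prediction, choices)
    return answer
```

In this sketch the LLM extractor is only consulted when exact matching fails, which keeps cost down for well-formed responses; whether the toolkit orders the two modes this way is an assumption here, not something the description above states.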