huggingface/evaluate
A Python library from Hugging Face for standardized evaluation of machine learning models using plug-in metrics and measurement tools.

🤗 Evaluate is a library that makes evaluating and comparing models, and reporting their performance, easier and more standardized. It provides implementations of dozens of popular metrics for tasks ranging from NLP to computer vision. A metric is loaded by name with a single call such as accuracy = load("accuracy"), and it can then score the outputs of models built in any framework, including NumPy, Pandas, PyTorch, TensorFlow, and JAX. The library also offers comparison and measurement tools to evaluate model performance.
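
The load-then-compute pattern described above can be sketched as follows. This is a minimal illustration rather than official usage guidance: the helper name accuracy_with_fallback and the toy labels are made up for this example, and the plain-Python fallback is only there so the snippet still runs where the library is not installed or the metric cannot be fetched.

```python
def accuracy_with_fallback(predictions, references):
    """Hypothetical helper: fraction of predictions equal to their references."""
    try:
        # Assumes `pip install evaluate scikit-learn` and that the metric
        # can be fetched the first time it is loaded.
        import evaluate

        metric = evaluate.load("accuracy")
        result = metric.compute(predictions=predictions, references=references)
        return result["accuracy"]
    except Exception:
        # Fallback (not part of the library): compute the same quantity by hand.
        matches = sum(p == r for p, r in zip(predictions, references))
        return matches / len(references)


# Toy labels: three of the four predictions match the references.
score = accuracy_with_fallback(predictions=[0, 1, 1, 1], references=[0, 1, 0, 1])
print(score)
```

Because the metric only needs sequences of predictions and references, the same call should work on outputs converted from whichever framework produced them.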