
camel-ai/crab

A Python framework for building and running benchmark environments to evaluate multimodal LLM agents.

422 stars · Python · LLMOps · EvalAgents
Velocity (7d): +0.6 ★ / day · Trend: steady

CRAB provides a framework for creating standardized benchmarks to assess language model agents in cross-platform environments. It allows defining agent tasks through Python decorators and includes a novel graph-based evaluation methodology. The framework supports deploying agent environments via Docker, virtual machines, or in-memory processes while maintaining a unified interface for evaluation.
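
To make the two ideas in the description concrete, here is a minimal, self-contained sketch of decorator-registered actions combined with a graph of evaluation checkpoints. It is only an illustration under stated assumptions: the names `action`, `ACTIONS`, `CHECKPOINTS`, and `completion_ratio` are hypothetical and are not taken from CRAB's actual API, and the scoring rule (fraction of checkpoints reached, where a checkpoint only counts once its predecessors are reached) is one plausible reading of "graph-based evaluation".

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry of agent-callable actions (not CRAB's real API).
ACTIONS: Dict[str, Callable] = {}


def action(func: Callable) -> Callable:
    """Register a function as an action under its own name."""
    ACTIONS[func.__name__] = func
    return func


@action
def write_file(state: dict, name: str, text: str) -> None:
    """Toy action: store text under a file name in the environment state."""
    state.setdefault("files", {})[name] = text


@action
def send_message(state: dict, text: str) -> None:
    """Toy action: append a message to the environment state."""
    state.setdefault("messages", []).append(text)


# Checkpoint graph: node name -> (predecessor nodes, check on the state).
Checkpoint = Tuple[List[str], Callable[[dict], bool]]
CHECKPOINTS: Dict[str, Checkpoint] = {
    "file_created": ([], lambda s: "notes.txt" in s.get("files", {})),
    "file_has_text": (
        ["file_created"],
        lambda s: bool(s.get("files", {}).get("notes.txt")),
    ),
    "message_sent": (
        ["file_has_text"],
        lambda s: len(s.get("messages", [])) > 0,
    ),
}


def completion_ratio(state: dict, checkpoints: Dict[str, Checkpoint]) -> float:
    """Return the fraction of checkpoints reached.

    A checkpoint is reached only if all its predecessors are reached
    and its own check passes on the current state.
    """
    done = set()
    progressed = True
    while progressed:  # repeat until no new checkpoint can be marked
        progressed = False
        for name, (preds, check) in checkpoints.items():
            if name in done:
                continue
            if all(p in done for p in preds) and check(state):
                done.add(name)
                progressed = True
    return len(done) / len(checkpoints)


if __name__ == "__main__":
    state: dict = {}
    ACTIONS["write_file"](state, "notes.txt", "hello")
    print(completion_ratio(state, CHECKPOINTS))
    ACTIONS["send_message"](state, "done")
    print(completion_ratio(state, CHECKPOINTS))
```

In this sketch, partial progress earns partial credit: an agent that writes the file but never sends the message reaches two of three checkpoints, while an agent that sends a message without first writing the file reaches none, because the dependency edges gate the later node.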
