OpenGenerativeAI/llm-colosseum
A gaming-based benchmark that pits LLMs against each other in Street Fighter III to evaluate decision-making speed, strategy, and adaptability in real time.

Velocity (7d): +1.8 ★ / day · Trend: → steady
The platform runs LLM-vs-LLM matches in Street Fighter III, measuring how well each model responds to game state, makes rapid decisions, and adapts its strategy over a full match. Each model earns an Elo rating based on fight outcomes. The benchmark tracks criteria such as decision speed, strategic depth, and resilience across hundreds of fights.
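To make the rating step concrete, here is a minimal sketch of a standard Elo update applied to a sequence of fight results. It is not taken from the project's code: the K-factor of 32, the starting rating of 1500, and the names `expected_score`, `update_elo`, and `rate_fights` are all assumptions for illustration, and the benchmark may use different constants or a different variant.

```python
from collections import defaultdict

START_RATING = 1500.0  # assumed starting rating, not from the project
K_FACTOR = 32.0        # assumed K-factor, not from the project


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner_rating, loser_rating, k=K_FACTOR):
    """Return (new_winner_rating, new_loser_rating) after one fight."""
    # The winner gains more when the win was less expected.
    delta = k * (1.0 - expected_score(winner_rating, loser_rating))
    return winner_rating + delta, loser_rating - delta


def rate_fights(fights):
    """Apply Elo updates in order over (winner, loser) name pairs."""
    ratings = defaultdict(lambda: START_RATING)
    for winner, loser in fights:
        ratings[winner], ratings[loser] = update_elo(
            ratings[winner], ratings[loser]
        )
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical model names and results, for illustration only.
    results = [("model_a", "model_b"), ("model_a", "model_c"), ("model_c", "model_b")]
    for name, rating in sorted(rate_fights(results).items()):
        print(f"{name}: {rating:.1f}")
```

Because each update adds to the winner exactly what it subtracts from the loser, the total rating across all models stays constant, which makes ratings comparable across hundreds of fights.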