fchollet/ARC-AGI
A dataset of 800 grid-based puzzles, with a browser interface, for testing AI systems on abstract reasoning tasks.

The Abstraction and Reasoning Corpus (ARC-AGI) is a benchmark designed to measure general fluid intelligence in AI systems, in the spirit of human IQ tests. Each task presents grid-based input/output pairs that demonstrate a transformation; solving the task requires inferring that transformation by abstract reasoning. The repository contains 400 training tasks and 400 evaluation tasks in JSON format, along with a browser-based interface that lets humans attempt tasks manually. It is widely used as an evaluation benchmark for modern language models and agent systems.
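
As a rough illustration of working with a task in JSON form, the sketch below parses one task and checks a candidate transformation against its demonstration pairs. It assumes a layout that the description above does not spell out: a top-level object with "train" and "test" lists, where each entry holds an "input" and an "output" grid stored as a list of rows of integers. The sample task, `load_task`, `flip_horizontal`, and `solves` are hypothetical names made up for this example and are not part of the repository.

```python
import json

# Hypothetical task written for this example; it is NOT a real ARC-AGI task.
# Assumed layout: "train" and "test" lists of {"input": grid, "output": grid},
# where a grid is a list of rows and each row is a list of integers.
SAMPLE_TASK_JSON = """
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]}
  ],
  "test": [
    {"input": [[3, 0], [0, 3]], "output": [[0, 3], [3, 0]]}
  ]
}
"""


def load_task(text):
    """Parse a task from a JSON string and check that every grid is rectangular."""
    task = json.loads(text)
    for split in ("train", "test"):
        for pair in task[split]:
            for key in ("input", "output"):
                grid = pair[key]
                width = len(grid[0])
                if any(len(row) != width for row in grid):
                    raise ValueError(f"non-rectangular {key} grid in {split}")
    return task


def flip_horizontal(grid):
    """Candidate transformation: mirror each row left to right."""
    return [list(reversed(row)) for row in grid]


def solves(task, transform):
    """Return True if transform maps every training input to its output."""
    return all(transform(pair["input"]) == pair["output"] for pair in task["train"])


task = load_task(SAMPLE_TASK_JSON)
if solves(task, flip_horizontal):
    # The rule fits all demonstrations, so apply it to the test input.
    prediction = flip_horizontal(task["test"][0]["input"])
    print(prediction == task["test"][0]["output"])
```

To try this on a file from the repository, one would read the file's contents and pass the string to `load_task`; the actual directory layout and file names should be checked against the repository itself.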