the-crypt-keeper/can-ai-code

The benchmark that killed itself by winning

A coding benchmark evolved into a reasoning stress-test that auto-scales difficulty so models can never truly "pass."

★ 598 stars · Python · LLMOps · Eval · Coding Assistants

What it does

Can-AI-Code started with a simple question: can LLMs write valid code? Two years later, the answer is “obviously yes,” so the project pivoted hard. It is now a self-scaling reasoning benchmark that generates an unlimited supply of unique problems along two difficulty axes, length (which stresses working memory) and depth (which adds structural complexity), and then measures how far each model climbs before it fails.
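The README does not publish the generators, so the following is only a hypothetical sketch of how a parametric generator with independent length and depth knobs could work. It assumes boolean-expression evaluation as the task (boolean logic is one of the categories the author reports on). `make_problem`, `nested_clause`, and the mapping of length to "number of chained clauses" and depth to "nesting levels per clause" are illustrative assumptions, not the project's code.

```python
import random

# Hypothetical sketch of a parametric problem generator; not the project's code.


def nested_clause(rng: random.Random, depth: int) -> str:
    """Return a boolean clause with `depth` levels of parenthesised nesting."""
    if depth == 0:
        return rng.choice(["True", "False"])
    leaf = rng.choice(["True", "False"])
    op = rng.choice(["and", "or"])
    clause = f"({leaf} {op} {nested_clause(rng, depth - 1)})"
    # Randomly negate this level to add structural variety.
    return f"(not {clause})" if rng.random() < 0.5 else clause


def make_problem(length: int, depth: int, seed: int) -> tuple[str, bool]:
    """Generate one boolean-logic problem and its ground-truth answer.

    length: number of clauses chained together (working-memory load), >= 1
    depth:  nesting levels inside each clause (structural complexity)
    seed:   each seed yields a different but reproducible problem
    """
    rng = random.Random(seed)
    expr = nested_clause(rng, depth)
    for _ in range(length - 1):
        expr += f" {rng.choice(['and', 'or'])} {nested_clause(rng, depth)}"
    # The string contains only True/False, and/or/not and parentheses,
    # so evaluating it with no builtins yields the reference answer.
    answer = eval(expr, {"__builtins__": {}})
    return expr, answer


if __name__ == "__main__":
    prompt, expected = make_problem(length=4, depth=3, seed=42)
    print(f"Evaluate: {prompt}")
    print(f"Expected: {expected}")
```

Because each seed produces a fresh expression and the answer is computed rather than stored, there is no fixed test set to memorize; raising `length` or `depth` makes problems harder along one axis while leaving the other unchanged.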

The interesting bit

The author ran more than 200 million tokens through consumer RTX 3090s in his basement (blowing breakers in the process) and found that models have distinct “cognitive fingerprints.” OpenAI’s models crush boolean logic but choke on tokenization; Qwen’s smaller models get a 250% boost from extra thinking time; Llama is the balanced generalist. The benchmark automatically raises its difficulty whenever models cluster above 90% accuracy, so in theory it can’t go stale.
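The page gives only the trigger for auto-toughening (models clustering above 90% accuracy), not the rule itself. A minimal sketch, assuming "clustering" means the median model's accuracy on the hardest level exceeds 0.9 and that the response is to extend the difficulty ladder; `next_top_level`, the median test, and `step` are all hypothetical.

```python
from statistics import median

# Hypothetical auto-scaling rule; the real trigger logic is not published.


def next_top_level(top_level: int, accuracies: dict[str, float],
                   threshold: float = 0.9, step: int = 1) -> int:
    """Return the hardest difficulty level to use in the next round.

    accuracies maps model name -> accuracy on the current top level.
    "Clustering above 90%" is approximated as: the median model scores
    above `threshold`. When that happens the top level no longer
    separates models, so the ladder is extended by `step` levels.
    """
    if not accuracies:
        return top_level  # nothing measured yet; keep the ladder as-is
    if median(accuracies.values()) > threshold:
        return top_level + step
    return top_level


if __name__ == "__main__":
    # Illustrative numbers only, not results from the project.
    round_1 = {"model-a": 0.97, "model-b": 0.93, "model-c": 0.91}
    round_2 = {"model-a": 0.88, "model-b": 0.71, "model-c": 0.64}
    level = next_top_level(5, round_1)      # median 0.93 -> raise to 6
    level = next_top_level(level, round_2)  # median 0.71 -> stay at 6
    print(level)  # prints 6
```

Using the median rather than the best score means a single outlier model cannot trigger a harder round on its own; whether the actual benchmark works this way is not stated.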

Key highlights

Parametric generators create infinite unique problems—no memorization, no fixed test sets
Measures three things: height (max difficulty reached), efficiency (tokens burned), and constrained performance (limited resources)
Identified working memory as the “universal bottleneck” and tokenization as a persistent Achilles heel across nearly all models
Framework is domain-agnostic; author plans spatial reasoning, causal inference, and creative synthesis next
Consumer-hardware research: two RTX 3090s, blown fuses, and a lot of curiosity
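The page names the three measurements without giving formulas. Under assumed definitions (height as the highest level reached with every easier level also passed, efficiency as tokens spent per correct answer, and constrained performance as height recomputed under a per-answer token cap), they could be computed as in this sketch. `Attempt`, `height`, `tokens_per_correct`, the 50% pass rate, and the token cap are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical scoring sketch; not the project's scoring code.


@dataclass
class Attempt:
    level: int     # difficulty level of the problem (1 = easiest)
    correct: bool  # did the model's final answer match the reference?
    tokens: int    # tokens the model generated for this attempt


def height(attempts: list[Attempt], pass_rate: float = 0.5,
           token_budget: Optional[int] = None) -> int:
    """Highest level L such that every level 1..L meets `pass_rate`.

    With `token_budget` set, any answer that used more tokens than the
    budget counts as a failure; this stands in for "constrained performance".
    """
    results: dict[int, list[bool]] = {}
    for a in attempts:
        ok = a.correct and (token_budget is None or a.tokens <= token_budget)
        results.setdefault(a.level, []).append(ok)
    reached = 0
    # Climb level by level until a level is missing or falls below the pass rate.
    while reached + 1 in results:
        outcomes = results[reached + 1]
        if sum(outcomes) / len(outcomes) < pass_rate:
            break
        reached += 1
    return reached


def tokens_per_correct(attempts: list[Attempt]) -> float:
    """Efficiency: total tokens spent divided by correct answers (lower is better)."""
    correct = sum(a.correct for a in attempts)
    spent = sum(a.tokens for a in attempts)
    return spent / correct if correct else float("inf")
```

Reusing `height` with a budget keeps the unconstrained and constrained numbers directly comparable: a model that reaches high levels only by generating very long answers shows a large gap between the two.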

Caveats

The new “Can-AI-Think” benchmark suite is described as “available soon”—the README is essentially a pre-release announcement
Results cited (80% boolean logic accuracy, 250% Qwen boost) lack methodological detail; replication would require the unreleased generators
The auto-scaling difficulty mechanism sounds elegant but is untested at scale; there is no evidence yet that it won’t hit a ceiling of its own

Verdict

Worth watching if you benchmark models or study reasoning architectures. Skip it if you need something you can run today—the new suite isn’t out yet, and the original coding benchmark is explicitly retired. The real signal here is the framework design, not the current repo contents.

Frequently asked

What is the-crypt-keeper/can-ai-code?: A coding benchmark evolved into a reasoning stress-test that auto-scales difficulty so models can never truly "pass."
Is can-ai-code open source?: Yes — the-crypt-keeper/can-ai-code is open source, released under the MIT license.
What language is can-ai-code written in?: the-crypt-keeper/can-ai-code is primarily written in Python.
How popular is can-ai-code?: the-crypt-keeper/can-ai-code has 598 stars on GitHub.
Where can I find can-ai-code?: the-crypt-keeper/can-ai-code is on GitHub at https://github.com/the-crypt-keeper/can-ai-code.