alvinreal/awesome-autoresearch

The autoresearch family tree, catalogued

A curated index of 40+ projects that took Andrej Karpathy's autoresearch loop and ran with it — in every direction at once.

★ 2.2k stars · Learning Agents


Velocity (7d): +27 ★ / day · Trend: steady

What it does

This is an awesome-list that tracks the explosion of projects inspired by Karpathy’s autoresearch — the idea of an LLM running a tight loop of propose, execute, evaluate, repeat. It sorts descendants into categories: general-purpose forks, research agents, platform ports, domain-specific adaptations, and benchmarks. Each entry gets a one-line description and a star badge.
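The propose, execute, evaluate, repeat loop described above can be sketched in a few lines. This is a minimal illustration, not code from any listed project: `autoresearch_loop`, `toy_propose`, and `toy_evaluate` are hypothetical names, and a random numeric perturbation stands in for the LLM's proposal step.

```python
import random
from typing import Callable, Tuple


def autoresearch_loop(
    propose: Callable[[float, random.Random], float],
    evaluate: Callable[[float], float],
    start: float,
    steps: int = 200,
    seed: int = 0,
) -> Tuple[float, float]:
    """Propose, execute, evaluate, repeat; keep a candidate only if it scores higher."""
    rng = random.Random(seed)
    best, best_score = start, evaluate(start)
    for _ in range(steps):
        candidate = propose(best, rng)   # propose a change to the current best
        score = evaluate(candidate)      # execute the candidate and score it
        if score > best_score:           # keep improvements, discard the rest
            best, best_score = candidate, score
    return best, best_score


def toy_propose(x: float, rng: random.Random) -> float:
    # Hypothetical stand-in for an LLM proposing an edit: a small random nudge.
    return x + rng.uniform(-0.5, 0.5)


def toy_evaluate(x: float) -> float:
    # Hypothetical stand-in for a real metric: higher is better, peak at x = 3.
    return -(x - 3.0) ** 2


best, score = autoresearch_loop(toy_propose, toy_evaluate, start=0.0)
```

Because a candidate replaces the current best only when its score improves, the returned score can never be worse than the starting point. In a real system the proposal would be a code or config edit and the evaluation a training or benchmark run.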

The interesting bit

The list reveals how fast a single demo can fracture into an ecosystem. Within months there were Claude Code skills, Codex-native ports, Gemini CLI adaptations, swarm-intelligence variants, and even ICLR papers (ADAS, GEPA, SICA) that formalize the meta-learning angle. Someone made a pi extension. Someone else made it run overnight with --yolo --prompt.

Key highlights

General-purpose forks include recursive self-improvement frameworks, dashboard-first runtimes (Thoth), and cross-platform glue that auto-detects hardware config.
Research-agent systems go end-to-end: ARK orchestrates six agents from idea to LaTeX via Telegram; AutoSci builds a structured knowledge wiki with an interactive graph.
Academic rigor shows up too: GEPA uses genetic-Pareto prompt evolution, the Huxley-Gödel Machine targets SWE-bench, and EvoSkill distills failed trajectories into reusable skills.
Hardware and platform ports span single-GPU swarms, headless overnight modes, and distillation into cheaper local runtimes.
The "Evaluation & benchmarks" and "Notable writeups" sections round out the index, though the README is truncated before their full contents appear.
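The "genetic-Pareto" label in the GEPA highlight refers to keeping candidates that are not beaten on every objective at once. This summary does not describe GEPA's actual procedure, so the following is only a generic sketch of Pareto-front selection over prompts scored on two tasks; the candidate names and scores are made up for illustration.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every task and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for o, other in scores.items() if o != name)
    ]


# Hypothetical per-task scores for three candidate prompts (higher is better).
candidates = {
    "prompt_a": (0.9, 0.2),
    "prompt_b": (0.4, 0.8),
    "prompt_c": (0.3, 0.1),
}
front = pareto_front(candidates)
```

Here `prompt_c` is dropped because `prompt_a` beats it on both tasks, while `prompt_a` and `prompt_b` both survive since each wins on a different task. An evolutionary method would then mutate or recombine the survivors rather than a single top scorer.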

Caveats

The README is truncated in the source; full contents of later sections (domain-specific adaptations, evaluation, related resources) are not visible.
Star counts are live badges that measure popularity, not editorial review, so the list’s own curation quality is untested: it is a directory, not a reviewed anthology.
No explicit inclusion criteria are stated; “high-signal” is claimed but not defined.

Verdict

Worth bookmarking if you are building or comparing autoresearch tooling and want to avoid reinventing a loop that already has twenty variants. Skip it if you need a single, opinionated framework rather than a map of the territory.