Orchestra-Research/AI-Research-SKILLs

98 cheat sheets to turn your coding agent into an AI researcher

A structured prompt library that teaches Claude Code, Codex, or Gemini how to run the full ML research lifecycle — from literature review to LaTeX.

Velocity (7d): +43 ★/day · Trend: steady

What it does

AI Research Skills is a collection of 98 markdown “skill” files that you install into coding agents like Claude Code, Cursor, or Gemini CLI. Each file documents a specific framework or workflow — from vLLM inference to GRPO post-training to LaTeX paper writing — so the agent can reference expert-level guidance without you typing it. An npx installer auto-detects your agent, symlinks the skills, and optionally loads an “autoresearch” orchestration layer that claims to manage the full research lifecycle via a two-loop architecture.
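To make the detect-and-symlink step concrete, here is a minimal sketch in Python. It is not the project's installer (which ships via npx); the agent directory names in `AGENT_DIRS`, the `skills` subfolder, and both function names are assumptions made for illustration only.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping from agent name to its config directory under the
# user's home. These paths are illustrative assumptions, not taken from
# the real installer.
AGENT_DIRS: Dict[str, str] = {
    "claude-code": ".claude",
    "cursor": ".cursor",
    "gemini-cli": ".gemini",
}


def detect_agents(home: Path) -> List[str]:
    """Return the agents whose config directory exists under `home`."""
    return [name for name, d in AGENT_DIRS.items() if (home / d).is_dir()]


def link_skills(skills_src: Path, home: Path) -> List[Path]:
    """Symlink each skill directory into every detected agent's skills folder.

    Existing entries are left untouched, so the function is safe to re-run.
    """
    created: List[Path] = []
    for agent in detect_agents(home):
        target_root = home / AGENT_DIRS[agent] / "skills"
        target_root.mkdir(parents=True, exist_ok=True)
        for skill in sorted(p for p in skills_src.iterdir() if p.is_dir()):
            link = target_root / skill.name
            if link.is_symlink() or link.exists():
                continue  # never overwrite something already installed
            link.symlink_to(skill, target_is_directory=True)
            created.append(link)
    return created
```

Calling `link_skills(Path("skills"), Path.home())` would link every subdirectory of `skills` into each agent directory that is present. Symlinking rather than copying means an update to the source checkout is visible to every agent at once.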

The interesting bit

The project treats prompt engineering as infrastructure. Rather than hoping your agent remembers how Megatron-LM’s 4D parallelism works, you give it a 462-line structured reference with code examples, troubleshooting, and citations. The “autoresearch” skill then acts as a router, delegating to domain skills as needed — essentially building a primitive operating system for agent-driven research out of markdown and symlinks.
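The router idea can be illustrated with a toy sketch. The source does not show how “autoresearch” chooses a skill, and in practice the agent itself presumably makes that choice by reading the markdown; the keyword-overlap matching below is a stand-in, and the skill names, keywords, and `route` function are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Skill:
    name: str
    keywords: FrozenSet[str]


# Hypothetical registry; the real skill names and metadata may differ.
SKILLS: List[Skill] = [
    Skill("vllm-inference", frozenset({"vllm", "inference", "serving"})),
    Skill("grpo-training", frozenset({"grpo", "rl", "reward"})),
    Skill("latex-paper-writing", frozenset({"latex", "paper", "citation"})),
]


def route(task: str, skills: List[Skill] = SKILLS) -> Optional[str]:
    """Return the skill whose keywords overlap most with the task text.

    Returns None when no keyword matches, so the caller can fall back to
    the agent's default behaviour.
    """
    words = set(re.findall(r"[a-z0-9]+", task.lower()))
    best = max(skills, key=lambda s: len(s.keywords & words), default=None)
    if best is None or not (best.keywords & words):
        return None
    return best.name
```

For example, `route("Serve the model with vLLM for fast inference")` selects the hypothetical `vllm-inference` entry, while a task with no matching keyword returns `None`.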

Key highlights

  • 98 skills across 23 categories: model architecture (LitGPT, Mamba, TorchTitan), fine-tuning (Axolotl, Unsloth, PEFT), post-training (TRL, GRPO, OpenRLHF, verl), inference (vLLM, SGLang, TensorRT-LLM), mechanistic interpretability (TransformerLens, nnsight), and more
  • One-command install via npx @orchestra-research/ai-research-skills with auto-detection of Claude Code, Cursor, Gemini CLI, Hermes Agent, and OpenCode
  • Alternative Claude Code marketplace integration: /plugin install fine-tuning@ai-research-skills
  • Each skill is listed with its line count and reference count (e.g., GRPO-RL-Training: 569 lines, “gold standard”; HuggingFace Tokenizers: 486 lines + 4 refs)
  • Also covers “soft” research tasks: ideation, literature survey, ML paper writing with LaTeX templates, citation verification, and academic plotting

Caveats

  • The README is heavy on category tables and light on how the autoresearch orchestration actually works in practice — the two-loop architecture is described but not demonstrated
  • No visible benchmarks or evaluations of whether agents perform better with these skills versus raw prompting
  • Several skills contain version-specific framework details, and it is unclear how 98 skills will be kept current against fast-moving targets such as vLLM and TRL

Verdict

Worth a look if you’re already using Claude Code or similar agents for ML work and tired of re-explaining FSDP or GRPO in every session. Skip it if you want a Python library — this is documentation-as-infrastructure, not code.
