98 cheat sheets to turn your coding agent into an AI researcher
A structured prompt library that teaches Claude Code, Codex, or Gemini how to run the full ML research lifecycle — from literature review to LaTeX.

What it does
AI Research Skills is a collection of 98 markdown “skill” files that you install into coding agents like Claude Code, Cursor, or Gemini CLI. Each file documents a specific framework or workflow — from vLLM inference to GRPO post-training to LaTeX paper writing — so the agent can reference expert-level guidance without you typing it. An npx installer auto-detects your agent, symlinks the skills, and optionally loads an “autoresearch” orchestration layer that claims to manage the full research lifecycle via a two-loop architecture.
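To make the "auto-detect, then symlink" step concrete, here is a minimal Python sketch of how an installer of this kind could work. It is not the project's actual installer: the `AGENT_DIRS` mapping, the `skills` subfolder name, and both function names are assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping from agent name to the config directory that signals
# it is installed. Illustrative only, not the real installer's detection rules.
AGENT_DIRS: Dict[str, str] = {
    "claude-code": ".claude",
    "cursor": ".cursor",
    "gemini-cli": ".gemini",
}


def detect_agents(home: Path) -> List[str]:
    """Return the agents whose config directory exists under `home`."""
    return [name for name, d in AGENT_DIRS.items() if (home / d).is_dir()]


def link_skills(skills_root: Path, home: Path) -> List[Path]:
    """Symlink each skill directory into every detected agent's skills folder.

    Returns the list of symlinks that were newly created.
    """
    created: List[Path] = []
    for agent in detect_agents(home):
        # Assumed layout: <home>/<agent config dir>/skills/<skill name>
        target_dir = home / AGENT_DIRS[agent] / "skills"
        target_dir.mkdir(parents=True, exist_ok=True)
        for skill in sorted(p for p in skills_root.iterdir() if p.is_dir()):
            link = target_dir / skill.name
            # Skip anything already present, including dangling symlinks.
            if link.is_symlink() or link.exists():
                continue
            link.symlink_to(skill, target_is_directory=True)
            created.append(link)
    return created
```

Symlinking rather than copying means one checkout of the skill files can serve several agents, and updating that checkout updates every agent at once.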
The interesting bit
The project treats prompt engineering as infrastructure. Rather than hoping your agent remembers how Megatron-LM’s 4D parallelism works, you give it a 462-line structured reference with code examples, troubleshooting, and citations. The “autoresearch” skill then acts as a router, delegating to domain skills as needed — essentially building a primitive operating system for agent-driven research out of markdown and symlinks.
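The routing idea can be illustrated with a toy Python sketch. In the real project the agent itself decides which skill to consult by reading the markdown. The keyword matcher below is only an analogy for that delegation step, and the skill names, keywords, and `route` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Skill:
    name: str                 # hypothetical skill identifier
    keywords: FrozenSet[str]  # words that suggest this skill is relevant


# Illustrative registry; the real library has 98 skills with its own names.
SKILLS = [
    Skill("vllm-inference", frozenset({"vllm", "inference", "serving"})),
    Skill("grpo-post-training", frozenset({"grpo", "reward", "rl"})),
    Skill("latex-paper-writing", frozenset({"latex", "paper", "draft"})),
]


def route(task: str, skills: Sequence[Skill] = SKILLS) -> Optional[str]:
    """Pick the skill whose keywords overlap most with the task description.

    Returns None when no skill matches any word in the task.
    """
    words = set(task.lower().split())
    best = max(skills, key=lambda s: len(s.keywords & words))
    return best.name if best.keywords & words else None
```

For example, `route("Serve the model with vLLM for fast inference")` selects the `vllm-inference` entry, while a task that matches no keywords returns `None` and would be left to the agent's general knowledge.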
Key highlights
- 98 skills across 23 categories: model architecture (LitGPT, Mamba, TorchTitan), fine-tuning (Axolotl, Unsloth, PEFT), post-training (TRL, GRPO, OpenRLHF, verl), inference (vLLM, SGLang, TensorRT-LLM), mechanistic interpretability (TransformerLens, nnsight), and more
- One-command install via `npx @orchestra-research/ai-research-skills`, with auto-detection of Claude Code, Cursor, Gemini CLI, Hermes Agent, and OpenCode
- Alternative Claude Code marketplace integration: `/plugin install fine-tuning@ai-research-skills`
- Skills include line counts and reference counts (e.g., GRPO-RL-Training: 569 lines, “gold standard”; HuggingFace Tokenizers: 486 lines + 4 refs)
- Also covers “soft” research tasks: ideation, literature survey, ML paper writing with LaTeX templates, citation verification, and academic plotting
Caveats
- The README is heavy on category tables and light on how the autoresearch orchestration actually works in practice — the two-loop architecture is described but not demonstrated
- No visible benchmarks or evaluations of whether agents perform better with these skills versus raw prompting
- Several skills reference frameworks with version-specific details; it is unclear how 98 skills will be kept current against rapidly moving targets (vLLM, TRL, etc.)
Verdict
Worth a look if you’re already using Claude Code or similar agents for ML work and are tired of re-explaining FSDP or GRPO in every session. Skip it if you want a Python library: this is documentation-as-infrastructure, not code.