Training prompts like neural nets — without touching a single weight
Microsoft's SkillOpt treats a markdown skill document as the trainable parameter of a frozen LLM agent, complete with epochs, batching, and validation gates.

What it does
SkillOpt optimizes natural-language “skills” for LLM agents using the same machinery as weight-space training — epochs, minibatches, learning-rate budgets, and held-out validation — but everything happens in text. A separate optimizer model proposes bounded add/delete/replace edits to a single skill document; only edits that strictly improve validation scores survive. The result is a compact best_skill.md (300–2,000 tokens) that runs against the unchanged target model with zero extra inference-time calls.
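The accept-only-if-better gate described above can be sketched in a few lines. This is a minimal illustration, not SkillOpt's actual code: the `Edit` class, `apply_edit`, `gated_step`, and the toy keyword scorer are all hypothetical names, and the skill is modeled as a plain list of lines. In a real run the proposals would come from the optimizer model and the score from a held-out validation set.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record: op is 'add', 'delete', or 'replace'."""
    op: str
    target: str = ""  # existing line to delete or replace (unused for 'add')
    text: str = ""    # new line for 'add' or 'replace'


def apply_edit(skill: List[str], edit: Edit) -> List[str]:
    """Return a new skill with one edit applied; the input is left untouched."""
    lines = list(skill)
    if edit.op == "add":
        lines.append(edit.text)
    elif edit.op == "delete":
        if edit.target in lines:
            lines.remove(edit.target)
    elif edit.op == "replace":
        if edit.target in lines:
            lines[lines.index(edit.target)] = edit.text
    else:
        raise ValueError(f"unknown edit op: {edit.op!r}")
    return lines


def gated_step(
    skill: List[str],
    proposals: List[Edit],
    score: Callable[[List[str]], float],
    max_edits: int,
) -> Tuple[List[str], float, List[Edit]]:
    """Try at most max_edits proposals; keep only strict validation improvements."""
    best_score = score(skill)
    rejected: List[Edit] = []
    for edit in proposals[:max_edits]:
        candidate = apply_edit(skill, edit)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # ties are rejected
            skill, best_score = candidate, candidate_score
        else:
            rejected.append(edit)
    return skill, best_score, rejected


def toy_score(skill: List[str]) -> float:
    """Stand-in for validation accuracy: fraction of keywords the skill mentions."""
    text = " ".join(skill).lower()
    keywords = ("cite", "verify")
    return sum(k in text for k in keywords) / len(keywords)
```

Returning the rejected edits separately is one way to feed something like a rejected-edit buffer, so a later proposal round can avoid retrying changes that already failed the gate.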
The interesting bit
The discipline is the product. Most agent skills are hand-crafted or one-shot generated; SkillOpt makes skill improvement reproducible and measurable, borrowing stability tricks from deep learning (rejected-edit buffers, cosine-decayed textual learning rates, epoch-wise slow/meta updates) that keep optimization from drifting. The paper reports best-or-tied-best results across all 52 evaluated (model, benchmark, harness) cells.
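To make the "cosine-decayed textual learning rate" idea concrete, here is a small sketch under one loudly stated assumption: that the textual learning rate is read as the maximum number of edits allowed per step. The function name `cosine_edit_budget` and its default bounds are hypothetical and are not taken from the SkillOpt codebase.

```python
import math


def cosine_edit_budget(step: int, total_steps: int,
                       max_edits: int = 8, min_edits: int = 1) -> int:
    """Per-step edit budget that decays from max_edits to min_edits on a cosine curve.

    Assumption: the 'learning rate' in text space is modeled as an edit count.
    """
    if total_steps <= 1:
        return max_edits
    # Clamp the step into range, then map it to progress in [0, 1].
    progress = min(max(step, 0), total_steps - 1) / (total_steps - 1)
    # Standard cosine schedule: 1.0 at the start, 0.0 at the end.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_edits + round((max_edits - min_edits) * cosine)


# Budgets for a 10-step run: large rewrites early, small touch-ups late.
schedule = [cosine_edit_budget(s, 10) for s in range(10)]
```

The design intuition mirrors weight-space training: early steps may restructure the skill heavily, while late steps are restricted to small changes so a good document is not destabilized.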
Key highlights
- Supports six benchmarks (SearchQA, ALFWorld, DocVQA, LiveMathematicianBench, SpreadsheetBench, OfficeQA) and multiple backends (OpenAI/Azure, Claude, Qwen via vLLM, MiniMax)
- GPT-5.5 skills lift average no-skill accuracy by +23.5 points (direct chat), +24.8 (Codex CLI), +19.1 (Claude Code)
- Optimized skills transfer across model scales and between execution harnesses without re-optimization
- Training auto-resumes from last completed step; outputs full provenance trail (patches, evals, slow-update logs)
- Pretrained ckpt/ artifacts provided for paper reproduction; installable from PyPI (pip install skillopt)
Caveats
- Most benchmark datasets are not included; you bring your own splits in a specific directory format (only the SearchQA split is currently bundled)
- The main branch defaults to a post-submission force-accept slow-update mode; paper reproduction requires setting slow_update_gate_with_selection: true
- An Azure OpenAI endpoint is effectively required for most setups; env var naming is idiosyncratic (AZURE_OPENAI_* is reused even for plain OpenAI endpoints)
Verdict
Worth a look if you’re building agent pipelines and tired of prompt engineering by vibe check. Skip it if you need end-to-end data included or aren’t prepared to manage API credentials across multiple backend formats.