The discipline of babysitting AI agents
A curated field guide to the scaffolding that keeps agents from wandering off-task, leaking context, or deleting production.

What it does
This is an awesome-list that catalogs the emerging discipline of “harness engineering”: the tooling, patterns, and reference architectures that wrap around AI agents to make them reliable. It collects canonical essays from OpenAI, Anthropic, Google, Meta, and Microsoft alongside academic papers and practitioner writeups, organized into categories such as context delivery, memory systems, permission frameworks, observability, and orchestration.
The interesting bit
The list treats the harness as temporary scaffolding with an expiration date. Its core thesis, drawn from Anthropic and OpenAI sources: every component exists because the current model can’t do something alone, and the best harnesses are designed knowing those components will become unnecessary. That’s a refreshingly honest framing in a field that usually pretends its abstractions are eternal.
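One way to picture that thesis is a harness in which every component declares the model limitation it compensates for, so it can be dropped once the model no longer has that limitation. The sketch below is purely illustrative: the `Component` and `Harness` classes, the capability names, and the prompt transforms are all hypothetical and do not come from the list or from any lab's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set


@dataclass
class Component:
    name: str
    compensates_for: str         # hypothetical label for the model gap this piece covers
    apply: Callable[[str], str]  # transforms the prompt before the model sees it


class Harness:
    def __init__(self, components: Iterable[Component], model_capabilities: Set[str]):
        # Keep only the components whose limitation the model still has.
        self.active = [
            c for c in components if c.compensates_for not in model_capabilities
        ]

    def prepare(self, prompt: str) -> str:
        # Run the prompt through every component that is still needed, in order.
        for component in self.active:
            prompt = component.apply(prompt)
        return prompt


components = [
    Component("planner", "long_horizon_planning",
              lambda p: "Plan first, then act.\n" + p),
    Component("truncator", "long_context_recall",
              lambda p: p[-2000:]),  # crude stand-in: keep only the last 2000 characters
]

# Today's model has neither capability, so both components stay active.
today = Harness(components, model_capabilities=set())

# A later model plans well on its own, so the planner drops out.
later = Harness(components, model_capabilities={"long_horizon_planning"})
```

Under these assumptions, retiring a component is a change to the capability set rather than a rewrite of the harness, which is roughly what "scaffolding with an expiration date" implies.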
Key highlights
- Foundational canon: Curated primary sources from OpenAI’s Codex harness breakdown, Anthropic’s agent architecture guides, Martin Fowler’s synthesis, and LangChain’s anatomy of a harness
- Production war stories: Microsoft’s Azure SRE agent (reported figures: 35,000+ incidents handled, time-to-mitigation cut from 40.5 hours to 3 minutes) and Meta’s multi-day ML pipeline harness with hibernate-and-wake checkpointing
- Concrete patterns: Filesystem-based context engineering, schema-filtered planning subagents, eager-construction scaffolding, natural-language agent harnesses (NLAHs)
- Tooling categories: MCP integration, eval frameworks, sandbox design, human-in-the-loop governance, context compaction against “context rot”
- Templates and starter harnesses: Includes demo implementations and meta-harness generators for bootstrapping
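To make one of the items above concrete, context compaction can be sketched as follows: when the conversation history exceeds a token budget, older messages are collapsed into a single summary and the most recent ones are kept verbatim. This is a minimal sketch under stated assumptions, not a method taken from any source in the list. The `estimate_tokens` heuristic (about four characters per token), the `compact` function, and the placeholder summarizer are all hypothetical; a real harness would use a proper tokenizer and would typically ask a model to write the summary.

```python
from typing import Callable, List, Optional


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Use a real tokenizer in practice.
    return max(1, len(text) // 4)


def compact(
    messages: List[str],
    budget: int,
    keep_recent: int = 2,
    summarize: Optional[Callable[[List[str]], str]] = None,
) -> List[str]:
    """Collapse older messages into one summary when the history exceeds the budget."""
    total = sum(estimate_tokens(m) for m in messages)
    split = len(messages) - keep_recent
    if total <= budget or split <= 0:
        return list(messages)  # nothing to compact

    old, recent = messages[:split], messages[split:]
    if summarize is None:
        # Placeholder summarizer; a real one would condense the content of `old`.
        summarize = lambda msgs: f"[summary of {len(msgs)} earlier messages]"
    return [summarize(old)] + recent


history = ["a" * 400, "b" * 400, "c" * 40, "d" * 40]  # about 220 estimated tokens
compacted = compact(history, budget=100)
```

Note that this sketch compacts once and does not guarantee the result fits the budget; the recent messages alone may still exceed it.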
Caveats
- As with any awesome-list, curation quality varies; some entries are blog posts with unverified claims
- Several source links point to 2026-dated articles that may not be publicly accessible or may have changed
- The list is English-centric despite translated versions being linked via zdoc.app
Verdict
Worth bookmarking if you’re building or operating agent systems in production and need a structured map of what the major labs are actually doing versus what they’re blogging about. Less useful if you’re looking for runnable code: this is a reading list with occasional templates, not a framework.