ai-boost/awesome-harness-engineering

The discipline of babysitting AI agents

A curated field guide to the scaffolding that keeps agents from wandering off-task, leaking context, or deleting production.

1.7k stars · Python · Learning · Agents · LLMOps · Eval
Velocity (7d): +24 ★ / day · Trend: steady

What it does

This is an awesome-list that catalogs the emerging discipline of “harness engineering”: the tooling, patterns, and reference architectures that wrap around AI agents to make them reliable. It collects canonical essays from OpenAI, Anthropic, Google, Meta, and Microsoft alongside academic papers and practitioner writeups, organized into categories like context delivery, memory systems, permission frameworks, observability, and orchestration.

The interesting bit

The list treats the harness as temporary scaffolding with an expiration date. Its core thesis, drawn from Anthropic and OpenAI sources: every component exists because the current model can’t do something alone, and the best harnesses are designed knowing those components will become unnecessary. That’s a refreshingly honest framing in a field that usually pretends its abstractions are eternal.
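One way to picture that thesis in code is a rough sketch, not anything taken from the list itself. Each harness component records the model limitation it compensates for, and is dropped once a check says the limitation is gone. The names `HarnessComponent`, `compensates_for`, and `still_needed` are hypothetical, and the lambdas stand in for whatever eval a real team would run against the bare model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HarnessComponent:
    name: str
    compensates_for: str               # the model limitation that justifies this component
    still_needed: Callable[[], bool]   # hypothetical probe, e.g. an eval run on the bare model


def active_components(components: List[HarnessComponent]) -> List[HarnessComponent]:
    """Keep only the components whose justifying limitation still holds."""
    return [c for c in components if c.still_needed()]


components = [
    HarnessComponent("context_compactor", "loses track in long contexts", lambda: True),
    HarnessComponent("json_repair", "emits malformed JSON", lambda: False),
]

# Names of the components that survive the check.
active = [c.name for c in active_components(components)]
print(active)
```

The point of the design is that every component carries its own reason for existing, so retiring it becomes a routine check and not an architectural argument.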

Key highlights

  • Foundational canon: Curated primary sources from OpenAI’s Codex harness breakdown, Anthropic’s agent architecture guides, Martin Fowler’s synthesis, and LangChain’s anatomy of a harness
  • Production war stories: Microsoft’s Azure SRE agent (35,000+ incidents handled, time-to-mitigation dropped from 40.5 hours to 3 minutes) and Meta’s multi-day ML pipeline harness with hibernate-and-wake checkpointing
  • Concrete patterns: Filesystem-based context engineering, schema-filtered planning subagents, eager-construction scaffolding, natural-language agent harnesses (NLAHs)
  • Tooling categories: MCP integration, eval frameworks, sandbox design, human-in-the-loop governance, context compaction against “context rot”
  • Templates and starter harnesses: Includes demo implementations and meta-harness generators for bootstrapping
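To make one of these patterns concrete, here is a minimal sketch of context compaction, assuming the simplest possible approach: keep the newest messages that fit a budget and fold everything older into a single summary line. It is not taken from any entry in the list. `count_tokens`, `compact`, and `naive_summary` are hypothetical names, the token count is a crude whitespace split, and a real harness would probably ask a model to write the summary.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude stand-in: whitespace word count, not a real tokenizer.
    return len(text.split())


def compact(messages: List[str], budget: int,
            summarize: Callable[[List[str]], str]) -> List[str]:
    """Keep the newest messages that fit in `budget`; fold the rest into one summary.

    The summary itself is not counted against the budget in this sketch.
    """
    kept: List[str] = []
    used = 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()                      # restore chronological order
    older = messages[: len(messages) - len(kept)]
    if not older:
        return kept
    return [summarize(older)] + kept


def naive_summary(msgs: List[str]) -> str:
    # Placeholder summarizer: only reports how many messages were folded away.
    return f"[summary of {len(msgs)} earlier messages]"


history = [
    "read config file",
    "ran the test suite twice",
    "fixed import error",
    "tests pass now",
]
compacted = compact(history, budget=6, summarize=naive_summary)
print(compacted)
```

With a budget of 6 the two newest messages survive verbatim and the two oldest collapse into one summary entry; with a budget large enough for everything, the history comes back unchanged.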

Caveats

  • As with any awesome-list, curation quality varies; some entries are blog posts with unverified claims
  • Several source links point to 2026-dated articles that may not be publicly accessible or may have changed
  • The list is English-centric despite translated versions being linked via zdoc.app

Verdict

Worth bookmarking if you’re building or operating agent systems in production and need a structured map of what the major labs are actually doing versus what they’re blogging about. Less useful if you’re looking for runnable code: this is a reading list with occasional templates, not a framework.
