stanford-iris-lab/meta-harness-tbench2-artifact
Agent scaffold for AI benchmarking on Terminal-Bench 2.0 that injects sandbox environment snapshots into prompts.

Meta-Harness is an agent development scaffold built on Terminus-KIRA and Harbor’s Terminus-2 framework, designed for evaluating AI agents on terminal task benchmarks. It improves agent performance by pre-injecting environment context (working directory, tools, packages) into the initial prompt, eliminating early exploration turns. The system achieves 76.4% accuracy across 89 tasks using Claude Opus 4.6, with the optimal agent configuration discovered through automated harness evolution.