← all repositories

stanford-iris-lab/meta-harness-tbench2-artifact

Agent scaffold for AI benchmarking on Terminal-Bench 2.0 that injects sandbox environment snapshots into prompts.

1.1k stars Python AgentsLLMOps · Eval
meta-harness-tbench2-artifact
Velocity · 7d
+15
★ / day
Trend
steady
star history

Meta-Harness is an agent development scaffold built on Terminus-KIRA and Harbor’s Terminus-2 framework, designed for evaluating AI agents on terminal task benchmarks. It improves agent performance by pre-injecting environment context (working directory, tools, packages) into the initial prompt, eliminating early exploration turns. The system achieves 76.4% accuracy across 89 tasks using Claude Opus 4.6, with the optimal agent configuration discovered through automated harness evolution.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.