Is harbor open source?

Yes — harbor-framework/harbor is open source, released under the Apache-2.0 license.

What language is harbor written in?

harbor-framework/harbor is primarily written in Python.

How popular is harbor?

harbor-framework/harbor has 3.4k stars on GitHub, and its star growth is currently accelerating (+27 stars per day over the last 7 days).

Where can I find harbor?

harbor-framework/harbor is on GitHub at https://github.com/harbor-framework/harbor.

harbor-framework/harbor

A test harness that treats agents like software under test

The Terminal-Bench team built Harbor because evaluating coding agents at scale means orchestrating thousands of containerized environments, not just running a script.

★ 3.4k stars · Python · Agents · LLMOps · Eval

Velocity (7d): +27 ★ / day · Trend: accelerating

[Chart: star history]

What it does

Harbor is a Python framework for running agent evaluations and building RL environments. It wraps third-party benchmarks (SWE-Bench, Aider Polyglot, Terminal-Bench) and arbitrary agents (Claude Code, OpenHands, Codex CLI) in Docker containers, then dispatches them locally or to cloud providers like Daytona and Modal. Results feed back into RL training pipelines as rollout data.

The interesting bit

The framework decouples agents, models, datasets, and execution environments so you can mix and match. Want to test Claude Code against Terminal-Bench 2.0 on 100 Daytona instances? One flag change. The same abstraction lets you swap in your own benchmark or agent without rewriting orchestration logic.
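To make the decoupling idea concrete, here is a minimal sketch of what "mix and match" can look like in Python. It is not Harbor's actual API: the Agent and Environment interfaces, the EchoAgent and LocalEnv stubs, and the evaluate function are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class Agent(Protocol):
    """Hypothetical agent interface: turns a task prompt into an answer."""
    name: str

    def solve(self, task: str) -> str: ...


class Environment(Protocol):
    """Hypothetical execution backend: runs one agent on one task."""
    name: str

    def run(self, agent: Agent, task: str) -> str: ...


@dataclass
class EchoAgent:
    # Stand-in for a real coding agent (illustration only).
    name: str = "echo-agent"

    def solve(self, task: str) -> str:
        return f"answer to {task}"


@dataclass
class LocalEnv:
    # Stand-in for a container backend; a real one would start a sandbox here.
    name: str = "local"

    def run(self, agent: Agent, task: str) -> str:
        return agent.solve(task)


def evaluate(agent: Agent, env: Environment, dataset: Dict[str, str]) -> Dict[str, bool]:
    """Orchestration depends only on the interfaces, so each piece can be swapped."""
    return {task: env.run(agent, task) == expected for task, expected in dataset.items()}


# Toy dataset mapping task id -> expected answer (made up for this sketch).
toy_dataset = {"t1": "answer to t1", "t2": "something else"}
results = evaluate(EchoAgent(), LocalEnv(), toy_dataset)
print(results)
```

Because evaluate only touches the two interfaces, replacing LocalEnv with a cloud-backed class or EchoAgent with a different agent would not require changes to the orchestration function. That is the property the paragraph above describes.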

Key highlights

Ships with integrations for major coding agents and benchmarks; the harbor datasets list command shows the full catalog
Scales from local Docker (--n-concurrent 4) to cloud execution (--n-concurrent 100 --env daytona) via CLI flags
Generates rollouts formatted for RL optimization workflows
Official harness for Terminal-Bench 2.0, which originated the project
Companion cookbook repo hosts end-to-end examples
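The local-to-cloud scaling highlight can be sketched as a small command builder. Only --n-concurrent and --env daytona come from the text above. The run subcommand, the --dataset and --agent flags, and their values are assumptions made for illustration, so check the project's README before relying on them. The sketch only builds and prints the argument list; it does not invoke Harbor.

```python
import shlex
from typing import List, Optional


def build_harbor_args(n_concurrent: int, env: Optional[str] = None) -> List[str]:
    """Build an argument list for a Harbor evaluation run (illustrative only)."""
    args = [
        "harbor",
        "run",                               # ASSUMPTION: subcommand name
        "--dataset", "terminal-bench@2.0",   # ASSUMPTION: flag and value format
        "--agent", "claude-code",            # ASSUMPTION: flag and value format
        "--n-concurrent", str(n_concurrent), # flag named in the highlights above
    ]
    if env is not None:
        args += ["--env", env]               # flag named in the highlights above
    return args


local_run = build_harbor_args(4)                 # local Docker, 4 concurrent trials
cloud_run = build_harbor_args(100, env="daytona")  # cloud execution, 100 concurrent

print(shlex.join(local_run))
print(shlex.join(cloud_run))
```

The two invocations differ only in the concurrency value and the presence of --env, which is the sense in which scaling up is a matter of changing flags rather than rewriting the evaluation setup.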

Caveats

README is thin on internals: unclear how environments are isolated, how results are scored, or what the RL data format looks like
Cloud provider integrations require separate API keys and presumably cost money; no pricing guidance given

Verdict

Worth a look if you’re benchmarking coding agents or building RL training pipelines around tool-use LLMs. Skip it if you just need a one-off eval script; the overhead only pays off at scale or with repeated experiments.
