Is forge open source?

Yes — antoinezambelli/forge is open source, released under the MIT license.

What language is forge written in?

antoinezambelli/forge is primarily written in Python.

How popular is forge?

antoinezambelli/forge has 2.2k stars on GitHub.

Where can I find forge?

antoinezambelli/forge is on GitHub at https://github.com/antoinezambelli/forge.

← all repositories

antoinezambelli/forge

Teaching 8B models to stop hallucinating tool calls

A reliability layer that makes self-hosted LLMs actually usable for agentic workflows without rewriting your existing tools.

★2.2k stars Python Agents LLMOps · Eval Inference · Serving

View on GitHub ↗

Not currently ranked — collecting fresh signals.

star history

What it does

Forge is a Python guardrail layer for self-hosted LLM tool-calling. You define tools; the model calls them in whatever order it wants. Forge validates those calls, rescues malformed output, retries with corrective nudges, and optionally enforces step ordering through required_steps and prerequisites. It sits between your model and your client — or inside your own loop — and cleans up the mess small models make when they try to act like agents.

The interesting bit

The proxy mode is the sneaky entry point. Forge speaks both OpenAI and Anthropic API shapes, so you point Claude Code or your existing coding harness at localhost:8081 and suddenly your flaky local 8B model looks reliable. Forge injects a synthetic respond tool to stop small models from emitting bare text when they should be calling tools, then strips it from the response so the client never knows. The README claims this lifts an 8B model from “single digits” to 84% on their 26-scenario eval suite — and even bumps Sonnet 4.6 from 85% to 98%. (The Anthropic numbers are from v0.6.0; the author notes re-running them is “non-trivial,” which is a polite way of saying expensive.)

Key highlights

Drop-in proxy for OpenAI chat-completions and Anthropic Messages APIs — no client rewrite
Rescue parsing for Mistral [TOOL_CALLS], Qwen <tool_call> XML, and fenced JSON — models that can’t format correctly get a second chance
Three integration depths: proxy server, WorkflowRunner with full lifecycle management, or composable middleware for your own loop
SlotWorker for priority-queued GPU sharing across multiple agent workflows
Supports Ollama, llama.cpp/llama-server, Llamafile, vLLM, and Anthropic backends

Caveats

Proxy mode is single-shot per request: multi-turn features like prerequisite enforcement and context compaction require WorkflowRunner
The 8B-to-84% claim is on Forge’s own eval suite; your tools may vary
Python 3.12+ only, and you still need to manage the LLM backend yourself in external mode

Verdict

Worth a look if you’re running local models for agentic tasks and tired of debugging why your 8B parameter assistant just tried to call a function named undefined. Skip it if you’re already on OpenAI/Anthropic APIs with reliable tool-calling, or if you need true multi-agent orchestration — Forge explicitly stays inside one agentic loop.

Frequently asked

What is antoinezambelli/forge?: A reliability layer that makes self-hosted LLMs actually usable for agentic workflows without rewriting your existing tools.
Is forge open source?: Yes — antoinezambelli/forge is open source, released under the MIT license.
What language is forge written in?: antoinezambelli/forge is primarily written in Python.
How popular is forge?: antoinezambelli/forge has 2.2k stars on GitHub.
Where can I find forge?: antoinezambelli/forge is on GitHub at https://github.com/antoinezambelli/forge.