fivetaku/fablize

A Claude harness built on tests that proved its own limits

It forces Claude to verify before it claims victory, grounded in a 19-run A/B test that separated transferable procedure from fixed capability.

★ 504 stars · Python · Coding Assistants · LLMOps · Eval

What it does

fablize is a Claude Code plugin that intercepts tasks and enforces verified workflows: renderable artifacts are executed before the task can be marked complete, multi-step work is decomposed with checkpointed evidence gates, and debugging follows a reproduce-hypothesize-trace protocol. An early-stop hook also blocks the model from declaring intent without acting on it, catching “I’ll do X” when X was never done. The plugin routes each task to only the discipline that proved effective in controlled testing.
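The "execute before completion" idea can be illustrated with a minimal sketch. This is not the plugin's code: `GateResult`, `verification_gate`, and the timeout default are hypothetical names chosen here, and the sketch assumes that "verified" simply means the artifact's command exits with status 0.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class GateResult:
    allowed: bool   # may the task be reported as complete?
    evidence: str   # captured output that backs the decision


def verification_gate(command: list, timeout_s: float = 30.0) -> GateResult:
    """Run the artifact and allow completion only if it exits cleanly.

    Hypothetical helper: the real plugin may verify artifacts differently.
    """
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Failing to run at all counts as "not verified", never as success.
        return GateResult(False, f"could not verify: {exc}")
    evidence = (proc.stdout + proc.stderr).strip()
    return GateResult(proc.returncode == 0, evidence)


# A passing artifact and a failing one, both run with the current interpreter.
ok = verification_gate([sys.executable, "-c", "print('rendered')"])
bad = verification_gate([sys.executable, "-c", "raise SystemExit(2)"])
```

The design point is that the gate returns evidence alongside the verdict, so a "done" message can cite what was actually run rather than what was intended.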

The interesting bit

The author conducted 19 A/B runs plus 26 real sessions comparing Fable 5 against Opus 4.8, then tried injecting Fable-like defects into Opus to see whether a harness could make it “discover” them. When the injection failed, they accepted the result and left those capabilities out. Most agent harnesses promise to make the model smarter; this one explicitly ships a list of what it cannot transfer.

Key highlights

Evidence-based scope: only four procedures (verification grounding, multi-story gates, investigation protocol, early-stop hook) made the cut; style mimicry and broad reasoning injections were shelved as unverified.
Per-task routing: a UserPromptSubmit hook injects only the matching discipline—no always-on bloat.
Honest escalation: when a task hits non-transferable territory (open-ended creative depth, out-of-spec defect discovery), the plugin tells you to switch models or call a human instead of hallucinating progress.
Deterministic guardrails: the early-stop hook and verification gates are procedural, not persuasive—they don’t ask nicely, they block completion.
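Per-task routing of this kind might look roughly like the sketch below. Everything in it is an assumption made for illustration: the keyword lists, the discipline texts, and the `route` and `build_context` helpers are hypothetical, and the sketch assumes the hook receives a JSON payload with a `prompt` field. The plugin's actual matching rules are not described here.

```python
import json
from typing import Optional

# Hypothetical trigger keywords and instruction texts, one entry per discipline.
DISCIPLINES = {
    "verification": (
        ("render", "chart", "html", "svg"),
        "Execute or render the artifact before reporting completion.",
    ),
    "investigation": (
        ("bug", "error", "crash", "failing"),
        "Reproduce, state a hypothesis, then trace before proposing a fix.",
    ),
    "multi_story": (
        ("steps", "migrate", "refactor", "phase"),
        "Split the work into stories; record evidence at each checkpoint.",
    ),
}


def route(prompt: str) -> Optional[str]:
    """Return the first discipline whose keywords appear in the prompt."""
    text = prompt.lower()
    for name, (keywords, _instruction) in DISCIPLINES.items():
        if any(keyword in text for keyword in keywords):
            return name
    return None  # no match: inject nothing


def build_context(payload_json: str) -> str:
    """Map a hook payload (assumed shape: {"prompt": "..."}) to extra context."""
    prompt = json.loads(payload_json).get("prompt", "")
    name = route(prompt)
    return "" if name is None else DISCIPLINES[name][1]
```

Returning an empty string for unmatched prompts is what keeps the routing from becoming always-on: a prompt that triggers no discipline gets no extra instructions.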

Caveats

The effect claims rest on a small, single-family (Claude-only) self-measurement; the README itself warns that “the direction is solid; the decimals are not asserted.”
The early-stop hook can misfire on offers phrased as declarative statements (“I’ll write the report if you want”), so the current workaround is to phrase such offers as polite questions.
It cannot raise the model’s ceiling on open-ended or creative work—if the model doesn’t see the implication, the harness won’t conjure it.
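The early-stop misfire is easy to reproduce with a toy detector. The sketch below is a guess at the general shape of such a check, not the plugin's implementation: the `INTENT` pattern and `should_block` are hypothetical, and the rule "a message ending in a question mark is never blocked" is an assumption made to mirror the polite-question workaround.

```python
import re

# Hypothetical list of phrases that announce future work.
INTENT = re.compile(r"\b(i'll|i will|let me|i'm going to)\b", re.IGNORECASE)


def should_block(final_message: str) -> bool:
    """Return True when the closing message promises work instead of doing it."""
    text = final_message.replace("’", "'").rstrip()
    if text.endswith("?"):
        return False  # a question hands the decision back to the user
    return bool(INTENT.search(text))


print(should_block("I'll run the tests now."))                 # unfulfilled intent
print(should_block("I’ll write the report if you want."))      # offer, still caught
print(should_block("Would you like me to write the report?"))  # question form
```

A purely lexical check cannot tell a promise from a conditional offer, which is why the second message is blocked while the question form of the same offer passes.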

Verdict

Worth a look if you treat Claude Code as a production assistant and want fewer “done” messages that mean “I think I did it.” Skip it if you are hoping to turn Opus into Fable 5; the README is admirably clear that some gaps are model, not middleware.