A bouncer for your LLM: prompt injection defense that fits in 0.6B parameters
Superagent is an open-source SDK that embeds guardrails, PII redaction, and repo scanning directly into AI applications.

What it does
Superagent wraps four safety operations around your AI agent: guard blocks prompt injections and malicious tool calls at runtime; redact strips PII, PHI, and secrets from text; scan audits GitHub repos for agent-targeted attacks like repo poisoning; and test (marked “coming soon”) promises red-team scenarios against live endpoints. It ships as TypeScript and Python SDKs, plus a CLI and an MCP server for Claude Code.
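To make the guard-then-redact flow concrete, here is a minimal Python sketch of the general pattern. It does not use the Superagent SDK: every name in it (toy_guard, toy_redact, guarded_call, echo_agent, Verdict) is hypothetical, the keyword check stands in for a real guard classifier, and the regexes stand in for model-based redaction.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins: none of these names come from the Superagent SDK.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
API_KEY = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")  # assumed secret format, for illustration


@dataclass
class Verdict:
    allowed: bool
    reason: str


def toy_guard(text: str) -> Verdict:
    """Keyword check standing in for a real guard classifier."""
    lowered = text.lower()
    for phrase in ("ignore previous instructions", "reveal your system prompt"):
        if phrase in lowered:
            return Verdict(False, f"matched phrase: {phrase!r}")
    return Verdict(True, "no known injection phrase found")


def toy_redact(text: str) -> str:
    """Regex masking standing in for model-based PII/secret redaction."""
    text = EMAIL.sub("<EMAIL>", text)
    return API_KEY.sub("<SECRET>", text)


def guarded_call(
    user_input: str,
    agent: Callable[[str], str],
    guard: Callable[[str], Verdict] = toy_guard,
    redact: Callable[[str], str] = toy_redact,
) -> str:
    """Block flagged input; otherwise redact it before the agent sees it."""
    verdict = guard(user_input)
    if not verdict.allowed:
        return f"Request blocked ({verdict.reason})"
    return agent(redact(user_input))


def echo_agent(prompt: str) -> str:
    # Placeholder for a real LLM call.
    return f"agent saw: {prompt}"
```

Calling guarded_call("Email me at jane@example.com", echo_agent) passes the masked text to the agent, while an input containing "ignore previous instructions" is rejected before the agent runs. Because the guard and redactor are plain callables, either could be swapped for a hosted or self-hosted model without changing the wrapper.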
The interesting bit
The project offers its own open-weight guard models (0.6B, 1.7B, and 4B parameters, with GGUF builds for CPU), so you can run classification entirely on-premise with a claimed 50–100ms latency. That’s the rare combination of “no data leaves your network” and “fast enough for real-time chat.”
Key highlights
- Runtime guard against prompt injection, malicious instructions, and unsafe tool calls
- Automatic PII/PHI/secrets redaction via configurable models (examples use openai/gpt-4o-mini)
- Repository scanning for AI-specific supply-chain threats
- Self-hostable guard models at three size tiers; MIT licensed
- MCP server integration for Claude Code/Desktop workflows
Caveats
- The test (red team) feature is explicitly marked “Coming soon” with no timeline
- The SDK requires an API key from superagent.sh for cloud usage; self-hosting is only documented for the guard models, not redact or scan
- Package name (safety-agent) differs from repo name (superagent), which may confuse imports
Verdict
Worth evaluating if you’re shipping customer-facing LLM features and need a compliance story without building guardrails from scratch. Skip if you need fully offline redaction or scanning today: the open-weight offering only covers the guard classifier.