superagent-ai/superagent

A bouncer for your LLM: prompt injection defense that fits in 0.6B parameters

Superagent is an open-source SDK that embeds guardrails, PII redaction, and repo scanning directly into AI applications.

6.6k stars · TypeScript · LLMOps · Eval · Other AI
Velocity (7d): +5.9 ★/day · Trend: steady

What it does

Superagent wraps four safety operations around your AI agent: guard blocks prompt injections and malicious tool calls at runtime; redact strips PII, PHI, and secrets from text; scan audits GitHub repos for agent-targeted attacks such as repo poisoning; and test (marked “coming soon”) promises red-team scenarios against live endpoints. It ships as TypeScript and Python SDKs, plus a CLI and an MCP server for Claude Code.
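To make the runtime flow concrete, here is a minimal sketch of the guard-then-redact pattern described above. It does not use the Superagent SDK: `guard`, `redact`, and `safeAgentCall` are hypothetical stand-ins, and the keyword and regex checks are toy placeholders for the model-based classification the project performs.

```typescript
// Hypothetical stand-ins for illustration only, NOT the Superagent SDK API.
type GuardResult = { allowed: boolean; reason?: string };

// Toy guard: flags a couple of well-known injection phrasings.
function guard(input: string): GuardResult {
  const patterns = [
    /ignore (all )?previous instructions/i,
    /reveal (the )?system prompt/i,
  ];
  const hit = patterns.find((p) => p.test(input));
  return hit
    ? { allowed: false, reason: `matched ${hit.source}` }
    : { allowed: true };
}

// Toy redactor: masks email addresses and US-style SSNs.
function redact(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, "<EMAIL>")
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "<SSN>");
}

// Wrap any agent: guard the input first, then redact whatever comes back.
async function safeAgentCall(
  input: string,
  agent: (prompt: string) => Promise<string>,
): Promise<string> {
  const verdict = guard(input);
  if (!verdict.allowed) {
    return `Blocked: ${verdict.reason}`;
  }
  return redact(await agent(input));
}
```

The ordering is the point of the sketch: the guard runs before the agent ever sees the input, and redaction runs on the way out, so a blocked request never reaches the model.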

The interesting bit

The project offers its own open-weight guard models (0.6B, 1.7B, and 4B parameters, with GGUF builds for CPU), so you can run classification entirely on-premises with a claimed 50–100ms latency. That’s the rare combination of “no data leaves your network” and “fast enough for real-time chat.”
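If you self-host a guard model, that claimed latency becomes a budget you may want to enforce in a chat path. The sketch below is one way to do it, under stated assumptions: `classifyWithBudget` and the `Verdict` type are hypothetical names, the classifier is any function you supply, and failing closed on timeout is a design choice of this example, not something the project documents.

```typescript
// Hypothetical helper, not part of any Superagent package.
type Verdict = "allow" | "block";

// Race a classifier against a deadline. If the deadline wins, fail closed.
async function classifyWithBudget(
  classify: (input: string) => Promise<Verdict>,
  input: string,
  budgetMs: number,
): Promise<Verdict> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Verdict>((resolve) => {
    timer = setTimeout(() => resolve("block"), budgetMs);
  });
  try {
    // Whichever promise settles first decides the verdict.
    return await Promise.race([classify(input), timeout]);
  } finally {
    // Always clear the timer so it does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Failing open (returning "allow" on timeout) would favor availability over safety; which default is right depends on what the guarded agent is allowed to do.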

Key highlights

  • Runtime guard against prompt injection, malicious instructions, and unsafe tool calls
  • Automatic PII/PHI/secrets redaction via configurable models (examples use openai/gpt-4o-mini)
  • Repository scanning for AI-specific supply-chain threats
  • Self-hostable guard models at three size tiers; MIT licensed
  • MCP server integration for Claude Code/Desktop workflows

Caveats

  • The test (red team) feature is explicitly marked “Coming soon” with no timeline
  • The SDK requires an API key from superagent.sh for cloud usage; self-hosting is only documented for the guard models, not redact or scan
  • Package name (safety-agent) differs from repo name (superagent), which may confuse imports

Verdict

Worth evaluating if you’re shipping customer-facing LLM features and need a compliance story without building guardrails from scratch. Skip if you need fully offline redaction or scanning today: the open-weight offering only covers the guard classifier.
