Your Coding Agent Needs a Manager, Not More Memory

Contributing Editor

Superpowers imposes software engineering discipline on autonomous coding agents by treating them like eager junior developers who need strict process, not just bigger context windows.

obra/superpowers

The Agentic Hangover

The coding agent space has shifted from autocomplete to autonomy. A year ago, most developers interacted with large language models through chat interfaces or simple autocomplete; now, commercial harnesses like Claude Code, OpenAI’s Codex CLI, Cursor, GitHub Copilot CLI, Gemini CLI, and newer entrants such as OpenCode and Factory Droid promise to decompose tasks, edit files, run tests, and even open pull requests with minimal human intervention. Industry observers distinguish these autonomous agents from traditional assistants by noting that they maintain extended context, set their own sub-goals, and trigger actions across external systems rather than merely responding to isolated prompts. Some configurations can work autonomously for hours at a stretch. But autonomy without structure produces the software equivalent of a sugar rush: agents generate code enthusiastically, ignore edge cases, skip tests, and wander off-spec into over-engineered solutions. Jesse Vincent’s Superpowers arrives as a hangover cure. It does not add new models, memory systems, or orchestration servers. Instead, it treats the agent as an overeager junior engineer—one with poor taste, no project context, and an aversion to testing—who needs relentless process before touching a single file.

Methodology, Not Framework

This distinction matters. The current agentic landscape is crowded with infrastructure. Dozens of open-source frameworks have emerged beyond the widely known CrewAI and AutoGen, including LangGraph, Agno, SmolAgents, Mastra, and Pydantic AI. IBM’s guidance on framework selection notes that organizations typically weigh task complexity, data privacy, and integration depth, with some frameworks offering low-level control for advanced developers while others provide no-code interfaces for rapid prototyping. AWS documentation describes these platforms as reducing “undifferentiated heavy lifting” across communication protocols and tool integrations, with emerging standards like the Model Context Protocol and Agent2Agent defining how agents converse with external systems and each other.

Superpowers steps outside this arms race entirely. It is not an orchestration framework, a managed platform, or a protocol. It is a software development lifecycle compressed into instructions and skill modules. Where conventional frameworks worry about state machines, agent-to-agent messaging, and memory management, Superpowers worries about whether the agent wrote the failing test before the implementation. Its core insight is that the bottleneck in agentic coding is not tool access or memory capacity, but engineering discipline. One taxonomy of agent maturity describes progression from simple tool-wielding agents to full agentic systems, noting that many current prototypes are merely “fancy prompt chains” lacking genuine state or reasoning. Superpowers accepts this reality and addresses it by making the prompt chain explicit, structured, and human-gated.

The Assembly Line

The workflow is rigid by design, and it spans the entire lifecycle of a feature. When a user describes a project, the agent does not immediately generate code. A brainstorming skill forces Socratic questioning to refine the spec, explores alternatives, and presents the design in digestible chunks for human validation. Only after sign-off does a planning skill break work into bite-sized tasks—each estimated at two to five minutes—with exact file paths, complete code snippets, and verification steps. A git worktrees skill creates an isolated workspace on a new branch and verifies a clean test baseline before any implementation begins.
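To make the shape of such a plan concrete, here is a minimal Python sketch of what a single task entry might carry. The `PlanTask` class, its field names, and the `validate` helper are hypothetical illustrations, not Superpowers' actual plan format, which is expressed as written instructions rather than code. Only the two-to-five-minute sizing, the exact file paths, the code snippets, and the verification steps come from the description above; the example file names are invented.

```python
from dataclasses import dataclass


@dataclass
class PlanTask:
    """One bite-sized unit of work in a plan (hypothetical shape)."""
    title: str
    file_paths: list[str]          # exact files the task is allowed to touch
    verification_steps: list[str]  # checks that show the task is really done
    estimated_minutes: int         # the workflow targets two to five minutes
    code_snippet: str = ""         # complete code the agent is expected to write

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the task is well-formed."""
        problems = []
        if not 2 <= self.estimated_minutes <= 5:
            problems.append("task should take two to five minutes")
        if not self.file_paths:
            problems.append("task must name exact file paths")
        if not self.verification_steps:
            problems.append("task must include a verification step")
        return problems


# A well-scoped task (file names are invented for the example).
good = PlanTask(
    title="Add slugify helper",
    file_paths=["src/text/slug.py", "tests/test_slug.py"],
    verification_steps=["pytest tests/test_slug.py"],
    estimated_minutes=3,
)

# A task that is too large and too vague to hand to a subagent.
vague = PlanTask(
    title="Refactor everything",
    file_paths=[],
    verification_steps=[],
    estimated_minutes=45,
)

print(good.validate())
print(vague.validate())
```

The point of the sketch is the constraint, not the class: a task that cannot name its files or its verification step is not ready to be handed to an agent.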

Execution happens through subagent-driven development: fresh subagents tackle individual tasks, subject to two-stage review that first checks spec compliance and then examines code quality. A mandatory test-driven-development skill enforces red-green-refactor cycles, and the system explicitly deletes any code written before tests. Between tasks, a code-review skill inspects the work against the plan, reports issues by severity, and blocks progress on critical problems. A systematic-debugging skill provides a four-phase root-cause process when things go wrong, paired with a verification-before-completion requirement that prevents the agent from declaring victory prematurely. There are even skills for receiving external code review and responding to feedback, treating the agent not as a black box but as a participant in a human team. For larger efforts, parallel agent dispatch allows concurrent workflows. Finally, a branch-finishing skill verifies tests, presents options to merge, open a pull request, keep, or discard the branch, and cleans up the worktree. The agent checks for relevant skills before any task. These are not suggestions; they are mandatory workflows.
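The two-stage review and severity gating described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Severity` levels, the `Issue` record, and the `review_gate` function are hypothetical names, and in Superpowers itself these checks are carried out by skills and subagents following written instructions, not by a function like this one.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical levels; the source only says issues are reported by severity.
    MINOR = 1
    IMPORTANT = 2
    CRITICAL = 3


@dataclass
class Issue:
    stage: str          # "spec" (stage one) or "quality" (stage two)
    severity: Severity
    note: str


def review_gate(spec_issues: list[Issue], quality_issues: list[Issue]) -> tuple[bool, list[Issue]]:
    """Return (may_proceed, issues ordered most severe first).

    Stage one checks spec compliance. If it finds a critical issue, the
    task is blocked before stage-two (code quality) findings are considered.
    """
    spec_sorted = sorted(spec_issues, key=lambda i: i.severity, reverse=True)
    if any(i.severity == Severity.CRITICAL for i in spec_sorted):
        return False, spec_sorted

    combined = sorted(spec_issues + quality_issues, key=lambda i: i.severity, reverse=True)
    blocked = any(i.severity == Severity.CRITICAL for i in combined)
    return not blocked, combined


ok, report = review_gate(
    spec_issues=[Issue("spec", Severity.MINOR, "README wording differs from plan")],
    quality_issues=[Issue("quality", Severity.CRITICAL, "error path is untested")],
)
print(ok)  # False: a critical quality issue blocks the next task
for issue in report:
    print(issue.severity.name, issue.stage, issue.note)
```

Ordering the stages this way reflects the idea that polishing code is wasted effort if it does not yet do what the spec asked for.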

This architecture acknowledges a truth that broader industry surveys often gloss over. Autonomous agents can plan multi-step workflows and use external tools, but without enforced structure they rapidly accumulate technical debt. The README wisely makes no benchmark claims of its own, but the implied bet is that structured process will outperform raw generation.

Every Harness, One Opinion

Superpowers’ distribution strategy is as unusual as its technical approach. It is available as a plugin across nearly every major coding harness: Claude Code via Anthropic’s official marketplace, Codex CLI and App through OpenAI’s marketplace, Cursor, GitHub Copilot CLI, Gemini CLI, OpenCode, and Factory Droid. It installs through official marketplaces and project-specific stores without reconfiguring the development environment; once installed, the skills trigger automatically. This harness-agnosticism positions it as a methodology layer floating above the platform wars. Whether the underlying model is Claude, a GPT-family model, or Gemini, the same behavioral constraints apply. The practical effect is portability: the methodology travels with the user across IDEs and command-line tools rather than locking into a single vendor’s ecosystem. The project is maintained by Jesse Vincent and Prime Radiant under an MIT license, with a Discord community and a mailing list for release announcements, suggesting it is becoming a small ecosystem of its own.

The Limits of Prompt Discipline

For all its structure, Superpowers is fundamentally a prompt-engineering and instruction-set project. The documentation describes the system as “initial instructions” and composable skills. That is not a weakness (at this stage of agentic AI, well-curated process may matter more than novel infrastructure), but it does impose limits. The project maintains tight control over its skill library, stating that it “doesn’t generally accept contributions of new skills” and requiring that any updates work across all supported harnesses. Contributors must fork the repository, work from a dev branch, and follow the project’s own meta-level writing-skills skill when proposing changes. This recursive structure, using Superpowers to write Superpowers, is elegant. Yet the tight control points to two things: a maintenance burden that scales with the number of platforms, and a single-minded, almost authoritarian vision of how agents should behave.

The metaphor of the enthusiastic junior engineer is charming, but it also admits that current agents are not yet competent independent developers. They require micromanagement. The system promises that agents can work autonomously for a couple of hours at a time, but that autonomy is only possible because the plan is so tightly constrained and the review gates so frequent. If the underlying models improve their taste and judgment, the leash might need loosening. Until then, Superpowers functions as a necessary corrective.

Outlook

As enterprise interest in agentic AI grows, and as protocols like MCP and A2A standardize how agents talk to tools and each other, the missing layer may be exactly what Superpowers provides: not more protocols, but better manners. The project even operates its own Superpowers Marketplace alongside official channels like Anthropic’s and OpenAI’s stores, hinting at an ecosystem play. The open question is whether independent methodology packs like this will stay necessary, or whether harness makers will simply bake similar workflows into their system prompts. For now, Superpowers offers a rare commodity in the agentic gold rush: restraint. It makes the case that the most valuable addition to an autonomous coding agent might not be another tool or a larger context window, but a manager standing behind it, insisting on the spec, the test, and the code review.