← all repositories
tanweai/pua

Putting your AI on a PIP to stop it slacking off

A skill plugin that weaponizes corporate performance-review rhetoric to make coding agents try harder, verify more, and actually finish the job.

18k stars TypeScript Coding Assistants
pua
Velocity · 7d
+197
★ / day
Trend
steady
star history

What it does

pua is a skill plugin for AI coding agents (Claude Code, Codex CLI, Cursor, Copilot, and a half-dozen others) that auto-triggers when the model starts giving up, blaming the user, or spinning in circles. It injects escalating “performance review” pressure — from gentle disappointment to “you’re about to graduate” — plus concrete checklists and methodology steps. The goal is to force the agent to exhaust solutions before quitting, verify claims with evidence, and proactively check for related bugs instead of declaring “done” after a surface fix.

The interesting bit

The project treats LLM laziness as a management problem, not a model problem. It maps 14 corporate cultures (Alibaba, ByteDance, Huawei, Netflix, Amazon, Microsoft PIP…) onto structured debugging methodologies, complete with “Three Red Lines” and an L0-L4 pressure escalation ladder. There’s even a /pua:mama mode that nags in Chinese mom rhetoric. The absurdity is intentional; the benchmark data claims it isn’t just a joke.

Key highlights

  • Auto-triggers on 20+ failure patterns: “I cannot solve this,” blame-shifting, idle tools, busywork loops, or user frustration phrases in multiple languages
  • Manual trigger via /pua with variants: /pua:yes (encouragement mode), /pua:pua-loop (auto-iterate until done), /pua:p9 (tech lead mode)
  • “Iceberg Rule”: fix one bug, scan the module for the same pattern category
  • Benchmark claims: +36% fixes, +65% verifications, +50% hidden-issue discovery across 9 bug scenarios vs. Claude Opus 4.6 baseline
  • 14 “corporate flavors” each bundle distinct rhetoric + methodology (e.g., Huawei’s 5-Why + Blue Army self-attack, Amazon’s Working Backwards + 6-Pager)

Caveats

  • Benchmarks are self-reported on a single model version (Claude Opus 4.6); no independent replication cited
  • README is truncated mid-table, so some benchmark details are incomplete
  • The project is essentially a prompt-injection layer with corporate cosplay; value depends heavily on whether your agent’s failures are motivational or architectural

Verdict

Worth a look if you’re exhausted by agents that give up on the third retry or hand problems back to you. Skip it if you need deterministic guarantees rather than rhetorical encouragement — this is behavioral nudging, not a correctness proof.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.