kucherenko/jscpd

Copy-paste detective that speaks fluent LLM

A 5.7k-star duplication detector rebuilt itself for the agentic era: token-efficient reporters, MCP server, and skills your AI assistant can actually use.

5.7k stars · TypeScript · Coding Assistants · Data Tooling
Velocity · 7d: +1.2 ★ / day · Trend: steady

What it does: jscpd hunts down duplicated code across 223 programming languages and document formats using the Rabin-Karp algorithm. Run it as a CLI tool, embed it via its TypeScript API, or spin up a local server to check snippets over HTTP.
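
To make the Rabin-Karp mention concrete, here is a minimal sketch of the general technique: slide a fixed-size window over a token stream, keep a rolling hash, and confirm any hash hit by comparing the tokens themselves. This is not jscpd's actual implementation; `findDuplicateWindows`, `sameWindow`, `tokenIds`, and the `BASE`/`MOD` constants are all hypothetical names chosen for illustration, and the input is assumed to be already tokenized.

```typescript
// Illustrative Rabin-Karp sketch over tokens. All names are hypothetical,
// not part of jscpd's API.
type Match = { first: number; second: number };

const BASE = 257;
const MOD = 1_000_000_007; // keeps every intermediate product below 2^53

// Map each distinct token to a small positive integer so it can be hashed.
function tokenIds(tokens: string[]): number[] {
  const ids = new Map<string, number>();
  return tokens.map((t) => {
    if (!ids.has(t)) ids.set(t, ids.size + 1);
    return ids.get(t)!;
  });
}

// Token-by-token comparison, used to rule out hash collisions.
function sameWindow(tokens: string[], a: number, b: number, len: number): boolean {
  for (let k = 0; k < len; k++) {
    if (tokens[a + k] !== tokens[b + k]) return false;
  }
  return true;
}

// Report each window that repeats an earlier window of the same length.
function findDuplicateWindows(tokens: string[], window: number): Match[] {
  const ids = tokenIds(tokens);
  const matches: Match[] = [];
  if (window <= 0 || ids.length < window) return matches;

  // BASE^(window-1) mod MOD: the weight of the token leaving the window.
  let highPow = 1;
  for (let i = 1; i < window; i++) highPow = (highPow * BASE) % MOD;

  const seen = new Map<number, number[]>(); // hash -> earlier start positions
  let hash = 0;
  for (let i = 0; i < ids.length; i++) {
    if (i >= window) {
      // Drop the oldest token's contribution before shifting in the new one.
      hash = (hash - ((ids[i - window] * highPow) % MOD) + MOD) % MOD;
    }
    hash = (hash * BASE + ids[i]) % MOD;
    if (i < window - 1) continue; // first window not complete yet

    const start = i - window + 1;
    const bucket = seen.get(hash) ?? [];
    for (const prev of bucket) {
      if (sameWindow(tokens, prev, start, window)) {
        matches.push({ first: prev, second: start });
        break;
      }
    }
    bucket.push(start);
    seen.set(hash, bucket);
  }
  return matches;
}
```

Calling `findDuplicateWindows("a = b + c ; x = 1 ; a = b + c ;".split(" "), 6)` reports one match, with the window at token 10 repeating the window at token 0. The rolling update is what keeps the scan linear in the number of tokens: each step costs a constant amount of work instead of rehashing the whole window.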

The interesting bit: The project didn’t just slap an “AI-ready” sticker on the box. It built three distinct integration paths: an ai reporter that compresses output to ~1,100 tokens (79% fewer than the default console reporter), installable agent skills that teach assistants how to invoke jscpd and refactor what it finds, and a full MCP server so Claude Desktop et al. can call check_duplication as a native tool. The v4.2.x release also replaced prismjs with a custom reprism-based tokenizer, yielding an 11.5% speedup and enabling cross-format detection: a <script> block in a .vue file can now match a plain .ts file.

Key highlights

  • Supports 223 formats, up from 152 in recent releases, including shebang detection for extensionless scripts
  • Monorepo architecture: core algorithm, finder, tokenizer, and reporters are separately installable packages
  • Multiple output formats: console, HTML, badge, SARIF (GitHub Code Scanning compatible), and the token-efficient ai reporter
  • LevelDB-backed store option for large repositories, plus a persistent memory store for incremental scans
  • Used by GitHub Super Linter, Mega-Linter, Codacy, and Code-Inspector
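
For the shebang detection mentioned in the highlights, a rough sketch of how a tool might pick a format for an extensionless script is shown here. It is an assumption about the general approach, not jscpd's code: `formatFromShebang`, `baseName`, and the interpreter-to-format table are hypothetical, and the format names are placeholders.

```typescript
// Hypothetical interpreter -> format table; the real mapping may differ.
const INTERPRETER_FORMATS = new Map<string, string>([
  ["sh", "bash"],
  ["bash", "bash"],
  ["zsh", "bash"],
  ["node", "javascript"],
  ["python", "python"],
  ["ruby", "ruby"],
  ["perl", "perl"],
]);

// Last path segment, e.g. "/usr/bin/env" -> "env".
function baseName(path: string): string {
  return path.slice(path.lastIndexOf("/") + 1);
}

// Guess a format from the first line of a file, or undefined if there is
// no shebang or the interpreter is not in the table.
function formatFromShebang(source: string): string | undefined {
  const firstLine = source.split(/\r?\n/, 1)[0] ?? "";
  if (!firstLine.startsWith("#!")) return undefined;

  const parts = firstLine.slice(2).trim().split(/\s+/);
  let command = parts[0] ?? "";
  if (command === "") return undefined;

  // "#!/usr/bin/env python3" or "#!/usr/bin/env -S node --flag":
  // skip env itself and any of its options.
  if (baseName(command) === "env") {
    const next = parts.slice(1).find((p) => !p.startsWith("-"));
    if (next === undefined) return undefined;
    command = next;
  }

  // Strip a trailing version suffix so "python3.11" resolves as "python".
  const name = baseName(command).replace(/[\d.]+$/, "");
  return INTERPRETER_FORMATS.get(name);
}
```

With this sketch, a file starting with `#!/usr/bin/env python3` resolves to `python`, while a file with no shebang returns `undefined` and would fall back to whatever extension-based lookup the tool uses.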

Caveats

  • The MCP server and AI skills are relatively new; the README notes them but doesn’t show real-world agent integration examples beyond config snippets
  • LevelDB store is explicitly marked “slower than default store” — the trade-off for handling bigger repos is performance
  • Recent bug fixes reveal the codebase has had edge-case issues: entire-file duplicates were silently dropped until #728, and a ReDoS vulnerability in Lisp tokenization required a regex rewrite

Verdict: Worth a look if you’re running a polyglot codebase or wiring duplication checks into an AI-assisted workflow. Skip it if you only need basic clone detection in a single language; simpler tools will do the job without the cognitive overhead of MCP configs and skill installation.
