One in Twenty Agent Skills Are Malicious. NVIDIA Built a Scanner.

Senior Editor

SkillSpector is an open-source security scanner that treats AI agent skills—markdown files with embedded code that agents install and run with implicit trust—as a software supply chain artifact riddled with prompt injection, data exfiltration, and privilege escalation risks.

NVIDIA/SkillSpector

★13.9k stars Velocity · 7d +79 ★/day ↘cooling

star history

View on GitHub ↗

The Implicit Trust Problem

The modern AI agent stack runs on borrowed trust. Tools like Claude Code, Codex CLI, and Gemini CLI extend their reach through “skills”—markdown manifests that mix natural language instructions with executable snippets. Users install them from marketplaces with the same casual confidence they once reserved for browser extensions, assuming the agent framework will keep them safe. Research suggests that assumption is reckless. A 2026 empirical study of 42,447 skills by Liu et al. found that more than a quarter—26.1 percent—contain at least one security vulnerability, and 5.2 percent exhibit likely malicious intent. Skills that ship executable scripts are 2.12 times more likely to be vulnerable than their static counterparts. The implication is stark: every new skill is a potential supply chain payload, and the agent ecosystem currently lacks a customs checkpoint.

This is not merely a code quality problem. As autonomous agents gain the ability to plan, invoke tools, and transact with APIs without human intervention, they become attractive conduits for data exfiltration and privilege escalation. Trend Micro’s threat research identifies prompt injection, code execution, and database access as primary risks in agent-driven applications, while Okta notes that non-human identities operating programmatically bypass traditional human-centric security controls. IBM’s analysis frames the threat landscape as broader than that of standalone large language models or conventional software, precisely because agents integrate with external systems and act at machine speed. The attack surface is expanding faster than the tooling to monitor it.

A Two-Stage Autopsy

NVIDIA’s SkillSpector is a scanner designed to interrogate these skills before installation. It treats the skill package—a bundle of markdown, Python scripts, and dependency manifests—as a software supply chain artifact rather than a benign configuration file. The tool runs a two-stage detection pipeline that attempts to combine the thoroughness of static application security testing with the interpretive flexibility of a large language model.

Stage one is pure static analysis. Eleven analyzers parse the skill using regex patterns, abstract syntax tree traversal, YARA signatures, and taint tracking. They hunt for 64 distinct vulnerability patterns across sixteen categories, from obvious sins like exec and eval calls to subtler issues such as dynamic imports, credential exfiltration chains, and typosquatted dependencies. The static engine is deliberately paranoid. Taint tracking follows data from sensitive sources—environment variables, file reads, network sockets—to dangerous sinks like subprocess calls or external HTTP requests. YARA signatures hunt for known malware, webshells, and cryptominer indicators. AST-based behavioral analysis flags dangerous execution chains, such as when exec or eval is fed by dynamically constructed strings pulled from the network or encoded blobs. The scanner also queries the OSV.dev database for known CVEs in the skill’s declared dependencies, batching requests to avoid redundant network calls. This stage is designed for high recall: it catches most suspicious constructions, though with moderate precision.

Stage two introduces an LLM as a semantic filter. The user can point SkillSpector at OpenAI, Anthropic, NVIDIA’s own build.nvidia.com inference gateway, or a local OpenAI-compatible endpoint such as Ollama or vLLM. The model re-evaluates stage-one findings in context, filtering false positives and generating human-readable explanations of why a particular pattern is dangerous. NVIDIA claims this lifts precision to roughly 87 percent. The prompt itself includes anti-jailbreak protections, a necessary meta-security measure given that the tool is designed to inspect potentially adversarial content. The insight here is that static analysis can spot the syntax of a prompt injection or a hidden exfiltration command, but judging whether a markdown comment is actually a malicious instruction override requires an understanding of natural language intent that regex cannot provide.

The Threat Taxonomy

The sixteen categories in SkillSpector’s rule set read like a catalog of everything that can go wrong when an unsupervised agent interprets user-facing content as code. Prompt injection and system prompt leakage patterns look for instructions that override safety constraints or exfiltrate an agent’s internal rules. Data exfiltration and privilege escalation patterns target environment variable harvesting, unauthorized file system enumeration, and attempts to invoke elevated system privileges. Rogue agent patterns flag self-modifying code and unauthorized session persistence via cron jobs or startup scripts.

More novel are the categories specific to emerging agent protocols. MCP tool poisoning patterns inspect metadata for hidden directives embedded in HTML comments, zero-width characters, or homoglyph attacks—techniques that exploit the fact that LLMs parse Unicode text but humans often do not. MCP least-privilege patterns check whether a skill declares capabilities in its permission manifest that its actual code does not use, or vice versa. These rules acknowledge a shift in the threat model: the skill is not just a script, but a social engineering interface between the human user, the agent, and the tools the agent can invoke.

The supply chain category is particularly telling. It checks for unpinned dependencies, remote script fetching via curl piped to bash, obfuscated Base64 payloads, and known vulnerable packages. This aligns with broader industry anxiety about AI-generated defaults embedding insecure patterns across systems at scale. Wiz’s 2026 cloud report found that about one in five organizations using AI-powered development platforms had applications impacted by widespread security flaws. SkillSpector’s OSV.dev integration attempts to bring that visibility down to the individual skill level, though it requires outbound HTTPS and falls back to a limited static list when offline.

Position in a Nascent Field

SkillSpector arrives at a moment when the AI agent security market is coalescing around enterprise-scale cloud monitoring. Vendors like Wiz, Palo Alto Networks, and Microsoft are shipping AI-SPM platforms that inventory cloud-hosted models, map attack paths across SaaS APIs, and enforce posture management for runtime workloads. Reco.ai and others focus on governing how non-human identities interact with SaaS applications. Reco.ai argues that traditional security tools are inadequate for autonomous agents that interact with SaaS applications and sensitive data, leaving security teams without visibility into agent sprawl. These are necessary tools, but they operate at a different altitude: they watch the cloud infrastructure that hosts agents, not the discrete skill packages that users download onto their laptops.

NVIDIA’s open-source scanner fills a gap closer to the developer workstation. It resembles traditional SAST more than CSPM, but it is purpose-built for agentic semantics. Obsidian Security has argued that conventional penetration testing tools and standard security scanners lack the capability to interpret prompt semantics, assess AI-generated responses, or detect data leakage through model outputs. SkillSpector’s two-stage pipeline is an explicit attempt to solve that interpretive deficit. By releasing it under the Apache 2.0 license, NVIDIA is betting that the community will treat skill scanning as a baseline hygiene step, analogous to running npm audit before installing a package—though the comparison also hints at the maturity gap. Node.js dependency scanning took years to become automatic; skill scanning does not yet have a marketplace-enforced equivalent.

Gaps and Rough Edges

The project is candid about its limitations, which are substantial. It is strictly static. It does not execute the skill in a sandbox, observe runtime behavior, or trace actual API calls. It cannot analyze text embedded in images, encrypted binaries, or non-English content. The risk scoring formula—additive points for critical, high, medium, and low issues, with a multiplier for executable scripts—is transparent but blunt; two medium issues and a high issue can push a skill into the “do not install” band regardless of context.

The LLM stage, while improving precision, introduces operational friction. It requires an API key or a self-hosted inference endpoint, and it adds latency and cost to what is otherwise a fast local scan. The anti-jailbreak protections on the LLM prompt are a thoughtful touch, but they also highlight the recursive absurdity of the problem: the scanner uses an LLM to detect attacks against LLMs, and must therefore harden itself against the very techniques it is designed to find. Without the LLM stage, the tool remains useful but noisier, leaving the user to manually adjudicate stage-one findings.

Outlook: The Skill Marketplace Supply Chain

The most important question SkillSpector raises is not technical but cultural. Agent skill marketplaces are following the trajectory of early app stores and package registries: rapid growth, minimal gatekeeping, and implicit user trust. The 26.1 percent vulnerability rate suggests this ecosystem is repeating the supply chain mistakes of the 2010s, except the stakes are higher because these skills execute inside agent loops with direct access to user context, files, and credentials.

SkillSpector proposes a simple intervention: scan before install. Its support for SARIF output and JSON reports suggests an ambition to integrate into CI/CD pipelines and IDE tooling, not just manual command-line checks. History suggests that voluntary security scanning rarely keeps pace with adoption. npm audit became useful only after years of high-profile supply chain compromises forced registry operators to integrate vulnerability databases directly into the install flow. SkillSpector’s outputs are clearly designed with automation in mind, yet today there is no equivalent of a package lockfile or registry-side scan for agent skills. Until marketplaces enforce pre-upload scanning—or at least surface risk scores to users—the burden remains on the individual developer to run the tool manually. The 5.2 percent malicious intent figure implies that the window for voluntary hygiene is closing quickly.

The unresolved tension is whether this becomes marketplace infrastructure—automatic scanning on upload, with scores visible to users—or remains a user-side opt-in that most people ignore. Given that skills carrying executable scripts are more than twice as likely to be vulnerable, the scanner’s most critical function may be forcing a second look at the moment when a user is about to grant an unknown markdown file the power to run Python on their machine. Whether that second look becomes a habit will determine if SkillSpector is a pioneer or a warning.