The Search Engine That Treats Reddit Upvotes as PageRank

Contributing Editor

An AI agent skill that scrapes a dozen social platforms in parallel, scores results by engagement rather than SEO, and synthesizes what people actually care about right now.

mvanhorn/last30days-skill

★ 53.8k stars · 7-day velocity: +143 ★/day (cooling)

The Attention Economy, Weaponized for Research

The repository mvanhorn/last30days-skill has been climbing GitHub’s trending lists with the kind of velocity that usually signals either a genuine utility or exceptionally good marketing. In this case, the marketing is essentially the README itself—a sprawling, confident document that reads like a product demo crossed with a manifesto. The project is a “skill” (in the emerging agent-skills ecosystem) that turns an AI coding assistant into a research agent capable of querying Reddit, X, YouTube, TikTok, Hacker News, Polymarket, GitHub, and a growing list of other platforms simultaneously, then scoring and synthesizing the results by engagement metrics rather than traditional search relevance.

The core premise is disarmingly simple: Google ranks by editorial authority and SEO. Social platforms rank by what people actually click, upvote, like, watch, and bet money on. /last30days treats the latter as the signal worth listening to. The result is a research brief that reads less like a bibliography and more like a weighted consensus of what the internet’s various factions are actually saying right now.

What “Agent Skills” Actually Means Here

The project exists within a nascent but rapidly expanding ecosystem of “agent skills”—modular capabilities that plug into AI coding assistants like Claude Code, Cursor, GitHub Copilot, and dozens of others via the Agent Skills framework. The skill itself is essentially a Python engine wrapped in a specification that tells the host agent how to invoke it, what parameters it accepts, and how to interpret its output. The README notes 5,200 total installs via Claude Code marketplaces alone, suggesting this is one of the more adopted skills in a still-small field.

The architecture matters because it explains why this isn’t just another Python scraper. The skill leverages the host agent’s reasoning capabilities for pre-research entity resolution—figuring out that “OpenClaw” maps to Peter Steinberger’s GitHub handle steipete, or that “Paperclip” resolves to a specific founder’s X account—before firing a single API call. This two-layer design, with a reasoning model handling disambiguation and a Python engine handling parallel execution, is what lets the system claim it “understands your topic first, then searches the right people and communities.”

The v3 Engine: Parallel Search with a Pre-Research Brain

The current version, v3, introduces what the maintainers call “intelligent search”—a Python-based pre-research module built by contributor @j-sperling that resolves entities before searching. The old engine searched keywords across all sources. The new engine attempts to map your query to specific handles, subreddits, YouTube channels, and TikTok hashtags, then fans out targeted queries in parallel. The claim is that this lets v3 find content v2 “never could,” though the README offers no quantitative validation of this beyond illustrative examples.
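To make the difference between keyword search and entity-targeted search concrete, here is a minimal sketch of how a resolved entity might expand into per-source queries. The `ResolvedEntity` class, the `plan_queries` function, and the query strings are assumptions made for illustration; the README does not document the engine's internal data structures. The only fact taken from the project is that “OpenClaw” resolves to the GitHub handle steipete.

```python
from dataclasses import dataclass, field


@dataclass
class ResolvedEntity:
    """Hypothetical output of a pre-research step for one topic."""
    topic: str
    x_handles: list[str] = field(default_factory=list)
    subreddits: list[str] = field(default_factory=list)
    github_users: list[str] = field(default_factory=list)


def plan_queries(entity: ResolvedEntity) -> list[tuple[str, str]]:
    """Expand a resolved entity into (source, query) pairs.

    When nothing was resolved for a source, fall back to the bare topic,
    which is roughly the keyword-only behaviour attributed to the old engine.
    """
    plan: list[tuple[str, str]] = []
    plan += [("x", f"from:{h}") for h in entity.x_handles] or [("x", entity.topic)]
    plan += [("reddit", f"r/{s} {entity.topic}") for s in entity.subreddits] or [
        ("reddit", entity.topic)
    ]
    plan += [("github", f"user:{u}") for u in entity.github_users] or [
        ("github", entity.topic)
    ]
    return plan


# Only the GitHub handle is known here, so X and Reddit fall back to keywords.
openclaw = ResolvedEntity(topic="OpenClaw", github_users=["steipete"])
for source, query in plan_queries(openclaw):
    print(f"{source:>7}: {query}")
```

The point of the sketch is the fallback: a source with a resolved handle gets a narrow, author-scoped query, while every other source degrades to the plain keyword search.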

The parallel execution model is where the technical design gets interesting. Each source runs its own query pipeline: Reddit via public JSON endpoints (free, no API key); X via a vendored Node.js “Bird client” that uses your browser session; YouTube via yt-dlp for transcript extraction; TikTok, Instagram, and Threads via the ScrapeCreators API; Polymarket via its public odds API; and GitHub via the standard API for repository and user data. The engine then merges cross-source clusters, detecting when the same story appears on Reddit, X, and YouTube, and scores everything by engagement metrics normalized per platform.
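The fan-out and per-platform normalization can be sketched in a few lines of asyncio. This is an illustration under stated assumptions, not the skill's code: the sources are stubbed with made-up numbers, and dividing by the per-source median is one plausible normalization that the README does not confirm. Cross-source clustering is left out.

```python
import asyncio
from statistics import median

# Made-up engagement numbers: think upvotes for "reddit", likes for "x".
FAKE_RESULTS = {
    "reddit": [
        {"title": "thread A", "engagement": 200},
        {"title": "thread B", "engagement": 40},
        {"title": "thread C", "engagement": 12},
    ],
    "x": [
        {"title": "post A", "engagement": 9000},
        {"title": "post B", "engagement": 3000},
        {"title": "post C", "engagement": 1000},
    ],
}


async def fetch_source(name: str, delay: float) -> list[dict]:
    """Stand-in for one source pipeline; a real one would do network I/O."""
    await asyncio.sleep(delay)
    return [{**item, "source": name} for item in FAKE_RESULTS.get(name, [])]


def normalize(items: list[dict]) -> list[dict]:
    """Divide raw engagement by the per-source median (an assumed formula)."""
    by_source: dict[str, list[float]] = {}
    for item in items:
        by_source.setdefault(item["source"], []).append(item["engagement"])
    baselines = {src: median(vals) or 1.0 for src, vals in by_source.items()}
    return [
        {**item, "score": item["engagement"] / baselines[item["source"]]}
        for item in items
    ]


async def run_all(timeout: float = 0.2) -> list[dict]:
    jobs = [
        asyncio.wait_for(fetch_source("reddit", 0.01), timeout),
        asyncio.wait_for(fetch_source("x", 0.02), timeout),
        asyncio.wait_for(fetch_source("slow_source", 5.0), timeout),  # times out
    ]
    # return_exceptions=True keeps one failed source from sinking the run.
    results = await asyncio.gather(*jobs, return_exceptions=True)
    merged = [item for result in results if isinstance(result, list) for item in result]
    return sorted(normalize(merged), key=lambda item: item["score"], reverse=True)


ranked = asyncio.run(run_all())
for item in ranked:
    print(f'{item["score"]:5.2f}  {item["source"]:<7} {item["title"]}')
```

With these numbers the 200-upvote Reddit thread outranks the 9,000-like X post, because each is measured against what is typical for its own platform rather than against a shared raw scale.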

The synthesis layer is where the host AI agent re-enters. The engine returns structured data with engagement scores; the agent writes the narrative brief, grounded in specific citations with inline attribution. The README emphasizes that this isn’t “here’s what I found” but “here’s what matters”—a distinction that sounds like marketing but actually reflects a real design choice to have the agent rank by perceived significance rather than chronological or keyword relevance.

The Platform Moat That Isn’t

One of the more candid passages in the README acknowledges why this tool exists as a third-party hack rather than a native feature of any major AI assistant: “You can’t get this search anywhere else because no single AI has access to all of it.” ChatGPT has a Reddit deal but no X access. Gemini has YouTube but not Reddit. Claude has none natively. Each platform is a “walled garden with its own API, its own tokens, its own auth.” The skill’s value proposition is essentially arbitrage across these fragmented access patterns—bring your own keys and browser sessions, and suddenly you can query across boundaries that the platform owners have no incentive to bridge.

This is also where the project’s limitations become visible. The free tier covers Reddit, Hacker News, Polymarket, and GitHub. Everything else requires configuration: X needs a logged-in browser session, YouTube needs yt-dlp installed, TikTok/Instagram/Threads/Pinterest need a ScrapeCreators API key, Perplexity needs an OpenRouter key. The “zero config” claim is technically true for a subset of sources, but the full experience requires a growing collection of third-party credentials. The macOS Keychain integration is a nice touch for credential management, but the fundamental reality is that this is a bring-your-own-access model that shifts the API cost and rate-limit complexity to the user.
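The tiering described above amounts to a small gating function. The sketch below is hypothetical: the environment variable names `SCRAPECREATORS_API_KEY` and `OPENROUTER_API_KEY`, the `x_session` flag, and the function itself are placeholders chosen for illustration, not the skill's real configuration keys.

```python
import shutil
from typing import Callable, Mapping, Optional

# Sources the article lists as working with no configuration.
FREE_SOURCES = ["reddit", "hackernews", "polymarket", "github"]


def available_sources(
    env: Mapping[str, str],
    x_session: bool = False,
    binary_on_path: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the sources a run could query given what the user has set up."""
    sources = list(FREE_SOURCES)
    if x_session:  # a logged-in browser session
        sources.append("x")
    if binary_on_path("yt-dlp"):  # the yt-dlp executable must be installed
        sources.append("youtube")
    if env.get("SCRAPECREATORS_API_KEY"):  # placeholder variable name
        sources += ["tiktok", "instagram", "threads", "pinterest"]
    if env.get("OPENROUTER_API_KEY"):  # placeholder variable name
        sources.append("perplexity")
    return sources


# A fresh machine with nothing configured and no yt-dlp on PATH.
print(available_sources({}, binary_on_path=lambda name: None))
```

Seen this way, “zero config” describes the first line of the function; every later branch is a credential or binary the user has to supply.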

What People Actually Use It For

The README is unusually specific about use cases, which is refreshing. It lists pre-meeting intelligence on a person (recent tweets, podcast transcripts, GitHub activity, Reddit mentions); competitive tool comparisons that use live GitHub star counts rather than stale blog posts; trip planning informed by real-time community sentiment about ride closures and wait times; and prompt engineering research that surfaces what the community has “already figured out” but training data has not yet caught up to.

The Peter Steinberger example is worth examining because it demonstrates both the promise and the potential noise. The brief claims he “joined OpenAI to work on Codex, fighting Anthropic’s ban on third-party agents, shipping 23 PRs at 85% merge rate, building ‘LobsterOS’ for cross-device agent control.” This is specific enough to be useful if accurate, but the sourcing is a mix of X posts, Reddit threads, and GitHub commits—none of which have the verification standards of traditional journalism. The Reddit quote about OpenClaw users getting “banned eventually” carries 227 upvotes, which the engine treats as a credibility signal. Whether upvotes correlate with accuracy is, of course, an open question.

The Polymarket integration is the most distinctive source. Prediction market odds backed by real money are treated as harder to argue with than “a pundit’s guess.” The README notes that v3 specifically displays percentage odds rather than dollar volumes, focusing on the implied probability rather than the betting market’s liquidity. This is a defensible choice—odds are more interpretable for most users—but it also obscures how thin or thick the market is for any given question.
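The trade-off is easy to show. In a binary market where a “yes” share pays out $1, a share price of p dollars implies a probability of roughly p, ignoring fees and spread. The sketch below formats that as a percentage and, as a hypothetical addition, flags low-volume markets. The function names, the `thin_below` threshold, and the example markets are all invented for illustration and are not taken from the skill or from Polymarket's API.

```python
def implied_probability(yes_price: float) -> float:
    """A binary share paying $1 and priced at p dollars implies roughly p."""
    if not 0.0 <= yes_price <= 1.0:
        raise ValueError("price must be between 0 and 1")
    return yes_price


def describe_market(
    question: str,
    yes_price: float,
    volume_usd: float,
    thin_below: float = 10_000.0,  # arbitrary cutoff, chosen for the example
) -> str:
    """Show odds as a percentage, with a marker when little money backs them."""
    pct = round(implied_probability(yes_price) * 100)
    note = " (thin market)" if volume_usd < thin_below else ""
    return f"{question}: {pct}%{note}"


# Two invented markets with the same odds but very different liquidity.
print(describe_market("Example question A", 0.75, volume_usd=2_500_000))
print(describe_market("Example question B", 0.75, volume_usd=800))
```

Both lines report 75%, and only the liquidity marker tells the reader that the second figure rests on a few hundred dollars of bets.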

The Hype Cycle and the “Agent” Label Problem

The broader context for this project’s attention is the current explosion of tools calling themselves “AI agents.” As one industry analysis notes, “Every social media tool released in 2026 calls itself an ‘AI agent,’ whether it is a scheduling app, a chatbot builder, a caption generator, or an analytics dashboard.” The term has become so diluted that it risks becoming meaningless—a marketing label applied across capability levels that range from simple automation to genuine autonomous behavior.

/last30days sits somewhere in the middle of this spectrum. It requires human initiation (you type the query), but handles the full execution pipeline autonomously: entity resolution, multi-source querying, engagement scoring, cross-source merging, and narrative synthesis. The human reviews the output but doesn’t direct each step. By the framework in that analysis, this would qualify as “autonomous with guardrails”—level two of three, where the agent drives the workflow and the human approves or acts on the output.

The project’s 1,012 passing tests and MIT license suggest a level of engineering seriousness that distinguishes it from pure marketing plays. The v3 changelog shows genuine iteration: HTML brief export, ELI5 mode, GitHub person-mode for author-scoped queries, competitor auto-discovery, per-author caps to prevent single-voice dominance. These are features that emerge from real usage friction, not speculative capability demos.
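Of the features in that changelog, the per-author cap is the simplest to illustrate. The following is a minimal sketch of the general technique, assuming each item carries an `author` and a `score`; the field names, the default cap of two, and the function are assumptions, since the README does not describe how the cap is implemented.

```python
from collections import Counter


def cap_per_author(items: list[dict], max_per_author: int = 2) -> list[dict]:
    """Keep items in descending score order, at most N per author."""
    kept: list[dict] = []
    seen: Counter[str] = Counter()
    for item in sorted(items, key=lambda i: i["score"], reverse=True):
        if seen[item["author"]] < max_per_author:
            kept.append(item)
            seen[item["author"]] += 1
    return kept


# Invented items: one prolific author and one occasional one.
posts = [
    {"author": "prolific", "score": 9.0},
    {"author": "prolific", "score": 8.0},
    {"author": "prolific", "score": 7.0},
    {"author": "occasional", "score": 3.0},
]
for post in cap_per_author(posts):
    print(post["author"], post["score"])
```

Without the cap, the top three slots all go to one voice; with it, the third slot passes to the next author even though their score is lower.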

Where the Rough Edges Show

The README is remarkably forthcoming about limitations, which is either good documentation or effective trust-building. Data quality warnings (“degraded run,” “thin evidence”) stay in engine stderr logs rather than leaking into shareable artifacts—a design choice that prioritizes presentability over transparency. The “fun judge v2” that scores humor and virality alongside relevance sounds like a gimmick but addresses a real problem: the most engaging social content is often not the most informationally dense, and a research tool that only scores relevance will bury the quotable one-liners that actually drive understanding of community sentiment.

The dependency on ScrapeCreators for TikTok, Instagram, Threads, and Pinterest introduces a single point of failure and cost. Pricing is 100 free credits followed by pay-as-you-go, so heavy users will hit billing thresholds quickly. The Reddit integration’s reliance on public JSON endpoints is free but fragile: Reddit has historically been hostile to unofficial access, and the “resilient Reddit” timeout budgets suggest the maintainers have already encountered reliability issues.
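A timeout budget, in the general sense, caps the total time a flaky source may consume across all of its retries. The sketch below shows that pattern under assumptions: `fetch_with_budget`, its parameters, and the backoff values are hypothetical, and the README does not say how the “resilient Reddit” budgets actually work.

```python
import time
from typing import Callable, Optional


def fetch_with_budget(
    fetch: Callable[..., list],
    budget_s: float,
    attempts: int = 3,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[list]:
    """Retry `fetch` until it succeeds, attempts run out, or the budget is spent."""
    deadline = clock() + budget_s
    for attempt in range(attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break
        try:
            # Each attempt may use only whatever is left of the overall budget.
            return fetch(timeout=remaining)
        except (TimeoutError, ConnectionError):
            # Exponential backoff, never sleeping past the deadline.
            sleep(min(0.1 * 2**attempt, max(deadline - clock(), 0.0)))
    return None  # the caller treats None as a degraded source, not a crash


# A stand-in fetcher that fails twice before succeeding.
calls: list[float] = []


def flaky(timeout: float) -> list:
    calls.append(timeout)
    if len(calls) < 3:
        raise TimeoutError("simulated slow response")
    return ["post"]


print(fetch_with_budget(flaky, budget_s=5.0, sleep=lambda seconds: None))
```

The design choice worth noting is the `None` return: a source that exhausts its budget drops out of the run quietly, which matches the article's description of degraded runs being logged rather than surfaced.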

The Bigger Picture: Search Fragmentation as Feature

What /last30days really exploits is the balkanization of online discourse. The major platforms have consolidated user attention but walled off their data from each other and from third-party search. Google’s universal search dream never fully included the social layer; the social platforms have no incentive to make themselves fully searchable by competitors. The result is that the most current, most engaged-with information about many topics lives in a dozen disconnected silos.

The skill doesn’t solve this fragmentation—it routes around it, using your credentials as bridge material. It’s a pragmatic hack that treats platform moats as a user-configurable access problem rather than an architectural impossibility. Whether this model scales as platforms tighten API access and authentication requirements is an open question. The README’s “community contributors keep adding more” list of planned sources (Truth Social, Xiaohongshu) suggests the maintainers are betting on a long tail of platform integrations rather than a stable core.

For now, the project’s traction appears real—trending on GitHub, thousands of installs, a growing ecosystem of agent-skills hosts. Whether it becomes a durable tool or a snapshot of a particular moment in AI assistant capabilities depends on whether the platform access it relies on remains available, and whether the engagement-scoring model produces genuinely better research or just differently biased research. The bet is that upvotes, likes, and prediction market odds capture something that PageRank misses. The risk is that they capture something noisier, not truer.