chubbyguan/chubbyskills

The Chinese Content Pipeline Your Agent Can Actually Read

A set of ingest skills that turn Chinese social feeds and podcasts into structured Markdown, index them in a vault, and expose the archive to Claude Code or Codex through an MCP server.

What it does

Chubby Skills is a content ingestion pipeline built for Chinese platforms. It grabs posts, videos, and articles from Bilibili, Douyin, Xiaohongshu, WeChat, X, and podcasts, converts them into Markdown with schema v1 frontmatter, and drops them into an Obsidian-compatible vault. A companion MCP server lets Claude Code, Codex, and other agents search, read, and reason over the accumulated notes.
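As a rough illustration of what "Markdown with frontmatter" means in practice, the sketch below renders a note with a YAML header. The field names (`schema_version`, `platform`, `source_url`, `title`) and the function `render_note` are hypothetical; the actual schema v1 fields are not listed here.

```python
import json


def render_note(meta: dict[str, str], body: str) -> str:
    """Render a Markdown note with a YAML frontmatter header.

    Values are written with json.dumps, because a JSON string is also a
    valid YAML double-quoted scalar, so quotes and colons are escaped safely.
    """
    header = [
        f"{key}: {json.dumps(value, ensure_ascii=False)}"
        for key, value in meta.items()
    ]
    return "---\n" + "\n".join(header) + "\n---\n\n" + body.strip() + "\n"


# Hypothetical field names, chosen for illustration only.
note = render_note(
    {
        "schema_version": "v1",
        "platform": "bilibili",
        "source_url": "https://example.com/video/123",
        "title": "示例视频",
    },
    "Transcript or article text goes here.",
)
print(note)
```

Keeping the metadata in frontmatter is what lets an Obsidian-style vault and a search tool filter notes by platform or source without parsing the body.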

The interesting bit

The project treats platform fragility as a first-class concern. It uses subtitle-first transcription to avoid GPU-heavy audio extraction, defines explicit fallback chains for every platform when cookies expire or links rot, and ships a full QA layer—smoke tests, golden outputs, and schema validators—to keep the scrapers honest.
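A fallback chain of the kind described above can be sketched as an ordered list of strategies tried in turn. This is a minimal sketch, not the project's implementation: `FetchError`, `fetch_with_fallbacks`, and the two example strategies are hypothetical names.

```python
from typing import Callable, Sequence


class FetchError(Exception):
    """Raised when one strategy cannot fetch the content."""


Fetcher = Callable[[str], str]


def fetch_with_fallbacks(
    url: str, strategies: Sequence[tuple[str, Fetcher]]
) -> tuple[str, str]:
    """Try each (name, fetcher) in order; return the first success.

    Returns the name of the strategy that worked plus its content, so the
    caller can record which path produced the note.
    """
    errors = []
    for name, fetch in strategies:
        try:
            return name, fetch(url)
        except FetchError as exc:
            errors.append(f"{name}: {exc}")
    raise FetchError("all strategies failed: " + "; ".join(errors))


# Hypothetical strategies standing in for real platform clients.
def via_logged_in_api(url: str) -> str:
    raise FetchError("cookie expired")


def via_public_page(url: str) -> str:
    return f"public content for {url}"


used, content = fetch_with_fallbacks(
    "https://example.com/post/1",
    [("logged_in_api", via_logged_in_api), ("public_page", via_public_page)],
)
print(used)  # public_page
```

Collecting the per-strategy errors instead of discarding them makes a total failure easier to diagnose, which matters when the cause is an expired cookie rather than a dead link.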

Key highlights

  • Covers Chinese platforms often ignored by English-centric tools: Bilibili, Douyin, Xiaohongshu, WeChat, and local podcast apps.
  • Subtitle-first transcription falls back to local audio extraction only when necessary, keeping the default install lightweight.
  • Tiered dependency model: light mode handles text and images; heavy mode adds ffmpeg, yt-dlp, and faster-whisper for video/audio.
  • Built-in vault curation auto-archives processed notes and generates knowledge cards with source attribution.
  • MCP server exposes search_vault, semantic_search_vault, and read_kb_note to Claude Code, Codex, OpenClaw, and Hermes.
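The tiered dependency model in the highlights above implies some check for whether the heavy tools are present. One way such a check could look, using only the standard library, is sketched below. The function names and the light/heavy decision rule are assumptions; only the tool names (ffmpeg, yt-dlp, faster-whisper) come from the description.

```python
import importlib.util
import shutil
from typing import Callable

# Tools named in the heavy tier; faster-whisper is imported as faster_whisper.
HEAVY_MODULES = ("faster_whisper",)
HEAVY_BINARIES = ("ffmpeg", "yt-dlp")


def _module_installed(name: str) -> bool:
    return importlib.util.find_spec(name) is not None


def _binary_on_path(name: str) -> bool:
    return shutil.which(name) is not None


def missing_heavy_deps(
    has_module: Callable[[str], bool] = _module_installed,
    has_binary: Callable[[str], bool] = _binary_on_path,
) -> list[str]:
    """List heavy-tier dependencies that are not available."""
    missing = [m for m in HEAVY_MODULES if not has_module(m)]
    missing += [b for b in HEAVY_BINARIES if not has_binary(b)]
    return missing


def pick_mode(
    has_module: Callable[[str], bool] = _module_installed,
    has_binary: Callable[[str], bool] = _binary_on_path,
) -> str:
    """Return "heavy" only when every heavy dependency is present."""
    return "light" if missing_heavy_deps(has_module, has_binary) else "heavy"


print(pick_mode(), missing_heavy_deps())
```

The checker functions are injectable so the decision logic can be tested without installing or uninstalling anything.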

Caveats

  • Video and podcast transcription pulls in heavy dependencies like torch and ffmpeg; the light install skips them entirely.
  • Live platform tests are opt-in because scraping is brittle: cookies expire, regions get blocked, and links rot quickly.
  • Semantic search defaults to a zero-dependency semantic-lite mode; real vector search requires wiring up OpenAI or sentence-transformers.
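How the semantic-lite mode works internally is not described here. As an assumption-labeled illustration of what a zero-dependency approximation can look like, the sketch below ranks notes by cosine similarity over character bigrams, which handles Chinese text without a word tokenizer. All function names are hypothetical.

```python
import math
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count overlapping character pairs, ignoring whitespace and case."""
    cleaned = "".join(ch.lower() for ch in text if not ch.isspace())
    return Counter(cleaned[i : i + 2] for i in range(len(cleaned) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, notes: dict[str, str]) -> list[tuple[str, float]]:
    """Return (note name, score) pairs sorted from best to worst match."""
    q = bigrams(query)
    scored = [(name, cosine(q, bigrams(text))) for name, text in notes.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


notes = {
    "bilibili-note": "哔哩哔哩视频字幕转写",
    "podcast-note": "weekly podcast notes",
}
print(rank("视频字幕", notes))
```

This measures surface overlap only, so it will miss synonyms and paraphrases; that gap is the reason a real embedding backend is needed for true vector search.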

Verdict

Worth a look if you curate Chinese-language content and want your AI agent to actually reference it. Skip it if you only consume English feeds—there are simpler tools for that.

Frequently asked

What is chubbyguan/chubbyskills?
A set of ingest skills that turn Chinese social feeds and podcasts into structured Markdown, index them in a vault, and expose the archive to Claude Code or Codex through an MCP server.
Is chubbyskills open source?
Yes — chubbyguan/chubbyskills is open source, released under the MIT license.
What language is chubbyskills written in?
chubbyguan/chubbyskills is primarily written in Python.
How popular is chubbyskills?
chubbyguan/chubbyskills has 501 stars on GitHub.
Where can I find chubbyskills?
chubbyguan/chubbyskills is on GitHub at https://github.com/chubbyguan/chubbyskills.
