Your Bilibili 'watch later' graveyard, now searchable
Turns hoarded Chinese video bookmarks into a queryable knowledge base via speech-to-text and RAG.

What it does
Log in with a Bilibili QR code, pick a favorites folder, and the pipeline downloads videos, transcribes audio to text via DashScope’s ASR, embeds chunks into ChromaDB, and serves a chat interface where you ask questions against your own collection. Think of it as a personal podcast/lecture archive with Ctrl+F superpowers.
The interesting bit
The ASR fallback is where the work lives: when Bilibili audio URLs return 403, the pipeline downloads the audio locally using your login cookies, transcodes it with ffmpeg to 16 kHz mono, and re-uploads it to DashScope. The README is unusually frank about cost (test with 10-minute clips first) and about configuration footguns like mixing up OPENAI_BASE_URL with DASHSCOPE_BASE_URL.
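The fallback flow described above can be sketched roughly as follows. This is an illustration only, not the project's code: `RemoteAudioForbidden`, `build_ffmpeg_cmd`, `transcribe_with_fallback`, and the injected `transcribe`/`download`/`transcode`/`upload` callables are all hypothetical names. The ffmpeg flags themselves (`-vn`, `-ac 1`, `-ar 16000`) are standard.

```python
from pathlib import Path
from typing import Callable, List


class RemoteAudioForbidden(Exception):
    """Hypothetical error: the ASR service could not fetch the audio URL (e.g. HTTP 403)."""


def build_ffmpeg_cmd(src: Path, dst: Path) -> List[str]:
    # -vn drops the video stream, -ac 1 downmixes to mono,
    # -ar 16000 resamples to 16 kHz, -y overwrites an existing output file.
    return ["ffmpeg", "-y", "-i", str(src), "-vn", "-ac", "1", "-ar", "16000", str(dst)]


def transcribe_with_fallback(
    audio_url: str,
    transcribe: Callable[[str], str],    # assumed: submits a URL to ASR, returns text
    download: Callable[[str], Path],     # assumed: fetches audio using login cookies
    transcode: Callable[[Path], Path],   # assumed: runs build_ffmpeg_cmd via subprocess
    upload: Callable[[Path], str],       # assumed: returns a URL the ASR service can read
) -> str:
    try:
        # Fast path: let the ASR service fetch the remote URL directly.
        return transcribe(audio_url)
    except RemoteAudioForbidden:
        # Slow path: download locally, normalize the audio, re-upload, retry.
        local_file = download(audio_url)
        wav_file = transcode(local_file)
        return transcribe(upload(wav_file))
```

A real `transcode` could be as small as `subprocess.run(build_ffmpeg_cmd(src, dst), check=True)`. Passing the steps in as callables keeps the fallback logic testable without network access or an ffmpeg install.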
Key highlights
- FastAPI backend + Next.js/Tailwind frontend, Docker Compose for one-command local deploy
- SQLite for metadata, ChromaDB for vectors, LangChain wiring to DashScope/Qwen
- OpenClaw Skill included for external automation (cron summaries, status reports)
- Retrieval tuned via MMR with exposed RETRIEVAL_CANDIDATE_K/TOP_K/LAMBDA knobs
- Test scripts for diagnosing ASR failures and vector recall quality
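For readers unfamiliar with MMR (maximal marginal relevance), here is a minimal, dependency-free sketch of the idea behind those retrieval knobs. It is not the project's implementation, which the review says goes through LangChain. The mapping of `candidate_k`, `top_k`, and `lam` to the RETRIEVAL_* settings is an assumption, and `cosine` and `mmr_select` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(
    query: Sequence[float],
    docs: Sequence[Sequence[float]],
    candidate_k: int,   # assumed analogue of the CANDIDATE_K knob
    top_k: int,         # assumed analogue of the TOP_K knob
    lam: float,         # assumed analogue of LAMBDA: 1.0 = relevance only, 0.0 = diversity only
) -> List[int]:
    # Step 1: shortlist the candidate_k chunks most similar to the query.
    relevance = [cosine(query, d) for d in docs]
    candidates = sorted(range(len(docs)), key=lambda i: relevance[i], reverse=True)[:candidate_k]

    # Step 2: greedily pick chunks, penalizing similarity to those already picked.
    selected: List[int] = []
    while candidates and len(selected) < top_k:
        def score(i: int) -> float:
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a low `lam`, a near-duplicate of an already selected chunk loses out to a less relevant but more distinct one, which helps when several transcript chunks from the same video say nearly the same thing.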
Caveats
- Requires DashScope API key; free tier exists but costs scale with audio duration + tokens
- Multi-part (分P) videos not yet supported
- Test scripts must be moved to project root to run due to relative path assumptions
Verdict
Worth a spin if your Bilibili favorites are a black hole of “I’ll watch this later.” Skip it if you don’t read Chinese or your video diet is elsewhere: the Bilibili-specific auth and ASR plumbing won’t travel well.