wendy7756/AI-Video-Transcriber

Skip the audio: this transcriber loots YouTube subtitles first

A self-hosted tool that transcribes and summarizes videos by extracting existing subtitles before ever touching Whisper.

2.7k stars · Python · Data Tooling · Language Models
Velocity (7d): +9.6 ★/day · Trend: steady

What it does

Drop a YouTube, TikTok, or podcast URL (or a local audio/video file) into a web UI and get back a cleaned-up transcript, optional translation, and an AI summary. The whole thing runs locally as a FastAPI server with a vanilla-JS frontend.

The interesting bit

The “subtitle-first architecture” is the quiet win: for platforms like YouTube that already have captions, it grabs those instantly and skips the audio download and Whisper entirely. Only when no subtitles exist does it fall back to Faster-Whisper on audio normalized to 16 kHz mono. For captioned videos, that pipeline choice matters more than the choice of Whisper model.
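
The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `Transcript`, `transcribe`, `ffmpeg_normalize_args`, and the two injected callables are hypothetical names. The only real-tool detail assumed is FFmpeg's standard `-ac 1` (mono) and `-ar 16000` (16 kHz) options.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Transcript:
    text: str
    source: str  # "subtitles" or "whisper"


def ffmpeg_normalize_args(src: str, dst: str) -> List[str]:
    """Build an FFmpeg command line that converts `src` to 16 kHz mono audio."""
    # -ac 1 -> one audio channel (mono); -ar 16000 -> 16 kHz sample rate
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst]


def transcribe(
    url: str,
    fetch_subtitles: Callable[[str], Optional[str]],  # hypothetical: returns caption text or None
    run_whisper: Callable[[str], str],                # hypothetical: download, normalize, transcribe
) -> Transcript:
    """Try existing subtitles first; run speech recognition only if none exist."""
    subs = fetch_subtitles(url)
    if subs and subs.strip():
        # Fast path: no audio download, no model inference.
        return Transcript(text=subs.strip(), source="subtitles")
    # Slow path: only reached when the platform has no usable captions.
    return Transcript(text=run_whisper(url), source="whisper")
```

Passing the two stages in as callables keeps the ordering logic testable without network access or a model on disk; in a real setup they would presumably wrap yt-dlp and Faster-Whisper.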

Key highlights

  • Supports 30+ platforms via yt-dlp, plus local uploads (.mp3, .mp4, .txt, etc.)
  • Bring-your-own-model: enter any OpenAI-compatible API base URL and key in the UI, then click Fetch to list the available models
  • Conditional translation: auto-detects when summary language ≠ source language and adds a Translation tab
  • Docker Compose or ./install.sh for setup; runs on Python 3.8+
  • Production mode (--prod) keeps SSE connections alive for 30–60+ minute jobs
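
The model auto-discovery in the bring-your-own-model highlight most likely relies on the model-listing endpoint that OpenAI-compatible servers expose (`GET <base URL>/models`, returning a `data` array of objects with an `id`). The sketch below shows that idea with the standard library only. The function names are hypothetical, and it assumes the base URL already includes any version prefix such as `/v1`; how the project itself does this is not confirmed here.

```python
import json
import urllib.request
from typing import List


def models_url(base_url: str) -> str:
    """Join the user-supplied base URL with the model-listing path."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> List[str]:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}, ...]} response."""
    items = payload.get("data", [])
    return sorted(item["id"] for item in items if isinstance(item, dict) and "id" in item)


def fetch_models(base_url: str, api_key: str, timeout: float = 10.0) -> List[str]:
    """Query the endpoint and return the available model ids (performs a network call)."""
    req = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

Keeping the URL building and response parsing separate from the network call means both can be checked offline, for example against a saved response from whichever provider is being used.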

Caveats

  • Requires FFmpeg and an OpenAI-compatible API key (no local LLM inference out of the box)
  • Default Whisper model is base; larger models get slow and memory-hungry fast
  • README notes HTTP 500 errors are “usually environment configuration issues”, which suggests rough edges in error handling
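
Since the first and last caveats both come down to environment setup, a small preflight check is one way to surface those problems before they show up as an HTTP 500. The following is a sketch under stated assumptions, not something the project ships: `preflight` is a hypothetical helper, and `OPENAI_API_KEY` is an assumed variable name (the tool itself takes the key through its UI).

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def preflight(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return human-readable configuration problems; an empty list means OK."""
    problems: List[str] = []
    # FFmpeg must be on PATH for audio extraction and normalization.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # Assumed variable name; adjust to however the key is actually supplied.
    if not env.get("OPENAI_API_KEY", "").strip():
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

Running a check like this at startup and printing the returned list turns a vague server error into a specific, fixable message.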

Verdict

Good fit if you want a private, self-hosted alternative to cloud transcription services and can tolerate some manual setup. Skip it if you need fully offline LLM inference or enterprise-grade error resilience.
