
jianchang512/stt

An offline, local tool that transcribes audio and video files to text using the open-source faster-whisper model.

4.6k stars · Python · Image · Video · Audio
Velocity (7d): +5.1 ★/day · Trend: steady

This tool leverages the Whisper model family (tiny to large-v3) to perform offline speech recognition on audio and video files. Users upload files through a web interface, select the language and desired output format, and receive transcribed text as JSON, SRT subtitles with timestamps, or plain text. It automatically uses CUDA acceleration when an NVIDIA GPU is available. The tool serves as a local alternative to cloud APIs like OpenAI’s Whisper API or Baidu Speech Recognition.
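The three output formats can be illustrated with a minimal sketch. It assumes a transcription has already produced a list of `(start_seconds, end_seconds, text)` segments; that shape, and the helper names `srt_timestamp` and `render`, are hypothetical and not taken from the stt codebase.

```python
import json


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def render(segments, fmt="srt"):
    """Render (start, end, text) segments as 'srt', 'json', or 'text'.

    The segment shape is an assumption made for this illustration.
    """
    if fmt == "text":
        # Plain text: one segment per line, timestamps dropped.
        return "\n".join(text.strip() for _, _, text in segments)
    if fmt == "json":
        # JSON: a list of objects that keep the timing information.
        return json.dumps(
            [{"start": s, "end": e, "text": t.strip()} for s, e, t in segments],
            ensure_ascii=False,
        )
    if fmt == "srt":
        # SRT: numbered cues separated by a blank line.
        blocks = [
            f"{i}\n{srt_timestamp(s)} --> {srt_timestamp(e)}\n{t.strip()}\n"
            for i, (s, e, t) in enumerate(segments, start=1)
        ]
        return "\n".join(blocks)
    raise ValueError(f"unknown format: {fmt}")


if __name__ == "__main__":
    demo = [(0.0, 1.5, " Hello there."), (1.5, 3.0, "How are you? ")]
    print(render(demo, "srt"))
```

Keeping the segments in one neutral shape and formatting them only at the end means the same transcription result can back all three outputs without running the model again.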
