jianchang512/stt
An offline, local tool that transcribes audio and video files to text using faster-whisper, an open-source reimplementation of OpenAI’s Whisper model.

This tool runs Whisper models (from tiny up to large-v3) to perform offline speech recognition on audio and video files. Users upload files through a web interface, choose the spoken language and an output format, and receive the transcription as JSON, as SRT subtitles with timestamps, or as plain text. When an NVIDIA GPU with a working CUDA environment is available, the tool uses CUDA acceleration automatically. It serves as a local alternative to cloud services such as OpenAI’s Whisper API and Baidu Speech Recognition.
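The three output formats can be derived from the same list of timed segments. The sketch below is illustrative only and is not this tool's code: it assumes each segment carries a start time and end time in seconds plus its text, which matches the `start`, `end` and `text` attributes of segments yielded by faster-whisper's `WhisperModel.transcribe`. The `Segment` class, the helper function names and the JSON keys are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical stand-in for one transcribed segment (times in seconds)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)


def to_text(segments: Iterable[Segment]) -> str:
    """Plain text: one stripped segment per line, timestamps dropped."""
    return "\n".join(seg.text.strip() for seg in segments)


def to_json(segments: Iterable[Segment]) -> str:
    """JSON list of cue objects; the key names are a hypothetical schema."""
    items = [
        {
            "line": i,
            "start_time": srt_timestamp(s.start),
            "end_time": srt_timestamp(s.end),
            "text": s.text.strip(),
        }
        for i, s in enumerate(segments, start=1)
    ]
    return json.dumps(items, ensure_ascii=False, indent=2)
```

For example, `to_srt([Segment(0.0, 2.5, " Hello world.")])` produces a single cue numbered `1` with the time range `00:00:00,000 --> 00:00:02,500`. faster-whisper's `transcribe` returns its segments lazily as a generator, so a caller that wants more than one format should first materialize them with `list(...)`.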