shashikg/WhisperS2T
Optimizes the Whisper speech-to-text model with multiple inference backends including TensorRT for faster transcription.

Velocity (7d): +0.6 ★/day · Trend: steady
WhisperS2T is an optimized speech-to-text pipeline built around OpenAI’s Whisper model. It supports multiple inference engines such as TensorRT and TensorRT-LLM, and includes voice activity detection (VAD) to improve transcription speed and accuracy. The project claims a 2.3x speedup over WhisperX and a 3x speedup over HuggingFace’s implementation with FlashAttention 2.
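To illustrate why voice activity detection can speed up transcription, here is a minimal sketch of an energy-based detector that keeps only the spans loud enough to plausibly contain speech, so that a downstream model would not spend time on silence. This is not WhisperS2T's VAD or API: the function name `detect_speech_segments`, the 30 ms frame size, and the energy threshold are all assumptions made for illustration.

```python
from typing import List, Tuple


def detect_speech_segments(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 30,               # hypothetical frame size
    energy_threshold: float = 0.01,   # hypothetical mean-square energy cutoff
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans whose frame energy meets the threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    segments: List[Tuple[float, float]] = []
    start = None  # sample index where the current speech span began

    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= energy_threshold:
            if start is None:
                start = offset  # a speech span opens at this frame
        elif start is not None:
            # the first quiet frame closes the open span
            segments.append((start / sample_rate, offset / sample_rate))
            start = None

    if start is not None:
        # the audio ended while a span was still open
        segments.append((start / sample_rate, len(samples) / sample_rate))
    return segments


# Synthetic signal at 1 kHz: 0.3 s silence, 0.3 s "speech", 0.3 s silence.
signal = [0.0] * 300 + [0.5] * 300 + [0.0] * 300
print(detect_speech_segments(signal, sample_rate=1000))  # [(0.3, 0.6)]
```

In this toy signal only a third of the audio would be passed on for transcription. A fixed energy threshold is fragile on noisy recordings, so treat this as a conceptual sketch rather than a description of how the project detects speech.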