TheStageAI/TheWhisper
Optimized Whisper speech-to-text models with high-performance inference engines for NVIDIA GPUs and Apple Silicon (via CoreML).

TheWhisper provides open-source, optimized Whisper transcription models with streaming inference support across multiple platforms. It includes model weights hosted on Hugging Face with flexible chunk sizing, high-performance inference engines that reach 220 tok/s on NVIDIA L40S GPUs, and CoreML-powered engines for Apple Silicon with minimal power consumption. The repository also offers a local REST API and an Electron demo app. It is designed for low-latency, scalable real-time transcription suitable for captioning, meetings, and voice interfaces.
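
To make the idea of chunk sizing concrete, here is a minimal sketch in plain Python of splitting an audio buffer into fixed-length chunks before passing each one to a transcription engine. It is not TheWhisper's API: the function `iter_chunks` and its `chunk_seconds` parameter are hypothetical names chosen for illustration, and the only assumption taken from Whisper itself is the 16 kHz input sample rate.

```python
from typing import Iterator, List, Sequence

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def iter_chunks(
    samples: Sequence[float],
    chunk_seconds: float,
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[List[float]]:
    """Yield consecutive chunks holding at most `chunk_seconds` of audio.

    Hypothetical helper for illustration only; not part of TheWhisper.
    """
    chunk_len = int(chunk_seconds * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds is too small for this sample rate")
    # Step through the buffer one chunk at a time; the last chunk may be short.
    for start in range(0, len(samples), chunk_len):
        yield list(samples[start:start + chunk_len])


# 25 seconds of silence split into 10-second chunks.
audio = [0.0] * (SAMPLE_RATE * 25)
chunks = list(iter_chunks(audio, chunk_seconds=10))
```

Smaller chunks reduce the delay before the first text appears, while larger chunks give the model more context per call. The final chunk here is shorter than the others, so a real streaming pipeline would typically pad it or hold it until more audio arrives.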