
meizhong986/WhisperJAV

Speech-to-text pipeline combining Whisper, Qwen3-ASR, TEN-VAD, and speech enhancement models to generate subtitles on noisy Japanese audio.

Velocity (7d): +1.7 ★/day
Trend: steady

WhisperJAV is an automated subtitle generator for Japanese Adult Video content, whose audio poses acoustic challenges that degrade standard ASR performance: non-verbal vocalizations, low signal-to-noise ratio, and extreme audio dynamics. The pipeline chains TEN-VAD for voice activity detection, Zipformer for speech enhancement, Whisper for initial transcription, and Qwen3-ASR as a local LLM for hallucination correction. It runs on Google Colab, Kaggle, or locally with GGUF/MLX quantization support.
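To make the stage ordering concrete, here is a minimal sketch of a detect, transcribe, and correct chain. It is not WhisperJAV's code or API: every name in it (`Segment`, `detect_speech`, `transcribe`, `correct`, `generate_subtitles`) is hypothetical, the voice activity detector is a plain energy threshold standing in for TEN-VAD, the ASR model and the hallucination check are caller-supplied callables standing in for Whisper and Qwen3-ASR, and the speech enhancement stage is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """A stretch of detected speech, in seconds, with its transcript."""
    start: float
    end: float
    text: str = ""


def detect_speech(energies: List[float], frame_s: float = 0.5,
                  threshold: float = 0.1) -> List[Segment]:
    """Stand-in VAD (assumption): merge consecutive frames above a threshold."""
    segments: List[Segment] = []
    start = None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i  # speech begins at this frame
        elif energy <= threshold and start is not None:
            segments.append(Segment(start * frame_s, i * frame_s))
            start = None
    if start is not None:  # speech runs to the end of the audio
        segments.append(Segment(start * frame_s, len(energies) * frame_s))
    return segments


def transcribe(segments: List[Segment],
               asr: Callable[[Segment], str]) -> List[Segment]:
    """Attach text to each segment using a caller-supplied ASR callable."""
    return [Segment(s.start, s.end, asr(s)) for s in segments]


def correct(segments: List[Segment],
            is_hallucination: Callable[[str], bool]) -> List[Segment]:
    """Drop empty segments and those the checker flags as hallucinated."""
    return [s for s in segments if s.text and not is_hallucination(s.text)]


def generate_subtitles(energies: List[float],
                       asr: Callable[[Segment], str],
                       is_hallucination: Callable[[str], bool]) -> List[Segment]:
    """Run the stages in order: detect, then transcribe, then correct."""
    return correct(transcribe(detect_speech(energies), asr), is_hallucination)
```

Running speech detection first means the transcriber only sees spans that contain speech, and the correction pass runs last so it can discard text produced for spans that slipped through. In this sketch correction only filters segments; a real corrector could also rewrite their text.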
