Is WhisperJAV open source?

Yes — meizhong986/WhisperJAV is open source, released under the MIT license.

What language is WhisperJAV written in?

meizhong986/WhisperJAV is primarily written in Python.

How popular is WhisperJAV?

meizhong986/WhisperJAV has about 2k stars on GitHub. Its 7-day star velocity is +6.1 stars per day, and the trend is cooling.

Where can I find WhisperJAV?

meizhong986/WhisperJAV is on GitHub at https://github.com/meizhong986/WhisperJAV.

meizhong986/WhisperJAV

Subtitle generator for a domain where Whisper hallucinates

A specialized ASR pipeline that treats JAV audio as an adversarial attack on speech recognition and fights back with scene segmentation, defensive decoding, and surgical audio processing.

★ 2k stars · Python · Tags: Inference · Serving, Language Models, Data Tooling

Velocity (7d): +6.1 ★/day · Trend: ↘ cooling

What it does

WhisperJAV generates subtitles for Japanese adult videos (JAV), a niche where standard ASR models collapse. The README opens with a frank diagnosis it calls “acoustic hell”: heavy breathing, gasps, and spectral mimicry that trick Whisper into recognizing phantom syllables, plus drift over 120-minute runtimes that triggers hallucination loops. The tool wraps multiple backends (Faster-Whisper, Qwen3-ASR, anime-whisper, Kotoba) in configurable pipelines that combine scene-based segmentation, speech enhancement, and log-probability thresholding to discard low-confidence output.
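To make the log-probability thresholding idea concrete, here is a minimal sketch of a confidence filter over decoded segments. It is not WhisperJAV's code. The `Segment` class and `keep_confident` function are hypothetical names, and the default cutoffs simply mirror the stock Whisper defaults (`logprob_threshold` of -1.0 and `compression_ratio_threshold` of 2.4); the values WhisperJAV actually uses are not stated in the summary above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical container for one decoded segment."""
    start: float              # seconds
    end: float                # seconds
    text: str
    avg_logprob: float        # mean token log-probability reported by the decoder
    compression_ratio: float  # high values suggest repetitive, looping output


def keep_confident(segments, min_avg_logprob=-1.0, max_compression_ratio=2.4):
    """Return only segments that pass both confidence checks."""
    kept = []
    for seg in segments:
        if seg.avg_logprob < min_avg_logprob:
            continue  # decoder was unsure about these tokens
        if seg.compression_ratio > max_compression_ratio:
            continue  # text compresses too well, likely a repetition loop
        kept.append(seg)
    return kept
```

A stricter `min_avg_logprob` discards more phantom syllables but also risks dropping quiet, genuine speech, so the cutoff is a tuning decision rather than a fixed rule.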

The interesting bit

The project treats audio preprocessing as a paradox: aggressive denoising can strip the very high-frequency transients Whisper needs for consonants, so it opts for “surgical” per-scene VAD clamping instead. The two-pass ensemble mode runs different pipelines and merges their outputs, on the premise that different models catch different utterances. Merge strategies include “smart_merge”, which detects overlaps, and “longest”, which keeps whichever pass produced more text per segment.
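The “longest” merge strategy can be illustrated with a small sketch. This is an assumption about how such a merge might work, not the project's implementation: `Cue` and `merge_longest` are hypothetical names, overlap is judged purely by timestamps, and text length is measured in characters.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical subtitle cue produced by one pass."""
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_longest(pass_a, pass_b):
    """Merge two passes; where cues overlap in time, keep the longer text."""
    merged = []
    for cue in sorted(pass_a + pass_b, key=lambda c: c.start):
        if merged and cue.start < merged[-1].end:
            # Overlaps the last kept cue: keep whichever has more text.
            if len(cue.text) > len(merged[-1].text):
                merged[-1] = cue
        else:
            # No overlap: an utterance only one pass caught, or a new one.
            merged.append(cue)
    return merged
```

Cues that appear in only one pass survive the merge, which is the point of running two different pipelines in the first place.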

Key highlights

Seven processing pipelines, ranging from “faster” (speed) through “fidelity” (accuracy) to “qwen” (Qwen3-ASR with forced alignment)
ChronosJAV subsystem decouples text generation from timestamp alignment, allowing any audio-to-text model to plug in via YAML config
Speech enhancement backends include ClearVoice, BS-RoFormer vocal isolation, ZipEnhancer, and lightweight FFmpeg DSP filters
Built-in AI translation via Ollama (auto-detects GPU, picks model by VRAM), DeepSeek, Gemini, Claude, GPT-4, and OpenRouter
GUI with persistent settings, ensemble presets, and four-tab layout; also runs headless
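The decoupling described for ChronosJAV, where text generation is separated from timestamp alignment, can be sketched with two small interfaces. Everything here is hypothetical: the class and function names are illustrative, the YAML wiring is omitted, and `ProportionalAligner` is a naive stand-in that spreads time by character count, not forced alignment.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


class TextGenerator(Protocol):
    """Anything that turns audio into lines of text (no timestamps)."""
    def transcribe(self, audio_path: str) -> list[str]: ...


class Aligner(Protocol):
    """Anything that assigns timestamps to already-generated lines."""
    def align(self, lines: list[str], duration: float) -> list[Cue]: ...


class FixedTextGenerator:
    """Hypothetical stand-in for an ASR model; returns canned lines."""
    def __init__(self, lines: list[str]) -> None:
        self.lines = lines

    def transcribe(self, audio_path: str) -> list[str]:
        return list(self.lines)


class ProportionalAligner:
    """Naive placeholder: splits the scene duration by character count."""
    def align(self, lines: list[str], duration: float) -> list[Cue]:
        total = sum(len(line) for line in lines)
        cues, t = [], 0.0
        for line in lines:
            span = duration * len(line) / total if total else 0.0
            cues.append(Cue(t, t + span, line))
            t += span
        return cues


def subtitle_scene(audio_path: str, duration: float,
                   generator: TextGenerator, aligner: Aligner) -> list[Cue]:
    """Run text generation first, then alignment, as two independent steps."""
    return aligner.align(generator.transcribe(audio_path), duration)
```

Because `subtitle_scene` depends only on the two interfaces, swapping in a different audio-to-text model means providing another object with a `transcribe` method, leaving the alignment step untouched.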

Caveats

README warns that fine-tuned models risk overfitting due to “scarcity of high-quality, ethically sourced JAV datasets”
Speech enhancement is described as risky: “audio processing that alters the mel-spectrogram can introduce artefacts”
Translation section cuts off mid-sentence in the provided source (“Local LLM Translation (Legacy)” is truncated)

Verdict

Worth studying if you work on domain-specific ASR, long-form audio hallucination, or media pipeline engineering. The techniques (scene segmentation, defensive decoding, ensemble merging) transfer to other noisy, spontaneous-speech domains. For general subtitle generation, standard Whisper or faster-whisper is probably sufficient and simpler to run.
