Is index-tts-vllm open source?

Yes — Ksuriuri/index-tts-vllm is open source, released under the Apache-2.0 license.

What language is index-tts-vllm written in?

Ksuriuri/index-tts-vllm is primarily written in Python.

How popular is index-tts-vllm?

Ksuriuri/index-tts-vllm has 1.2k stars on GitHub.

Where can I find index-tts-vllm?

Ksuriuri/index-tts-vllm is on GitHub at https://github.com/Ksuriuri/index-tts-vllm.

Ksuriuri/index-tts-vllm

TTS inference with 280 tok/s GPT decoding by treating speech like an LLM

Replaces IndexTTS’s native GPT inference with vLLM to cut latency and boost concurrency on an existing speech model.

★1.2k stars | Python | Inference · Serving | Image · Video · Audio

What it does

IndexTTS-vLLM is an inference wrapper around the open-source IndexTTS family (v1, v1.5, and v2) that reimplements the GPT component using the vLLM serving engine. On a single RTX 4090, the project claims to drop the real-time factor from roughly 0.3 to 0.1 and push GPT decode speed from about 90 tokens per second to around 280 tokens per second for v1/v1.5. It also adds FastAPI and OpenAI-compatible endpoints, plus a WebUI, turning the research model into a local service.
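A minimal client sketch for the OpenAI-compatible speech endpoint, using only the Python standard library. The host, port, model name, and voice value are assumptions for illustration, not values documented by the project; check the repository's README for the real server address and accepted fields. The JSON body follows the shape of OpenAI's speech API (`model`, `input`, `voice`), and the path prefix may differ (OpenAI's own API serves it at `/v1/audio/speech`).

```python
import json
import urllib.request

# Assumptions for illustration only: the server address, model name, and
# voice values are placeholders, not values documented by the project.
BASE_URL = "http://localhost:8000"  # hypothetical host/port
MODEL = "index-tts"                 # hypothetical model identifier


def build_speech_request(text: str, voice: str) -> urllib.request.Request:
    """Build (without sending) an OpenAI-style POST to /audio/speech."""
    payload = {"model": MODEL, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str, out_path: str) -> None:
    """Send the request and write the raw audio bytes of the response to out_path."""
    request = build_speech_request(text, voice)
    with urllib.request.urlopen(request, timeout=120) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as f:
        f.write(audio_bytes)
```

With a server running, `synthesize("Hello from IndexTTS.", voice="speaker_1", out_path="out.wav")` would save the response; `speaker_1` is a hypothetical voice name, and the `.wav` extension assumes the server returns WAV audio. Building the request separately lets it be inspected without a live server.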

The interesting bit

The clever part is recognizing that a TTS model's internal GPT is, structurally, just another autoregressive transformer that vLLM can already optimize. By handing that workload to vLLM, the project inherits vLLM's scheduler and memory manager and squeezes ~16 concurrent streams into ~5 GB of VRAM. The rest of the pipeline, such as IndexTTS v2's s2mel module, remains serial and unaccelerated.
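To see why handing the GPT to vLLM raises concurrency, here is a toy sketch of iteration-level (continuous) batching, the scheduling idea vLLM is known for. It is not vLLM's code and leaves out its PagedAttention memory manager entirely; `Request` and `continuous_batching` are hypothetical names. Each loop iteration stands for one batched forward pass in which every running stream decodes one speech token, and a finished stream hands its slot to a waiting one immediately.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: int
    tokens_needed: int  # speech tokens this request must decode in total
    generated: int = 0


def continuous_batching(requests, max_batch):
    """Toy iteration-level scheduler (illustrative only, not vLLM's implementation).

    Returns the number of batched decode steps and the step at which each
    request finished.
    """
    waiting = deque(requests)
    running = []
    steps = 0
    finish_step = {}
    while waiting or running:
        # Fill free batch slots with waiting requests.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One batched forward pass: every running request emits one token.
        steps += 1
        for req in running:
            req.generated += 1
        # Retire finished requests so their slots are free on the next step.
        still_running = []
        for req in running:
            if req.generated >= req.tokens_needed:
                finish_step[req.rid] = steps
            else:
                still_running.append(req)
        running = still_running
    return steps, finish_step
```

With three requests needing 3, 5, and 2 tokens, `max_batch=3` finishes in 5 steps (the longest request), while `max_batch=1` needs 10 steps, the sum of all three. The real engine also has to fit every stream's KV cache into the memory budget, which is where its memory manager comes in.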

Key highlights

Roughly 3× GPT decode speedup for IndexTTS v1/v1.5 (≈90 → ≈280 tok/s) and RTF improvement from ≈0.3 to ≈0.1 on an RTX 4090.
Supports ~16 concurrent requests with only ~5 GB of VRAM (gpu_memory_utilization = 0.25).
WER on seed-test nearly matches the original model: 1.12 (zh) / 1.987 (en) with vLLM vs. 1.107 / 2.032 for the original with beam=1.
Ships with a WebUI, a FastAPI server, and an OpenAI-compatible /audio/speech endpoint.
v1/v1.5 support multi-role audio mixing by blending multiple reference audio clips.
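The headline numbers can be sanity-checked with a few lines of arithmetic. RTF (real-time factor) is synthesis time divided by the duration of the audio produced, so an RTF of 0.1 means one second of compute per ten seconds of speech. The 10-second clip is a hypothetical example; the other figures are the ones quoted above.

```python
def synthesis_seconds(rtf: float, audio_seconds: float) -> float:
    """Wall-clock time to synthesize audio_seconds of speech at a given RTF.

    RTF = synthesis time / audio duration, so RTF < 1 is faster than real time.
    """
    return rtf * audio_seconds


def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after (0.05 means +5%)."""
    return (after - before) / before


# GPT decode throughput for IndexTTS v1/v1.5 on an RTX 4090, as quoted above.
decode_speedup = 280 / 90  # ~3.1x

# Hypothetical 10-second clip at the quoted before/after RTF values.
clip_seconds = 10.0
before_s = synthesis_seconds(0.3, clip_seconds)  # ~3 s of compute
after_s = synthesis_seconds(0.1, clip_seconds)   # ~1 s of compute

# seed-test WER, original model (beam=1) -> vLLM build. Lower is better.
wer_zh_change = relative_change(1.107, 1.12)   # ~+1.2% (slightly worse)
wer_en_change = relative_change(2.032, 1.987)  # ~-2.2% (slightly better)

print(f"decode speedup: {decode_speedup:.2f}x")
print(f"10 s clip: {before_s:.1f} s -> {after_s:.1f} s")
print(f"WER change: zh {wer_zh_change:+.1%}, en {wer_en_change:+.1%}")
```

The WER differences are small and point in opposite directions for the two languages, which is consistent with "nearly matches".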

Caveats

The v1/v1.5 API and OpenAI-compatible interfaces still have unresolved bugs, according to the changelog.
For IndexTTS v2, only the GPT2 inference is parallelized; the s2mel stage runs serially and requires 25 DiT iterations, which the README identifies as the current concurrency bottleneck.
s2mel acceleration is explicitly listed as future work.
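The s2mel caveat can be made concrete with Amdahl's law. Suppose GPT decoding for n concurrent requests is batched so that it costs roughly the wall time of one request, while s2mel processes requests one at a time. Then throughput relative to fully serial processing is 1 / ((1 - p) + p / n), where p is GPT's share of per-request time; this is Amdahl's law with concurrency as the parallel resource. Both the idealized batching assumption and the value p = 0.6 below are made up for illustration, not figures from the README.

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Amdahl's law: speedup when parallel_fraction of the work scales across n
    and the remaining (serial) fraction does not."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


def amdahl_ceiling(parallel_fraction: float) -> float:
    """Limit of amdahl_speedup as n grows without bound: 1 / serial fraction."""
    return 1.0 / (1.0 - parallel_fraction)


# Hypothetical split: GPT decoding is 60% of per-request time (batched by
# vLLM), s2mel is the remaining 40% (serial). Not a measured figure.
p = 0.6
for n in (1, 4, 16, 64):
    print(f"n={n:>2}: {amdahl_speedup(p, n):.2f}x")
print(f"ceiling: {amdahl_ceiling(p):.2f}x")
```

However large n gets, the speedup stays below 1 / (1 - p), which illustrates why a serial s2mel stage caps concurrency no matter how fast the GPT stage becomes.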

Verdict

Worth a look if you are running IndexTTS in production or locally and need lower latency and more concurrency without retraining. Skip it if you need a fully optimized end-to-end TTS pipeline today, since for IndexTTS v2 the s2mel (mel-spectrogram synthesis) stage remains a serial bottleneck.
