jishengpeng/WavTokenizer
A discrete audio codec that compresses speech, music, and general audio into 40 or 75 tokens per second for use in language model pipelines.

Velocity (7d): +2.0 ★/day · Trend: steady
WavTokenizer is a discrete acoustic codec model that tokenizes audio at 40 or 75 tokens per second. It encodes speech, music, and general audio into discrete tokens suitable for audio language modeling, the kind of task handled by systems such as GPT-4o. The model can also reconstruct audio from those tokens, and it is intended as a front-end component for generative audio models and multimodal LLM systems.
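To make the token rates concrete, here is a minimal arithmetic sketch of how many tokens a clip would occupy in a language model's context at 40 or 75 tokens per second. It does not call WavTokenizer itself. The helper names `token_count` and `max_audio_seconds`, the 2048-token context budget, and the round-up behavior are all assumptions made for illustration. The real model's framing and padding may give slightly different counts.

```python
import math

# Token rates stated in the description (tokens per second of audio).
TOKEN_RATES = (40, 75)


def token_count(duration_s: float, tokens_per_second: int) -> int:
    """Approximate number of discrete tokens for a clip of the given length.

    Rounding up is an assumption; the actual codec may frame audio differently.
    """
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    return math.ceil(duration_s * tokens_per_second)


def max_audio_seconds(context_tokens: int, tokens_per_second: int) -> float:
    """Seconds of audio that fit into a given token budget (hypothetical budget)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return context_tokens / tokens_per_second


if __name__ == "__main__":
    for rate in TOKEN_RATES:
        print(
            f"{rate} tok/s: 10 s clip -> {token_count(10.0, rate)} tokens, "
            f"2048-token budget -> {max_audio_seconds(2048, rate):.1f} s of audio"
        )
```

At the lower rate, the same context window holds nearly twice as much audio. This matters when audio tokens share a context with text tokens in a multimodal pipeline.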