antirez/voxtral.c

Pure C inference implementation of the Mistral Voxtral Realtime 4B speech-to-text model, with streaming support.

★ 1.7k stars | C | Inference · Serving | Image · Video · Audio


Velocity (7d): +14 ★ / day · Trend: steady


This project provides a standalone C inference engine for Mistral AI’s Voxtral Realtime 4B speech-to-text model, with no external dependencies beyond the C standard library. It supports multiple backends (MPS on Apple Silicon, BLAS) and uses a chunked audio encoder with overlapping windows, which bounds memory usage regardless of input length. A streaming C API accepts audio incrementally and emits tokens as they become available. The project also includes a self-contained Python reference implementation for readers who want to understand the model's inference pipeline.
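To illustrate why overlapping, fixed-size windows keep memory bounded under incremental feeding, here is a minimal sketch in C. It is not the project's actual API: the names `chunker_t`, `chunker_init`, `chunker_feed`, and `process_window`, as well as the `WINDOW` and `OVERLAP` sizes, are hypothetical and chosen only for illustration, and the "encoder" is a stand-in that merely records what it was given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WINDOW  8  /* samples per window (illustrative value, an assumption) */
#define OVERLAP 2  /* samples shared by consecutive windows (assumption) */

/* Hypothetical streaming chunker: the only storage is one fixed-size
 * window, so memory use does not grow with the length of the input. */
typedef struct {
    float buf[WINDOW];
    int   fill;        /* number of valid samples currently in buf */
    int   windows;     /* number of full windows processed so far */
    float last_first;  /* first sample of the most recent window */
    float last_last;   /* last sample of the most recent window */
} chunker_t;

static void chunker_init(chunker_t *c) {
    memset(c, 0, sizeof(*c));
}

/* Stand-in for an encoder pass over one window: it only records the
 * window boundaries so the behavior can be inspected. */
static void process_window(chunker_t *c, const float *w) {
    c->last_first = w[0];
    c->last_last  = w[WINDOW - 1];
    c->windows++;
}

/* Feed any number of samples. Each time the buffer fills, the window is
 * processed and its last OVERLAP samples are kept as the start of the
 * next window. */
static void chunker_feed(chunker_t *c, const float *samples, int n) {
    assert(c != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        c->buf[c->fill++] = samples[i];
        if (c->fill == WINDOW) {
            process_window(c, c->buf);
            memmove(c->buf, c->buf + (WINDOW - OVERLAP),
                    OVERLAP * sizeof(float));
            c->fill = OVERLAP;
        }
    }
}
```

With these illustrative sizes, feeding 20 samples valued 0 through 19 in two calls (5 samples, then 15) processes three windows covering samples 0-7, 6-13, and 12-19, and leaves two samples buffered for the next window. The caller can split the input at arbitrary points without changing the result, which is the property a streaming interface relies on.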