antirez/voxtral.c
Pure C inference engine for the Mistral Voxtral Realtime 4B speech-to-text model, with streaming support.

This project provides a standalone C inference engine for Mistral AI’s Voxtral Realtime 4B speech-to-text model. The core engine needs nothing beyond the C standard library. Optional acceleration backends are available: MPS for Apple Silicon, and BLAS. A chunked audio encoder with overlapping windows keeps memory usage bounded regardless of input length. A streaming C API accepts audio incrementally and emits tokens as they become available. The project also includes a self-contained Python reference implementation that documents the model's inference pipeline.