Is FluidAudio open source?

Yes — FluidInference/FluidAudio is open source, released under the Apache-2.0 license.

What language is FluidAudio written in?

FluidInference/FluidAudio is primarily written in Swift.

How popular is FluidAudio?

FluidInference/FluidAudio has 2.5k stars on GitHub, and its star growth is currently cooling off.

Where can I find FluidAudio?

FluidInference/FluidAudio is on GitHub at https://github.com/FluidInference/FluidAudio.

FluidInference/FluidAudio

The speech SDK that treats your GPU as optional

FluidAudio ports open-source speech, diarization, and synthesis models to CoreML so iOS and macOS apps can transcribe, clone voices, and separate speakers without leaving the device.

★ 2.5k stars | Swift | Inference · Serving | Image · Video · Audio

Velocity (7d): +5.6 ★ / day

Trend: ↘ cooling

What it does

FluidAudio is a Swift SDK that packages open-source audio models for ASR, text-to-speech, voice activity detection, and speaker diarization in CoreML format for macOS and iOS. It runs inference exclusively on the Apple Neural Engine (ANE), which the authors claim reduces memory use and CPU load compared to GPU-based paths. The models are permissively licensed and hosted on Hugging Face, so you’re not locked into a cloud API or proprietary weights.

The interesting bit

Most on-device ML frameworks treat the GPU as the default accelerator; FluidAudio deliberately avoids Metal Performance Shaders entirely and optimizes for the ANE, targeting background and always-on workloads where battery and thermals matter. It also ships an experimental autoregressive TTS model, Magpie, that the authors openly admit is currently about 25× slower than real time, an unusual level of candor in a space that usually buries such details.

Key highlights

Parakeet ASR models for batch transcription across 25 European languages plus Japanese and Chinese, and a streaming English-only model with end-of-utterance detection
Kokoro and PocketTTS for parallel or streaming synthesis, plus voice-cloning support in PocketTTS
Online and offline speaker diarization with streaming and batch clustering pipelines
Inverse text normalization to convert spoken forms like “two hundred” into “200”
All models run on the ANE, not the GPU, pitched for ambient and background processing
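
To make the inverse text normalization highlight concrete, here is a toy sketch of the idea in plain Swift. It is not FluidAudio's implementation and does not use its API; the function name `spokenNumberToInt` and the word table are hypothetical, and it only handles simple English cardinals below one million.

```swift
// Toy inverse text normalization: spoken English cardinals -> integers.
// Illustrative only; the names here are hypothetical, not FluidAudio API.
let smallNumbers: [String: Int] = [
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90
]

func spokenNumberToInt(_ phrase: String) -> Int? {
    var total = 0          // completed "thousand" groups
    var current = 0        // group currently being built
    var sawNumber = false  // guards against empty or "and"-only input

    // Split on spaces and hyphens so "twenty-one" becomes two words.
    let words = phrase.lowercased().split(whereSeparator: { $0 == " " || $0 == "-" })
    for word in words {
        let w = String(word)
        if w == "and" { continue }
        if let value = smallNumbers[w] {
            current += value
        } else if w == "hundred" {
            current = max(current, 1) * 100
        } else if w == "thousand" {
            total += max(current, 1) * 1000
            current = 0
        } else {
            return nil  // unknown word: leave the text untouched
        }
        sawNumber = true
    }
    return sawNumber ? total + current : nil
}

if let n = spokenNumberToInt("two hundred") {
    print(n)  // prints 200
}
```

A production normalizer also has to handle ordinals, dates, currency, and numbers embedded in longer sentences, which this sketch deliberately ignores.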

Caveats

Magpie TTS is explicitly marked experimental and runs at roughly 0.04 RTFx on Apple Silicon (about 25× slower than real time), making it impractical for production use until performance work lands
The README is heavy on app showcase and light on API surface details, so integration complexity is unclear
Streaming ASR with end-of-utterance detection is English-only
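
The 0.04 RTFx figure in the first caveat is easier to judge with the arithmetic spelled out. The sketch below assumes RTFx means seconds of audio produced per second of compute, which is the reading consistent with the "about 25× slower" figure quoted earlier. The helper names and the 10-second clip length are hypothetical.

```swift
// RTFx arithmetic, assuming RTFx = audio duration / processing time.
// Values above 1 are faster than real time; values below 1 are slower.
func realTimeFactor(audioSeconds: Double, processingSeconds: Double) -> Double {
    precondition(processingSeconds > 0, "processing time must be positive")
    return audioSeconds / processingSeconds
}

// Wall-clock seconds needed to produce `audioSeconds` of audio at a given RTFx.
func processingTime(audioSeconds: Double, rtfx: Double) -> Double {
    precondition(rtfx > 0, "RTFx must be positive")
    return audioSeconds / rtfx
}

let magpieRTFx = 0.04   // figure quoted in the caveat
let clipSeconds = 10.0  // hypothetical clip length
let needed = processingTime(audioSeconds: clipSeconds, rtfx: magpieRTFx)
print("A \(clipSeconds) s clip needs roughly \(needed) s of compute")
```

Under that assumption, a 10-second clip takes on the order of four minutes to synthesize, which is why the caveat rules Magpie out for interactive use.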

Verdict

Worth a look if you’re building macOS or iOS apps that need private, offline speech processing and want to squeeze work onto the Neural Engine. Skip it if you need Android or Windows support, or if you were hoping to run everything on the GPU.
