Speech recognition that actually feels like SwiftUI
A wrapper around Apple's Speech framework that replaces boilerplate with view modifiers and Combine publishers.

What it does
SwiftSpeech wraps Apple’s Speech framework so you can drop a record button into a SwiftUI view, attach modifiers like .swiftSpeechRecordOnHold() and .onRecognizeLatest(update: $text), and get transcribed speech without touching AVAudioEngine or SFSpeechRecognizer directly. It handles the mic and speech authorization prompts, though you still need to add the Info.plist usage descriptions yourself.
The interesting bit
The architecture separates view components (pure UI, no gesture handling) from functional components (the actual interaction logic). This means you can skin the record button however you want without rewriting the speech pipeline — or swap in a tap-to-toggle instead of hold-to-record by changing one modifier. The whole thing publishes through Combine, so you can chain onStartRecording(sendSessionTo:) into your existing reactive flow.
Key highlights
- Three built-in demos (basic, colors, list) with Xcode preview support; “Preview on Device” required since the simulator lacks mic access
SwiftSpeech.Sessionexposes rawresultPublisherandstringPublisherfor manual subscription if the modifiers aren’t enough- Supports locale configuration, custom vocabulary via
contextualStrings, and on-device recognition options - Modifiers stack without overriding — closest to the functional component executes first
- Also available via CocoaPods, though SPM is recommended
Caveats
- iOS 13+ only; no macOS or watchOS mentioned in the README
- The WeChat voice-message mock and additional examples live in a separate repo, not the main package
- Authorization is “taken care of” but still requires manual
Info.plistentries and an explicit.requestSpeechRecognitionAuthorization()call
Verdict
Worth a look if you’re building a SwiftUI app that needs voice input and you don’t want to drown in SFSpeechRecognizer boilerplate. Skip it if you need cross-platform support or want to deeply customize the audio pipeline — this is a convenience layer, not an engine replacement.