Picovoice/rhino

Speech-to-Intent without the cloud bill

Rhino turns voice commands directly into structured intent and slots, running locally on everything from Cortex-M microcontrollers to your browser.

Velocity (7d): +0.3 ★/day · Trend: steady

What it does

Rhino listens to spoken commands and skips the middleman: no raw transcription, no cloud round-trip. It maps utterances directly to structured intents and named slots within a developer-defined “context.” Say “small double-shot espresso” and it emits orderBeverage with size: small, numberOfShots: 2. The heavy lifting happens on-device via deep neural networks trained on real-world noise.
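To make that output shape concrete, here is a minimal sketch of how an application might consume such a result. The `Inference` class and `handle` function are hypothetical stand-ins written for this illustration, not the SDK's own types; only the intent name and the two slot names come from the espresso example above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Inference:
    """Hypothetical stand-in for a speech-to-intent result (not an SDK type)."""
    is_understood: bool
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def handle(inference: Inference) -> str:
    """Turn a structured result into an application action (here, a string)."""
    if not inference.is_understood:
        return "Sorry, I didn't catch that."
    if inference.intent == "orderBeverage":
        # Slots may be absent if the speaker left them out, so use defaults.
        size = inference.slots.get("size", "regular")
        shots = int(inference.slots.get("numberOfShots", "1"))
        return f"Brewing a {size} beverage with {shots} shot(s)."
    return f"No handler for intent {inference.intent!r}."


result = Inference(True, "orderBeverage", {"size": "small", "numberOfShots": "2"})
print(handle(result))
```

The point is that the application never sees or parses a transcript: it branches on an intent name and reads typed slots.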

The interesting bit

The constraint is the feature. Rhino only understands what you pre-define in a context, essentially a YAML-ish grammar of expressions and slot types. This trades open-ended conversation for reliability and a tiny footprint. Picovoice claims it outperforms cloud alternatives on accuracy within narrow domains, but the benchmark chart in the repo shows no hard numbers; you'll need to follow the link to their separate benchmark repo for specifics.
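The closed-grammar trade-off is easier to see in a toy. The sketch below is an assumption-heavy illustration: it matches text with regular expressions, whereas Rhino works on audio with neural networks, and the `$type:name` template syntax, the `SLOTS` table, and the `infer` function are all invented here rather than taken from Rhino's real context format.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical slot types: each maps a spoken phrase to a normalized value.
SLOTS: Dict[str, Dict[str, str]] = {
    "size": {"small": "small", "medium": "medium", "large": "large"},
    "shots": {"single-shot": "1", "double-shot": "2", "triple-shot": "3"},
}

# Hypothetical expressions: (intent, template). "$type:name" marks a slot.
EXPRESSIONS: List[Tuple[str, str]] = [
    ("orderBeverage", "$size:size $shots:numberOfShots espresso"),
    ("orderBeverage", "$size:size espresso"),
]


def compile_expression(template: str):
    """Compile one template into a regex plus a slot-name -> slot-type map."""
    slot_types: Dict[str, str] = {}
    parts = []
    for token in template.split():
        if token.startswith("$"):
            slot_type, slot_name = token[1:].split(":")
            slot_types[slot_name] = slot_type
            alternatives = "|".join(re.escape(p) for p in SLOTS[slot_type])
            parts.append(f"(?P<{slot_name}>{alternatives})")
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts)), slot_types


COMPILED = [(intent, *compile_expression(t)) for intent, t in EXPRESSIONS]


def infer(utterance: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (intent, slots) if the utterance fits the grammar, else None."""
    text = " ".join(utterance.lower().split())  # normalize case and spacing
    for intent, pattern, slot_types in COMPILED:
        match = pattern.fullmatch(text)
        if match:
            slots = {
                name: SLOTS[slot_types[name]][spoken]
                for name, spoken in match.groupdict().items()
            }
            return intent, slots
    return None  # anything outside the grammar is simply not understood


print(infer("small double-shot espresso"))
print(infer("what's the weather like"))
```

Anything the grammar does not list falls through to `None`, which is the same bargain the review describes: a small, predictable surface in exchange for no open-ended understanding.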

Key highlights

  • Runs on ARM Cortex-M, STM32, Arduino, Raspberry Pi, mobile, desktop, and major browsers
  • Supports nine languages including English, Mandarin, Japanese, and Korean
  • Custom contexts trained through Picovoice Console; requires an access key to run
  • SDKs for Python, .NET, Java, Flutter, React Native, Node.js, Web (vanilla + React), Android, iOS, and C
  • Self-service model: you define expressions and slot types, Rhino handles the inference
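For a feel of what "Rhino handles the inference" means on the application side, here is a hedged sketch of a frame-by-frame processing loop. It assumes an engine object exposing `frame_length`, a `process(frame)` method that returns True once an inference is final, and `get_inference()`; those names reflect my understanding of the Python SDK's general shape and should be checked against its documentation. `FakeEngine` is a hypothetical stand-in so the example runs without audio hardware or an access key.

```python
from typing import Sequence


def run_until_finalized(engine, samples: Sequence[int]):
    """Feed fixed-size frames to `engine` until it reports a final inference."""
    n = engine.frame_length
    # Step through whole frames only; a trailing partial frame is dropped.
    for start in range(0, len(samples) - n + 1, n):
        if engine.process(samples[start:start + n]):
            return engine.get_inference()
    return None  # audio ended before the engine finalized


class FakeEngine:
    """Hypothetical stand-in engine; finalizes after a fixed number of frames."""
    frame_length = 4

    def __init__(self, finalize_after: int):
        self._remaining = finalize_after

    def process(self, frame) -> bool:
        assert len(frame) == self.frame_length
        self._remaining -= 1
        return self._remaining == 0

    def get_inference(self):
        return {"intent": "orderBeverage",
                "slots": {"size": "small", "numberOfShots": "2"}}


print(run_until_finalized(FakeEngine(finalize_after=3), [0] * 14))
```

In a real application the samples would come from a microphone stream rather than a list, and the engine would be created with the access key and a context file exported from Picovoice Console.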

Caveats

  • Requires a Picovoice access key even for demos; the console dependency means you’re somewhat locked into their ecosystem
  • Not a general speech-to-text engine: if your use case needs open-domain recognition, this is the wrong tool
  • The README’s benchmark chart is eye candy without numbers; actual metrics live in a separate repo

Verdict

Worth a look if you’re building voice control into hardware where latency, privacy, or connectivity matter more than conversational flexibility. Skip it if you need free-form dictation or want to avoid vendor-specific training pipelines.
