fikrikarim/parlor

Your laptop is now a voice AI that actually sees you

A weekend project proves you don't need OpenAI's servers—or an RTX 5090—to run real-time multimodal voice conversations locally.

Velocity (7d): +29 ★/day · Trend: steady

What it does

Parlor is a browser-based voice assistant that runs entirely on your machine. You talk, point your camera at things, and it talks back. The heavy lifting happens locally via a FastAPI server: Google’s Gemma 4 E2B model handles speech and vision understanding through LiteRT-LM, while Kokoro generates text-to-speech responses. A simple WebSocket pipes audio and JPEG frames from your browser to the server and streams synthesized speech back.

The interesting bit

The author built this to solve a real sustainability problem—he was self-hosting a free English-learning voice AI for hundreds of users and needed to kill the server bill. Six months ago that required an RTX 5090. Now it runs on an M3 Pro laptop with ~3 GB RAM. The “barge-in” feature is a nice touch: you can interrupt the AI mid-sentence, which is harder to get right than it sounds when everything is streaming in real time.
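
To see why barge-in is fiddly, here is a minimal sketch of one common way to do it: run playback as a cancellable asyncio task and cancel it when the user starts speaking. This is not Parlor's actual code. The names `speak` and `demo_barge_in`, and the `asyncio.Event` standing in for the voice-activity detector, are all assumptions made for illustration.

```python
import asyncio


async def speak(sentences, played, interrupt, interrupt_after):
    """Stand-in for streaming TTS playback: 'plays' one sentence at a time."""
    for sentence in sentences:
        await asyncio.sleep(0.01)  # pretend each sentence takes a moment to play
        played.append(sentence)
        if len(played) == interrupt_after:
            # Hypothetical: simulate the VAD noticing that the user started talking.
            interrupt.set()


async def demo_barge_in():
    played = []
    interrupt = asyncio.Event()
    reply = ["One.", "Two.", "Three.", "Four."]

    playback = asyncio.create_task(speak(reply, played, interrupt, interrupt_after=2))

    await interrupt.wait()  # the user barges in
    playback.cancel()       # stop the assistant mid-reply
    try:
        await playback
    except asyncio.CancelledError:
        pass                # expected: playback was cut short on purpose

    return played
```

Calling `asyncio.run(demo_barge_in())` returns only the sentences that finished before the interruption. A real system also has to flush audio already buffered in the browser and stop the model from generating further tokens, which is where most of the difficulty lies.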

Key highlights

  • End-to-end latency of ~2.5–3.0 seconds on Apple M3 Pro (1.8–2.2 s for speech/vision understanding, ~0.3 s to decode a ~25-token reply, 0.3–0.7 s for TTS)
  • Decode speed: ~83 tokens/sec on GPU via LiteRT-LM
  • Sentence-level TTS streaming means audio starts before the full response is finished
  • Browser-based VAD (Silero) for hands-free operation, no push-to-talk button
  • Platform-aware TTS: MLX on Mac, ONNX on Linux
  • ~2.6 GB model download on first run, auto-fetched from HuggingFace
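
The sentence-level streaming mentioned above can be sketched roughly as follows: buffer the model's tokens as they arrive and emit each sentence as soon as it is complete, so TTS can start on the first sentence while the rest is still being decoded. This is an illustrative sketch, not Parlor's implementation. The function name `sentences_from_tokens` and the simple punctuation-based splitting rule are assumptions.

```python
import re
from typing import Iterable, Iterator

# Split after '.', '!' or '?' when followed by whitespace (a deliberately naive rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences from a stream of text tokens."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence; the last may be partial.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever remains once the token stream ends.
    if buffer.strip():
        yield buffer.strip()


tokens = ["Hello", " there", ".", " How", " are", " you", "?", " Fine", "."]
chunks = list(sentences_from_tokens(tokens))
```

Here `chunks` holds three sentences, each of which could be handed to the TTS engine as soon as it is yielded. A sentence is only emitted once the next token confirms the boundary, and a rule this naive will mis-split abbreviations such as "e.g.", so a production splitter would need more care.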

Caveats

  • Explicitly marked “research preview” with expected rough edges and bugs
  • macOS requires Apple Silicon; Linux needs a supported GPU
  • Python 3.12+ only, and the frontend is a single index.html—don’t expect a polished UI
  • The author notes you “can’t do agentic coding with this”; it’s narrowly scoped to conversation

Verdict

Worth a spin if you’re building local AI assistants, teaching language learners, or just want to see how far small models have come. Skip it if you need reliability, broad hardware support, or anything beyond a conversational demo—the author is upfront that this is an early experiment, not a product.
