JARVIS for your Ray-Bans, minus the billion-dollar R&D budget
Turns Meta's smart glasses into a real-time AI assistant that sees, hears, and actually does things through Gemini Live.

What it does
VisionClaw is an iOS/Android app that hijacks Meta Ray-Ban smart glasses to pipe camera frames and audio into Google’s Gemini Live API. You tap a button, talk, and Gemini sees what you see while routing tasks (messages, web search, smart home control) through a local OpenClaw gateway. No Meta AI subscription required, just your own API key and some elbow grease.
The interesting bit
The project treats the glasses as dumb AV peripherals and does the actual intelligence on your phone. Video gets throttled to ~1fps JPEGs (the README is explicit: 50% quality, deliberately low-bandwidth), while audio stays bidirectional and native—no STT middleman. The OpenClaw integration is the secret sauce: Gemini gets one generic execute tool that fans out to 56+ skills, so the model doesn’t need to know about WhatsApp, Telegram, or your light switches.
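To make the single-tool idea concrete, here is a minimal sketch in Python rather than the app's own code. Everything named here is an assumption for illustration: the `task` parameter, the `handle_tool_call` helper, and the `fake_gateway` stand-in (the real OpenClaw gateway API is not shown in this write-up). The point is only that the model sees one tool declaration, and skill selection happens on the other side of it.

```python
from typing import Callable

# Hypothetical tool declaration in the JSON-schema style commonly used for
# function calling. The "task" parameter name is an assumption.
EXECUTE_TOOL = {
    "name": "execute",
    "description": "Hand a natural-language task to the local gateway.",
    "parameters": {
        "type": "object",
        "properties": {"task": {"type": "string"}},
        "required": ["task"],
    },
}

# A gateway is modelled as any callable that takes a task and returns text.
Gateway = Callable[[str], str]


def handle_tool_call(name: str, args: dict, gateway: Gateway) -> dict:
    """Forward the model's tool call to the gateway; reject other tools."""
    if name != EXECUTE_TOOL["name"]:
        return {"error": f"unknown tool: {name}"}
    task = args.get("task", "").strip()
    if not task:
        return {"error": "missing task"}
    return {"result": gateway(task)}


def fake_gateway(task: str) -> str:
    """Stand-in for the real gateway: picks a skill by keyword match."""
    skills = {"message": "messaging", "light": "smart_home", "search": "web_search"}
    for keyword, skill in skills.items():
        if keyword in task.lower():
            return f"{skill}: {task}"
    return f"fallback: {task}"


print(handle_tool_call("execute", {"task": "Turn off the kitchen light"}, fake_gateway))
```

Adding a skill in this arrangement changes only the gateway; the tool declaration handed to the model stays the same.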
Key highlights
- Works without glasses: “Phone mode” uses your back camera for testing the full pipeline
- WebRTC live streaming: share your POV to a browser with a 6-character room code (runs separately from AI mode, because both need the audio device)
- Real audio engineering: different iOS audio sessions for glasses vs. phone mode, echo cancellation handled explicitly
- Android setup requires GitHub Packages auth even for public repos—a friction point the README documents rather than hides
- In-app settings override hardcoded secrets, so you don’t recompile to change your OpenClaw host
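The settings-override highlight boils down to a precedence rule: a value entered in the app wins, and the compiled-in value is the fallback. A minimal sketch of that rule, assuming hypothetical key names and placeholder defaults (none of these are the project's real values):

```python
from typing import Mapping, Optional

# Compiled-in defaults. Keys and values are placeholders for illustration.
BUILD_DEFAULTS = {"openclaw_host": "gateway.local", "openclaw_port": "8080"}


def resolve(key: str, user_settings: Mapping[str, Optional[str]]) -> str:
    """Return the in-app override if one is set, else the build default."""
    override = user_settings.get(key)
    # Treat missing, None, and blank entries as "not set".
    if override is not None and override.strip():
        return override.strip()
    return BUILD_DEFAULTS[key]


print(resolve("openclaw_host", {"openclaw_host": "192.168.1.20"}))
```

Treating blank entries as unset means clearing a field in the settings screen falls back to the build default instead of producing an empty host.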
Caveats
- Meta Developer Mode requires tapping the app version 5 times, which feels like a cheat code from 2007
- OpenClaw gateway must run on your local network; the example config hardcodes bind: "lan" and a Mac Bonjour hostname
- WebRTC streaming and Gemini Live cannot run simultaneously due to audio device contention
Verdict
Grab this if you own Ray-Bans, distrust cloud-only AI stacks, and enjoy soldering software together. Skip it if you want a polished consumer product: this is a reference implementation with sharp edges and a working demo.