lxe/llavavision
A web accessibility tool that uses a local LLaVA vision model to narrate visual content for users.

Velocity (7d): +0.5 ★/day · Trend: steady
LLaVaVision is a browser-based application inspired by Be My Eyes. It captures video frames, sends them to a locally running multimodal LLaVA model via llama.cpp, and uses the Web Speech API to read the model’s descriptions aloud. The app uses the BakLLaVA-1 model quantized to q4_k, which gives reasonable performance on consumer hardware and needs roughly 5 GB of RAM.
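The capture, describe, and narrate loop described above can be sketched roughly as follows. This is an illustrative sketch, not code from the LLaVaVision repository: the function names, the prompt wording, the `n_predict` value, and the `serverUrl` parameter are all hypothetical. It also assumes the llama.cpp example server exposes the older multimodal `/completion` endpoint, which accepted an `image_data` array and an `[img-N]` placeholder in the prompt; newer server builds may use a different interface.

```typescript
// Hypothetical sketch of the frame -> model -> speech pipeline.
// None of these names come from the LLaVaVision source.

interface ImageDatum {
  data: string; // base64-encoded image bytes, without the data-URL prefix
  id: number;   // referenced from the prompt as [img-<id>]
}

interface CompletionRequest {
  prompt: string;
  image_data: ImageDatum[];
  n_predict: number;
}

// Strip the "data:image/jpeg;base64," prefix from a canvas data URL.
function dataUrlToBase64(dataUrl: string): string {
  const comma = dataUrl.indexOf(",");
  if (dataUrl.indexOf("data:") !== 0 || comma === -1) {
    throw new Error("expected a data URL");
  }
  return dataUrl.slice(comma + 1);
}

// Build the JSON body for the (assumed) llama.cpp /completion endpoint.
function buildDescribeRequest(
  frameDataUrl: string,
  imageId: number = 1,
  question: string = "Describe what you see."
): CompletionRequest {
  return {
    prompt: `USER:[img-${imageId}]${question}\nASSISTANT:`,
    image_data: [{ data: dataUrlToBase64(frameDataUrl), id: imageId }],
    n_predict: 128, // assumed cap on the length of the description
  };
}

// Browser-only: grab one frame from a playing <video>, ask the local
// model to describe it, and speak the answer with the Web Speech API.
async function narrateFrame(
  video: HTMLVideoElement,
  serverUrl: string
): Promise<void> {
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("2D canvas context unavailable");
  ctx.drawImage(video, 0, 0);

  const body = buildDescribeRequest(canvas.toDataURL("image/jpeg", 0.8));
  const res = await fetch(`${serverUrl}/completion`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`server returned ${res.status}`);

  const result = (await res.json()) as { content: string };
  speechSynthesis.speak(new SpeechSynthesisUtterance(result.content.trim()));
}
```

In a page, `narrateFrame` would typically be called from a button handler or a timer once a camera stream from `navigator.mediaDevices.getUserMedia` is attached to the video element. Keeping the request construction in a pure function separates it from the browser APIs, so that part can be checked without a camera or a running model.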