
lxe/llavavision

A web accessibility tool that uses a local LLaVA vision model to narrate visual content for users.

Velocity (7d): +0.5 ★ / day · Trend: steady

LLaVaVision is a browser-based application inspired by Be My Eyes. It captures video frames, sends them to a locally running multimodal LLaVA model served by llama.cpp, and uses the Web Speech API to read the model’s descriptions of the scene aloud. The app uses the BakLLaVA-1 model quantized to q4_k, which gives reasonable performance on consumer hardware with roughly 5 GB of RAM.
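The capture, describe, and narrate loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: every function and interface name here is hypothetical, and the request shape (a `/completion` endpoint taking `image_data` with an `[img-N]` placeholder in the prompt, and returning a `content` field) is an assumption based on the llama.cpp example server's multimodal API, which may differ between versions. Browser-specific pieces are injected as plain functions so the logic stays independent of the DOM.

```typescript
// Sketch only. All names are hypothetical; the payload shape is an assumption
// about the llama.cpp example server and should be checked against your build.

interface CompletionRequest {
  prompt: string;
  n_predict: number;
  image_data: { data: string; id: number }[];
}

// Remove the "data:image/jpeg;base64," prefix that a canvas data URL carries,
// leaving only the base64 payload. Input without a comma is returned unchanged.
function stripDataUrlPrefix(dataUrl: string): string {
  const comma = dataUrl.indexOf(",");
  return comma === -1 ? dataUrl : dataUrl.slice(comma + 1);
}

// Build the JSON body for one frame. The [img-N] tag in the prompt refers to
// the image_data entry with the matching id.
function buildCompletionRequest(
  dataUrl: string,
  question: string,
  imageId: number = 10
): CompletionRequest {
  return {
    prompt: `USER:[img-${imageId}]${question}\nASSISTANT:`,
    n_predict: 128, // assumed cap on the length of the description
    image_data: [{ data: stripDataUrlPrefix(dataUrl), id: imageId }],
  };
}

// Browser-specific steps, injected so the pipeline itself has no DOM dependency.
interface NarrationDeps {
  captureFrame: () => string; // returns a data URL for the current video frame
  postJson: (url: string, body: unknown) => Promise<{ content: string }>;
  speak: (text: string) => void;
}

// One pass of the loop: grab a frame, ask the model, speak the answer.
async function narrateOnce(
  deps: NarrationDeps,
  serverUrl: string,
  question: string
): Promise<string> {
  const request = buildCompletionRequest(deps.captureFrame(), question);
  const response = await deps.postJson(`${serverUrl}/completion`, request);
  const description = response.content.trim();
  deps.speak(description);
  return description;
}
```

In a browser, `captureFrame` could draw the video element onto a canvas and call `canvas.toDataURL("image/jpeg")`, `postJson` could wrap `fetch`, and `speak` could call `speechSynthesis.speak(new SpeechSynthesisUtterance(text))`. Keeping those three steps behind an interface also makes it straightforward to swap in a different model server or speech backend.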
