LLaVA-VL/LLaVA-Plus-Codebase
A multimodal LLM that learns to plug in and use various tools/skills for general vision tasks.

Velocity (7d): +0.8 ★/day
Trend: steady
LLaVA-Plus extends large multimodal models with the ability to use tools across modalities. It lets vision-language assistants call external models, APIs, and skills to complete complex vision tasks. The codebase supports training, evaluation, and deployment of these tool-augmented multimodal agents.
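The tool-use idea described above can be illustrated with a minimal sketch of a tool registry and dispatcher. This is not the LLaVA-Plus API: the names `TOOLS`, `register`, `dispatch`, and the `caption`/`detect` stubs are hypothetical, and the action format (a dict with `tool` and `image` keys) is an assumption made for illustration only.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a tool name to a callable.
# These names are illustrative and are not taken from the LLaVA-Plus codebase.
TOOLS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return wrap


@register("caption")
def caption(image_path: str) -> str:
    # Stub standing in for an external captioning model.
    return f"caption for {image_path}"


@register("detect")
def detect(image_path: str) -> str:
    # Stub standing in for an external object detector.
    return f"boxes for {image_path}"


def dispatch(action: dict) -> str:
    """Route a model-emitted action (assumed format) to the matching tool."""
    name = action["tool"]
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return TOOLS[name](action["image"])
```

In a sketch like this, the assistant would emit an action such as `{"tool": "detect", "image": "a.png"}`, and the string returned by `dispatch` would be fed back into the conversation as the tool's output.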