TinyLLaVA/TinyLLaVA_Factory
A framework for training and building small-scale large multimodal models combining vision and language capabilities.

Velocity (7d): +1.2 ★/day · Trend: steady
TinyLLaVA Factory is a modularized codebase for developing small-scale large multimodal models (LMMs). It supports training and experimenting with LLaVA-style vision-language models, which pair a vision encoder with a small transformer-based language model such as TinyLlama. The project includes model implementations, training pipelines, and evaluation tools for multimodal AI research.
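
To make the modular design in the description concrete, here is a minimal sketch of how a LLaVA-style model is commonly composed: a vision encoder produces per-patch features, a connector maps them to the language model's embedding width, and the resulting image tokens are placed in front of the text tokens. This is a toy illustration in plain Python, not TinyLLaVA Factory's actual API. Every name here (`ToyConfig`, `encode_image`, `connect`, `embed_text`, `build_llm_input`) and every dimension is hypothetical, and the "models" are constant stand-ins rather than learned networks.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class ToyConfig:
    """Hypothetical settings; real models use far larger values."""
    num_patches: int = 4   # image tokens produced by the vision encoder
    vision_dim: int = 3    # width of each vision feature
    llm_dim: int = 2       # width of the language model's token embeddings


def encode_image(cfg: ToyConfig) -> List[Vector]:
    """Stand-in vision encoder: one constant feature vector per patch."""
    return [[float(i)] * cfg.vision_dim for i in range(cfg.num_patches)]


def connect(features: List[Vector], cfg: ToyConfig) -> List[Vector]:
    """Stand-in connector: map each vision feature to llm_dim values.

    A real connector is learned (for example a small MLP); here every
    output value is simply the mean of the input vector.
    """
    return [[sum(f) / len(f)] * cfg.llm_dim for f in features]


def embed_text(token_ids: List[int], cfg: ToyConfig) -> List[Vector]:
    """Stand-in embedding table: each token id becomes a constant vector."""
    return [[float(t)] * cfg.llm_dim for t in token_ids]


def build_llm_input(token_ids: List[int], cfg: ToyConfig) -> List[Vector]:
    """Place projected image tokens in front of the text tokens."""
    image_tokens = connect(encode_image(cfg), cfg)
    text_tokens = embed_text(token_ids, cfg)
    return image_tokens + text_tokens


cfg = ToyConfig()
sequence = build_llm_input([7, 8, 9], cfg)
print(len(sequence), len(sequence[0]))  # prints: 7 2
```

Because each stage only depends on the shape of the previous stage's output, any one of the three pieces can be swapped without touching the others, which is the kind of flexibility a modularized codebase is meant to provide.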