PKU-YuanGroup/LLaVA-CoT
LLaVA-CoT is an open-source vision-language model designed for systematic chain-of-thought reasoning over multimodal inputs.

Velocity (7d): +3.8 ★ / day
Trend: steady
This project develops a multimodal large language model that performs step-by-step reasoning on visual inputs. It combines vision encoding with language model reasoning to enable structured, interpretable answers on visual understanding tasks. The repository includes both training code and inference scripts, and the model was trained on a dataset of 100k reasoning examples.
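As a rough illustration of what "structured, interpretable answers" could look like downstream, the sketch below splits a staged response into its parts. It assumes the model wraps each reasoning stage in tags named SUMMARY, CAPTION, REASONING, and CONCLUSION; those tag names, the `split_stages` helper, and the sample response are assumptions made for this example and are not taken from the description above, so check them against the repository before relying on them.

```python
import re

# Assumed stage tags; the description above does not name them.
STAGES = ("SUMMARY", "CAPTION", "REASONING", "CONCLUSION")


def split_stages(text, stages=STAGES):
    """Return {stage: content} for each <STAGE>...</STAGE> block found in text."""
    found = {}
    for stage in stages:
        # Non-greedy match so each stage stops at its own closing tag.
        match = re.search(rf"<{stage}>(.*?)</{stage}>", text, flags=re.DOTALL)
        if match:
            found[stage] = match.group(1).strip()
    return found


# Hypothetical model response, written by hand for illustration.
example = (
    "<SUMMARY>Count the apples.</SUMMARY>"
    "<CAPTION>A bowl with three red apples.</CAPTION>"
    "<REASONING>Each apple is visible; there are three.</REASONING>"
    "<CONCLUSION>3</CONCLUSION>"
)

stages = split_stages(example)
print(stages["CONCLUSION"])
```

Keeping the stages separate lets a caller show only the conclusion to end users while logging the intermediate reasoning for inspection.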