PKU-YuanGroup/LLaVA-CoT

LLaVA-CoT is an open-source visual language model designed for systematic chain-of-thought reasoning across multimodal inputs.

Velocity (7d): +3.8 ★/day · Trend: steady

This project develops a multimodal large language model that performs step-by-step reasoning on visual inputs. It combines vision encoding with language model reasoning to enable structured, interpretable answers on visual understanding tasks. The repository includes both training code and inference scripts, and the model was trained on a dataset of 100k reasoning examples.