jingyaogong/minimind-v
A project for training a 65M-parameter vision-language model from scratch in approximately 2 hours.

Velocity (7d): +13 ★/day · Trend: → steady
MiniMind-V is an open-source small vision-language model (VLM) designed to be trained from scratch with minimal resources. The project provides compact, educational code covering the VLM architecture, dataset cleaning, pretraining, and supervised fine-tuning. It aims to serve both as a working open VLM and as a practical tutorial on vision-language modeling.
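For readers new to the architecture side, here is a minimal sketch of one common VLM pattern: image patch features are linearly projected to the language model's embedding width and then placed in the same sequence as the text token embeddings. This is a generic illustration, not MiniMind-V's actual code; the function names (`linear`, `build_vlm_input`), the toy dimensions, and the choice to prepend image tokens are all assumptions made for the example.

```python
import random


def linear(x, weight):
    """Multiply an [n, d_in] matrix by a [d_in, d_out] weight matrix (pure Python)."""
    return [
        [sum(xi * w for xi, w in zip(row, col)) for col in zip(*weight)]
        for row in x
    ]


def build_vlm_input(patch_features, text_embeddings, projector):
    """Project vision features to the LM width and prepend them to the text tokens.

    Hypothetical helper: real models may instead splice image tokens into
    placeholder positions in the middle of the prompt.
    """
    image_tokens = linear(patch_features, projector)
    return image_tokens + text_embeddings


random.seed(0)

# Toy sizes chosen only for illustration (assumptions, not the project's config).
D_VISION, D_MODEL = 8, 4
N_PATCHES, N_TEXT = 3, 5

patch_features = [[random.gauss(0, 1) for _ in range(D_VISION)] for _ in range(N_PATCHES)]
text_embeddings = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(N_TEXT)]
projector = [[random.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(D_VISION)]

sequence = build_vlm_input(patch_features, text_embeddings, projector)
print(len(sequence), len(sequence[0]))  # prints: 8 4
```

The combined sequence has one row per image patch plus one per text token, all at the language model's width, so a standard transformer decoder can attend over both modalities without further changes.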