jingyaogong/minimind-v

A project for training a 65M-parameter vision-language model from scratch in approximately 2 hours.

Velocity (7d): +13 ★/day. Trend: steady.

MiniMind-V is an open-source implementation of a small vision-language model (VLM) designed to be trained from scratch with minimal resources. The project provides minimal, educational code covering the VLM architecture, dataset cleaning, pretraining, and supervised fine-tuning. It aims to serve both as a functional open VLM and as a practical tutorial for understanding vision-language modeling.
