RLHF-V/RLAIF-V
Open-source framework for aligning multimodal large language models using AI feedback, achieving GPT-4V-level trustworthiness.

Velocity (7d): +0.6 ★/day · Trend: steady
RLAIF-V introduces a paradigm for training and aligning multimodal large language models using open-source AI feedback. The project provides a full pipeline: high-quality feedback data, online feedback learning algorithms, and pre-trained model weights (7B and 12B variants). The resulting models and training data are used by projects such as MiniCPM-Llama3-V 2.5 to build competitive vision-language models.