zai-org/GLM-V
GLM-V is an open-source series of vision-language models (GLM-4.6V, GLM-4.5V, GLM-4.1V) for multimodal reasoning tasks including image understanding, video comprehension, and complex problem solving.

The repository provides pre-trained vision-language models that process and reason over images and video alongside text. It includes model weights, training code, and inference scripts for the GLM-4.6V, GLM-4.5V, and GLM-4.1V model families. The models are trained using scalable reinforcement learning to enhance complex reasoning and multimodal understanding capabilities.