NVlabs/Long-RL
A full-stack framework for scaling reinforcement learning training of vision-language models to long video reasoning.

Long-RL addresses the challenge of applying reinforcement learning to long video reasoning in vision-language models. It provides LongVideo-Reason, a 104K-sample dataset with high-quality reasoning annotations across diverse domains, together with a two-stage training pipeline that uses sequence parallelism to handle the computational cost of long sequences. The work produces the LongVILA-R1-7B model, demonstrating that RL training can scale to extended multi-modal contexts.