Gen-Verse/ReasonFlux
Open-source LLM post-training suite from Princeton and ByteDance featuring reasoning optimization via reinforcement learning and process reward models.

ReasonFlux is a post-training framework for improving the reasoning capabilities of LLMs. It includes three components. ReasonFlux-PRM is a trajectory-aware process reward model (PRM). ReasonFlux-Coder uses reinforcement learning (RL) to train code generation together with co-evolved unit testers. ReasonFlux-Zero/F1 performs hierarchical chain-of-thought (CoT) reasoning via thought templates. Across these components, the suite covers data selection, RL, and inference scaling, with the goal of improving long-CoT reasoning performance.
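To make the role of a process reward model more concrete, here is a minimal, hypothetical sketch of one common way per-step PRM scores are used. Each step of a sampled trajectory gets its own score, the step scores are collapsed into one trajectory score, and the top-ranked trajectories are kept, either as the answer (best-of-N inference scaling) or as training data (data selection). This is not ReasonFlux's API. The function names, the `min`/`mean` aggregation choices, and the keyword-based `toy_prm` scorer are all assumptions for illustration. A real PRM would be a trained model that scores each step in context.

```python
from typing import Callable, Sequence

# Hypothetical type: a reasoning trajectory is a list of step strings.
Trajectory = list[str]
# Hypothetical PRM interface: maps a trajectory to one score per step.
StepScorer = Callable[[Trajectory], list[float]]


def aggregate_step_scores(scores: Sequence[float], mode: str = "min") -> float:
    """Collapse per-step PRM scores into a single trajectory score."""
    if not scores:
        raise ValueError("trajectory has no scored steps")
    if mode == "min":
        # Assumed heuristic: a chain is only as reliable as its weakest step.
        return min(scores)
    if mode == "mean":
        return sum(scores) / len(scores)
    raise ValueError(f"unknown aggregation mode: {mode}")


def select_top_k(
    trajectories: Sequence[Trajectory],
    score_steps: StepScorer,
    k: int,
    mode: str = "min",
) -> list[Trajectory]:
    """Rank trajectories by aggregated step score and keep the best k."""
    ranked = sorted(
        trajectories,
        key=lambda t: aggregate_step_scores(score_steps(t), mode),
        reverse=True,
    )
    return ranked[:k]


def toy_prm(traj: Trajectory) -> list[float]:
    """Hypothetical stand-in for a trained PRM: penalize steps that 'guess'."""
    return [0.2 if "guess" in step else 0.9 for step in traj]


candidates: list[Trajectory] = [
    ["expand (a+b)^2", "a^2 + 2ab + b^2"],
    ["guess the answer", "a^2 + b^2"],
    ["expand", "collect terms", "guess"],
]

if __name__ == "__main__":
    print(select_top_k(candidates, toy_prm, k=1))
```

With `mode="min"`, a single weak step sinks a whole trajectory. With `mode="mean"`, a mostly sound chain with one weak step can still rank above a shorter chain that is weak throughout. Which aggregation works better is an empirical question and is not settled by this sketch.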