FreedomIntelligence/HuatuoGPT-o1
A medical LLM for advanced medical reasoning, built on LLaMA-3.1 and Qwen2.5 and trained with verifier-guided search and PPO reinforcement learning.

HuatuoGPT-o1 is a medical language model specialized for advanced medical reasoning. It uses a verifier to guide the search for complex reasoning trajectories, then applies reinforcement learning with PPO, using verifier-based rewards, to further strengthen its reasoning. The repository provides 7B, 8B, and 70B model variants along with training data and code.
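The two training ideas in the description can be illustrated with a minimal sketch, assuming a simplified setup. The first is verifier-guided search: keep refining a reasoning trajectory until a verifier accepts its final answer. The second is a verifier-based scalar reward of the kind that could feed a PPO update. This is not the repository's code. `Trajectory`, `verifier_guided_search`, `verifier_reward`, the `exact_match` verifier, and the `toy_refine` refiner are hypothetical stand-ins for LLM-backed components, and the 1.0/0.0 reward values are an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Trajectory:
    """A reasoning trajectory: intermediate steps plus a final answer."""
    steps: List[str] = field(default_factory=list)
    answer: str = ""


# Hypothetical interfaces: in a real pipeline both would be backed by LLM calls.
Refiner = Callable[[Trajectory], Trajectory]
Verifier = Callable[[str, str], bool]


def verifier_guided_search(
    initial: Trajectory,
    refine: Refiner,
    verify: Verifier,
    reference: str,
    max_iters: int = 5,
) -> Optional[Trajectory]:
    """Refine a trajectory until the verifier accepts its answer or the budget runs out."""
    traj = initial
    for _ in range(max_iters):
        if verify(traj.answer, reference):
            return traj  # accepted: this trajectory could be kept for training
        traj = refine(traj)
    # One final check after the last refinement; None means the search failed.
    return traj if verify(traj.answer, reference) else None


def verifier_reward(answer: str, reference: str, verify: Verifier) -> float:
    """Assumed binary reward for RL: 1.0 if the verifier accepts, else 0.0."""
    return 1.0 if verify(answer, reference) else 0.0


# --- Toy stand-ins (hypothetical) ---
def exact_match(answer: str, reference: str) -> bool:
    """Toy verifier: case-insensitive string equality."""
    return answer.strip().lower() == reference.strip().lower()


CANDIDATES = ["Hypothyroidism", "Anemia", "Hyperthyroidism"]


def toy_refine(traj: Trajectory) -> Trajectory:
    """Toy refiner: stands in for an LLM revision step by trying the next candidate."""
    nxt = CANDIDATES[len(traj.steps) % len(CANDIDATES)]
    return Trajectory(steps=traj.steps + [f"revise -> {nxt}"], answer=nxt)


if __name__ == "__main__":
    found = verifier_guided_search(
        Trajectory(answer="Diabetes"), toy_refine, exact_match, "Hyperthyroidism"
    )
    print(found)
    print(verifier_reward("hyperthyroidism", "Hyperthyroidism", exact_match))
```

Replacing `exact_match` with a model-based judge and `toy_refine` with an LLM call would move this toward a realistic pipeline. The loop structure, which searches with a verifier and then rewards with the same verifier, would stay the same.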