QwenLM/ParScale
ParScale is an LLM scaling paradigm that applies P learnable transformations to the input, runs the resulting P forward passes in parallel, and aggregates the outputs. Scaling to P parallel streams is reported to be comparable to scaling the parameter count by O(log P).

This repository presents a theoretical and empirical framework for scaling language models beyond the traditional approaches of parameter scaling and inference-time scaling. The method applies P diverse, learnable transformations to the input, executes the forward passes in parallel, and dynamically aggregates the P outputs during both training and inference. The work establishes a logarithmic scaling law showing that parallel computation can serve as an efficient substitute for parameter growth. Pre-trained models and code are provided via Hugging Face.
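
The three steps described above (transform, parallel forward, aggregate) can be illustrated with a toy sketch in plain Python. This is not the repository's implementation: the shared "model" is a single tanh layer, the per-stream transformation is a hypothetical learnable additive shift, and the aggregation is a hypothetical softmax gate over the P outputs. The names `toy_model`, `parscale_forward`, `stream_shifts`, and `gate` are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(x, weights):
    """Stand-in for the shared model: one linear layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def parscale_forward(x, weights, stream_shifts, gate):
    """Toy version of the P-stream forward pass.

    stream_shifts: P learnable vectors, one per stream (assumed form of the
                   input transformation).
    gate:          learnable vector used to score each stream's output
                   (assumed form of the dynamic aggregation).
    """
    # 1. Apply P different learnable transformations to the same input.
    streams = [[xi + si for xi, si in zip(x, shift)] for shift in stream_shifts]

    # 2. Run every stream through the SAME weights. A real system would batch
    #    these P passes so they execute in parallel; a loop is used for clarity.
    outputs = [toy_model(s, weights) for s in streams]

    # 3. Aggregate dynamically: the mixing weights depend on the outputs.
    scores = [sum(g * o for g, o in zip(gate, out)) for out in outputs]
    mix = softmax(scores)
    dim = len(outputs[0])
    combined = [
        sum(mix[p] * outputs[p][d] for p in range(len(outputs)))
        for d in range(dim)
    ]
    return combined, mix


if __name__ == "__main__":
    random.seed(0)
    D, P = 4, 3  # hidden size and number of parallel streams
    weights = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(D)]
    stream_shifts = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(P)]
    gate = [random.uniform(-1, 1) for _ in range(D)]

    y, mix = parscale_forward([0.1, -0.2, 0.3, 0.4], weights, stream_shifts, gate)
    print("aggregation weights:", mix)
    print("combined output:", y)
```

In this sketch the model weights are shared across streams, so increasing P adds only the small per-stream parameters (`stream_shifts`) and the gate while multiplying the forward-pass compute by P. That is the trade the description refers to: extra parallel computation in place of extra model parameters.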