Is LightReasoner open source?

Yes — HKUDS/LightReasoner is open source, released under the MIT license.

What language is LightReasoner written in?

HKUDS/LightReasoner is primarily written in Python.

How popular is LightReasoner?

HKUDS/LightReasoner has 602 stars on GitHub.

Where can I find LightReasoner?

HKUDS/LightReasoner is on GitHub at https://github.com/HKUDS/LightReasoner.

HKUDS/LightReasoner

Small models can teach LLMs to reason—using 99% fewer tokens

LightReasoner replaces exhaustive supervised fine-tuning with a small-model teacher that flags only the reasoning steps worth learning, slashing tuned tokens by 99%.

★602 stars Python Language Models LLMOps · Eval

What it does

LightReasoner is a post-training framework that uses small language models as diagnostic tools for larger ones. It pairs an “expert” LLM with a weaker “amateur” SLM, measures the KL divergence between their output distributions to locate high-leverage reasoning steps, and fine-tunes the expert only on those critical steps. The authors report accuracy gains on math benchmarks, such as +28.1% on GSM8K for Qwen2.5-Math-1.5B and +6.0% on MATH for the 7B variant, while claiming a 99% reduction in tuned tokens and 90% less training time versus standard supervised fine-tuning. Pre-built datasets and fine-tuned checkpoints are available on Hugging Face.
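
To make the step-selection idea concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function names, the toy distributions, and the threshold of 0.4 are all assumptions chosen for illustration. It computes a per-step KL divergence between an expert and an amateur distribution and keeps only the steps where the two disagree strongly.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def select_critical_steps(expert_dists, amateur_dists, threshold):
    """Return indices of steps whose expert-vs-amateur KL exceeds the threshold."""
    return [
        t
        for t, (p_e, p_a) in enumerate(zip(expert_dists, amateur_dists))
        if kl_divergence(p_e, p_a) > threshold
    ]

# Toy distributions over a 3-token vocabulary at 3 generation steps (made up).
expert = [[0.7, 0.2, 0.1], [0.9, 0.05, 0.05], [0.34, 0.33, 0.33]]
amateur = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.33, 0.34, 0.33]]

# The threshold value is hypothetical; only step 1 shows a large disagreement.
critical = select_critical_steps(expert, amateur, threshold=0.4)
print(critical)  # [1]
```

In this toy run the models nearly agree at steps 0 and 2, so only step 1 would be passed on for fine-tuning, which is the mechanism behind the claimed reduction in tuned tokens.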

The interesting bit

The twist is that the amateur does not learn from the expert; the expert learns from the amateur’s mistakes. By measuring where the small model’s probability distribution diverges most from the large model’s, the system identifies reasoning bottlenecks without ground-truth labels. The README notes there is a “sweet spot” in the capability gap—too weak and the amateur produces incoherent noise, too strong and the contrast disappears.
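
The divergence signal can be written out as follows. The notation is an assumption made for this summary rather than a quotation from the README: $\pi_E$ and $\pi_A$ stand for the expert's and amateur's distributions over the vocabulary $\mathcal{V}$, $s_t$ for the reasoning prefix at step $t$, and the direction (expert relative to amateur) is assumed.

```latex
D_{\mathrm{KL}}\!\left(\pi_E(\cdot \mid s_t)\,\middle\|\,\pi_A(\cdot \mid s_t)\right)
  = \sum_{a \in \mathcal{V}} \pi_E(a \mid s_t)\,
    \log \frac{\pi_E(a \mid s_t)}{\pi_A(a \mid s_t)}
```

This quantity is zero when the two models agree exactly, which is consistent with the contrast disappearing when the amateur is too strong. With a very weak amateur the divergence would plausibly be large at almost every step, so it would no longer single out specific bottlenecks.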

Key highlights

Reported efficiency: 99% fewer tuned tokens, 90% less total training time, and 80% fewer sampled problems than traditional SFT for the Qwen2.5-Math-1.5B baseline.
Label-free supervision: Uses KL divergence between expert and amateur outputs to generate soft contrastive labels, sidestepping human annotations and rejection sampling.
Three-stage pipeline: Critical step selection via KLD thresholding, contrastive supervision to capture the expert’s advantage, and self-distillation to internalize those strengths.
Generalization claims: Trained only on GSM8K, the authors report improvements across seven benchmarks including MATH, SVAMP, and OlympiadBench.
Pre-packaged artifacts: Ships pre-collected training samples (LRsamples) and ready-to-use fine-tuned models on Hugging Face.
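
The contrastive-supervision stage can be illustrated with a small sketch. The recipe below is an assumption borrowed from the general idea of contrastive decoding and is not confirmed as the repository's formula: tokens the expert considers implausible are masked out, the rest are scored by the log-ratio of expert to amateur probability, and the scores are normalized into a soft label. The function name, the `alpha` parameter, and the toy distributions are hypothetical.

```python
import math

def contrastive_soft_label(p_expert, p_amateur, alpha=0.2, eps=1e-12):
    """Build a soft target that up-weights tokens the expert prefers over the amateur.

    Assumed recipe (not taken from the source): keep tokens whose expert
    probability is at least alpha * max expert probability, score them by
    log p_expert - log p_amateur, and normalize the scores with a softmax.
    """
    cutoff = alpha * max(p_expert)
    scores = [
        math.log(pe + eps) - math.log(pa + eps) if pe >= cutoff else None
        for pe, pa in zip(p_expert, p_amateur)
    ]
    kept = [s for s in scores if s is not None]
    top = max(kept)  # subtract the max before exponentiating, for stability
    weights = [0.0 if s is None else math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Toy distributions at one selected step (made up for illustration).
p_expert = [0.6, 0.3, 0.1]
p_amateur = [0.3, 0.5, 0.2]

label = contrastive_soft_label(p_expert, p_amateur)
print([round(x, 3) for x in label])  # [0.769, 0.231, 0.0]
```

The label puts more mass on token 0 than the expert itself does (0.769 versus 0.6), because that is where the expert most outperforms the amateur. In a self-distillation step, a label of this kind would serve as the training target for the expert's own distribution at the selected steps.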

Caveats

The method depends heavily on expert-amateur pairing; the README warns that the amateur must be “competent enough to produce coherent reasoning” yet weaker than the expert, and that the optimal gap is a “sweet spot” rather than simply a larger divide.
Default settings and hyperparameters are tuned for GSM8K; adapting to harder datasets like MATH may require upgrading the amateur model and retuning thresholds.
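The reported gains vary considerably by model size and benchmark: for example, +28.1% on GSM8K for Qwen2.5-Math-1.5B versus +6.0% on MATH for the 7B variant. The headline efficiency figures are reported for the Qwen2.5-Math-1.5B baseline.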
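
When moving to a new dataset, one simple way to retune a KL threshold is to pick it from the observed divergences so that a target fraction of steps is retained. This is a generic heuristic offered as an assumption, not a procedure documented by the project. The function name and the sample values are hypothetical.

```python
def threshold_for_fraction(kl_values, keep_fraction):
    """Return the KL value at or above which roughly keep_fraction of steps fall."""
    if not kl_values:
        raise ValueError("kl_values must not be empty")
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(kl_values, reverse=True)
    n_keep = max(1, int(keep_fraction * len(ranked)))
    return ranked[n_keep - 1]

# Made-up per-step divergences for a single reasoning trace.
kl_per_step = [0.02, 1.80, 0.01, 0.45, 0.03, 0.90, 0.05, 0.02, 0.01, 0.04]

threshold = threshold_for_fraction(kl_per_step, keep_fraction=0.2)
# Inclusive comparison: steps at or above the threshold are kept.
kept = [t for t, kl in enumerate(kl_per_step) if kl >= threshold]
print(threshold, kept)  # 0.9 [1, 5]
```

Ties at the threshold can keep slightly more steps than requested, so the retained fraction is approximate.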

Verdict

Worth a look if you are fine-tuning reasoning models and want to experiment with token-efficient alternatives to brute-force SFT. Probably overkill if you are not working with step-by-step reasoning tasks or lack the compute to run paired model inference for sampling.
