zhanshijinwat/Steel-LLM

Forging a 1B Chinese LLM from raw data, not checkpoints

One developer spent eight months pretraining a 1B-parameter Chinese-centric LLM on 1T tokens to prove solo replication is possible with a modest GPU cluster.

★ 810 stars · Jupyter Notebook · Language Models · ML Frameworks

What it does

Steel-LLM is a fully documented, end-to-end recipe for pretraining a ~1.1B-parameter Chinese language model from random weights. The run consumed 1T tokens over roughly 30 days on eight H800s (or 60 days on eight A100s), and the project released every checkpoint, data pipeline script, and architectural tweak under open licenses. It is explicitly designed as a reproducible reference for individuals or small labs with anywhere from eight to a few dozen GPUs who want to pretrain rather than fine-tune.
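
To put those figures in perspective, the short calculation below derives the average throughput they imply. It is a back-of-the-envelope sketch only: the token count, GPU count, and durations come from the text above, while the 6 × parameters × tokens rule for training compute is a common approximation for dense transformers, assumed here for illustration and not taken from the project.

```python
# Rough throughput implied by the figures quoted in the text.
TOKENS = 1e12            # 1T training tokens (from the text)
PARAMS = 1.12e9          # ~1.12B parameters (from the text)
GPUS = 8                 # eight H800s or eight A100s (from the text)
SECONDS_PER_DAY = 86_400


def tokens_per_gpu_second(days: float) -> float:
    """Average tokens processed per GPU per second over a run of `days`."""
    return TOKENS / (days * SECONDS_PER_DAY * GPUS)


h800 = tokens_per_gpu_second(30)   # ~30 days on eight H800s
a100 = tokens_per_gpu_second(60)   # ~60 days on eight A100s

# Assumption: the widely used ~6 * N * D estimate of training FLOPs.
# It ignores architecture details, so treat it as an order of magnitude.
total_flops = 6 * PARAMS * TOKENS

print(f"H800: {h800:,.0f} tokens/s per GPU")
print(f"A100: {a100:,.0f} tokens/s per GPU")
print(f"Approx. training compute: {total_flops:.2e} FLOPs")
```

Under these assumptions the run averages roughly 48k tokens per second per H800 and about half that per A100, for a total on the order of 10^21 to 10^22 FLOPs.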

The interesting bit

The author named the project after a Chinese rock band’s ethos of “primitive steelmaking”: achieving quality under resource constraints. True to that spirit, the model uses a modified Qwen1.5 backbone with a softmax MoE FFN and double-layer SwiGLU to squeeze more training speed out of limited compute, and the entire journey, from data scraping to ICLR 2025 workshop acceptance, is chronicled in public blog posts.
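
For readers unfamiliar with the building blocks named above, here is a minimal pure-Python sketch of a standard SwiGLU feed-forward layer and a dense, softmax-weighted mixture of such layers. It is one plausible reading of "softmax MoE", not the project's actual implementation: the function names, toy sizes, and routing scheme are all assumptions made for illustration, and the "double-layer" variant is not reproduced here.

```python
import math
import random


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(w, x):
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Standard SwiGLU feed-forward: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def softmax_moe_ffn(x, w_router, experts):
    """Hypothetical dense mixture: every expert runs, outputs are
    averaged with softmax router weights (no top-k selection)."""
    weights = softmax(matvec(w_router, x))
    outputs = [swiglu_ffn(x, *expert) for expert in experts]
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(x))]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_hidden, n_experts = 8, 16, 4   # toy sizes, chosen arbitrarily

x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
experts = [
    (random_matrix(d_hidden, d_model, rng),   # gate projection
     random_matrix(d_hidden, d_model, rng),   # up projection
     random_matrix(d_model, d_hidden, rng))   # down projection
    for _ in range(n_experts)
]
w_router = random_matrix(n_experts, d_model, rng)

gate_weights = softmax(matvec(w_router, x))
y = softmax_moe_ffn(x, w_router, experts)
print(len(y), round(sum(gate_weights), 6))
```

The output vector keeps the model dimension, and the router weights sum to one, so the mixture is a convex combination of the expert outputs.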

Key highlights

Trained from scratch on 1T tokens with ~80% Chinese data, yielding a 1.12B-parameter base and chat-tuned variants.
Architecture tweaks: softmax MoE in the FFN and a double SwiGLU layer, aimed at faster training per parameter.
Scores 41.9 on CEVAL and 36.1 on CMMLU, outperforming some earlier, larger institutional models according to the project’s own comparisons.
Pretraining framework extends TinyLlama’s code with HuggingFace compatibility, resume-from-checkpoint data state, and mid-run dataset appending.
Full provenance: data sources include SkyPile, WanJuan, Chinese Wikipedia, Baidu Baike, Zhihu, BELLE, MOSS, and StarCoder, with processing pipelines published.
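
The resume-from-checkpoint data state and mid-run dataset appending mentioned above can be pictured with a small sketch. The class below is hypothetical (its name, methods, and in-memory shard layout are assumptions for illustration, not the project's or TinyLlama's code); it only shows the general idea of saving a stream position alongside a checkpoint and restoring it later.

```python
class ResumableStream:
    """Hypothetical data stream that remembers its position across restarts."""

    def __init__(self, shards):
        self.shards = list(shards)   # each shard is a list of samples
        self.shard_idx = 0           # which shard is being read
        self.offset = 0              # next sample index inside that shard

    def append_shard(self, shard):
        """Add data mid-run; positions already consumed are unaffected."""
        self.shards.append(shard)

    def state_dict(self):
        """Position to store next to the model checkpoint."""
        return {"shard_idx": self.shard_idx, "offset": self.offset}

    def load_state_dict(self, state):
        """Restore a position saved by state_dict()."""
        self.shard_idx = state["shard_idx"]
        self.offset = state["offset"]

    def __iter__(self):
        return self

    def __next__(self):
        while self.shard_idx < len(self.shards):
            shard = self.shards[self.shard_idx]
            if self.offset < len(shard):
                sample = shard[self.offset]
                self.offset += 1
                return sample
            # Current shard exhausted: move on to the next one.
            self.shard_idx += 1
            self.offset = 0
        raise StopIteration


# Simulate a run that stops after four samples...
stream = ResumableStream([["a", "b", "c"], ["d", "e"]])
first = [next(stream) for _ in range(4)]
saved = stream.state_dict()

# ...then resumes in a fresh process with an extra shard appended.
resumed = ResumableStream([["a", "b", "c"], ["d", "e"]])
resumed.load_state_dict(saved)
resumed.append_shard(["f"])
rest = list(resumed)
print(first, saved, rest)
```

Because the saved state is just a shard index and an offset, the resumed stream skips nothing and repeats nothing, and a shard appended later is simply read after the existing ones.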

Caveats

The tokenizer is borrowed wholesale from Qwen1.5-MoE-A2.7B-Chat rather than trained from scratch.
English evaluation was explicitly deprioritized; do not expect balanced bilingual performance.
Reproduction still requires a serious cluster (eight high-end GPUs and ~4 TB storage), so it remains out of reach for hobbyist single-GPU setups.

Verdict

Grab the blogs and code if you are a researcher or engineer planning a small-scale pretraining run and need a battle-tested playbook. Skip it if you are looking for a drop-in replacement for Qwen or MiniCPM, or if you only have one GPU and a weekend.

Frequently asked

What is zhanshijinwat/Steel-LLM? One developer spent eight months pretraining a 1B-parameter Chinese-centric LLM on 1T tokens to prove solo replication is possible with a modest GPU cluster.
Is Steel-LLM open source? Yes. zhanshijinwat/Steel-LLM is an open-source project tracked on heatdrop.
What language is Steel-LLM written in? zhanshijinwat/Steel-LLM is listed with Jupyter Notebook as its primary language.
How popular is Steel-LLM? zhanshijinwat/Steel-LLM has 810 stars on GitHub.
Where can I find Steel-LLM? zhanshijinwat/Steel-LLM is on GitHub at https://github.com/zhanshijinwat/Steel-LLM.