Is PyTorchText open source?

Yes — chenyuntc/PyTorchText is open source, released under the MIT license.

What language is PyTorchText written in?

chenyuntc/PyTorchText is primarily written in Python.

How popular is PyTorchText?

chenyuntc/PyTorchText has 1.1k stars on GitHub.

Where can I find PyTorchText?

chenyuntc/PyTorchText is on GitHub at https://github.com/chenyuntc/PyTorchText.

chenyuntc/PyTorchText

How to win a Chinese NLP competition: throw every model at it

A 2017 competition-winning repo that ensembles CNNs, LSTMs, RCNNs, and even FastText to classify Zhihu questions.

★ 1.1k stars · Python · ML Frameworks · Language Models

What it does

This is the first-place solution from the 2017 Zhihu Machine Learning Challenge (963 teams). It classifies Chinese questions into topics using a battery of neural text models, then ensembles their predictions. The README is essentially a training manual: data preprocessing scripts, exact shell commands for each model variant, and a scoreboard showing what each architecture achieved.

The interesting bit

The winning insight isn't architectural novelty; it's systematic brute force. The authors trained separate word-level and character-level versions of every model, tried data augmentation for each, and ensembled the survivors to push the score from ~0.41 to 0.433. They even include a del/ directory of failed methods, which is more honest than most competition write-ups.
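As a rough illustration of the ensembling idea, the sketch below blends per-model topic scores with a weighted average and keeps the highest-scoring topics for each question. It is not the repository's code: the function names, the equal default weights, the top-k selection, the toy scores, and the `CNN_char` label are all assumptions made for this example (only `LSTM_word_aug` appears in the published score table).

```python
from typing import Dict, List, Optional


def ensemble_scores(
    model_scores: Dict[str, List[List[float]]],
    weights: Optional[Dict[str, float]] = None,
) -> List[List[float]]:
    """Weighted average of per-model score matrices (questions x topics)."""
    names = list(model_scores)
    if weights is None:
        # Assumption: equal weights unless the caller supplies others.
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n_rows = len(model_scores[names[0]])
    n_cols = len(model_scores[names[0]][0])
    blended = [[0.0] * n_cols for _ in range(n_rows)]
    for name in names:
        w = weights[name] / total  # normalise so the weights sum to 1
        for i, row in enumerate(model_scores[name]):
            for j, score in enumerate(row):
                blended[i][j] += w * score
    return blended


def top_k_topics(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring topics, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy scores for 2 questions x 4 topics (made-up numbers, hypothetical models).
scores = {
    "LSTM_word_aug": [[0.9, 0.1, 0.3, 0.2], [0.2, 0.6, 0.1, 0.7]],
    "CNN_char": [[0.7, 0.2, 0.5, 0.1], [0.1, 0.8, 0.2, 0.4]],
}
blended = ensemble_scores(scores)
predictions = [top_k_topics(row, k=2) for row in blended]
print(predictions)
```

Averaging scores rather than voting on labels lets a confident model outweigh an uncertain one, which is one common reason ensembles of this kind beat their best single member.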

Key highlights

Five model families: CNN, LSTM, RCNN, Inception-style CNN, and FastText
Both word and character embeddings, with augmentation toggles
Published score table: LSTM_word_aug hits 0.41368, ensemble reaches 0.433
Preprocessing requires >32GB RAM and uses tf.contrib.keras despite being a PyTorch project
Pretrained models hosted on Baidu Pan (password: tayb)
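To make the word-level versus character-level distinction concrete, here is a minimal sketch of encoding the same question both ways into fixed-length ID sequences that an embedding layer could consume. It assumes the question has already been segmented into words; the helper names, the `<pad>`/`<unk>` conventions, the sample sentence, and the sequence lengths are illustrative assumptions, not the repository's preprocessing.

```python
from typing import Dict, List

PAD, UNK = 0, 1  # reserved IDs for padding and unknown tokens


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign an integer ID to every token, in order of first appearance."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for seq in sequences:
        for token in seq:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to IDs, truncating or right-padding to max_len."""
    ids = [vocab.get(token, UNK) for token in tokens[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical pre-segmented question: "how to learn deep learning".
word_tokens = ["如何", "学习", "深度", "学习"]
# Character-level view of the same question: split every word into characters.
char_tokens = [ch for word in word_tokens for ch in word]

word_vocab = build_vocab([word_tokens])
char_vocab = build_vocab([char_tokens])

word_ids = encode(word_tokens, word_vocab, max_len=6)
char_ids = encode(char_tokens, char_vocab, max_len=10)
print(word_ids)
print(char_ids)
```

The character view yields longer sequences over a much smaller vocabulary, so the two variants of a model tend to make different mistakes, which is what makes them useful ensemble members.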

Caveats

Python 2 and PyTorch 0.x era; setup instructions mention CUDA without specifying a version
Data paths are hardcoded (“modify the data path in the related file”)
Pretrained weights live on Baidu Pan with no mirror; reproducibility depends on Chinese cloud storage
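One way to work around hardcoded data paths when adapting code like this is to resolve every file through a single helper that reads the data root from an environment variable. This is a sketch under stated assumptions, not something the repository provides: the `ZHIHU_DATA_DIR` variable, the `resolve_data_path` helper, and the `train.npz` filename are all hypothetical.

```python
import os
from pathlib import Path


def resolve_data_path(
    filename: str,
    env_var: str = "ZHIHU_DATA_DIR",  # hypothetical variable name
    default_dir: str = "data",        # fallback when the variable is unset
) -> Path:
    """Join a filename onto a data root taken from the environment."""
    root = Path(os.environ.get(env_var, default_dir))
    return root / filename


# Hypothetical filename, used only to show the call.
train_path = resolve_data_path("train.npz")
print(train_path)
```

Setting the variable once per machine then replaces editing paths in each script by hand.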

Verdict

Worth studying if you're building an ensemble pipeline or working on legacy Chinese NLP benchmarks. Skip it if you need modern PyTorch, clean abstractions, or a library you can pip install. This is glue code that happened to win.
