A text-classifier buffet: from logistic regression to BERT
One Python toolkit that wraps classical ML, deep learning, and transformers behind a uniform API so you can swap algorithms without rewriting plumbing.

What it does
pytextclassifier is a Python toolkit that trains and runs text classifiers across a wide range of algorithms—logistic regression, random forest, XGBoost, SVM, TextCNN, TextRNN, FastText, and BERT variants—through a consistent interface. It handles binary, multi-class, multi-label, and hierarchical classification, plus K-means clustering, for both Chinese and English text.
The interesting bit
The value is in the boring part: the API stays the same whether you’re calling a sklearn logistic regression or a GPU-hungry BERT model. The README shows identical train(), predict(), and evaluate_model() patterns across all backends, which means you can benchmark a cheap baseline against a transformer without rewriting data pipelines.
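To make the "same calls, different backend" idea concrete, here is a minimal sketch of the pattern. It does not use pytextclassifier itself: `MajorityBaseline`, `KeywordClassifier`, `benchmark`, the `(label, text)` sample layout, and the toy data are all hypothetical stand-ins. Only the method names train(), predict(), and evaluate_model() come from the README as described above.

```python
from collections import Counter
from typing import Dict, List, Protocol, Tuple

Sample = Tuple[str, str]  # (label, text); this ordering is an assumption


class TextClassifier(Protocol):
    """The shared surface every backend is expected to expose."""

    def train(self, data: List[Sample]) -> None: ...
    def predict(self, texts: List[str]) -> List[str]: ...
    def evaluate_model(self, data: List[Sample]) -> float: ...


def accuracy(model: TextClassifier, data: List[Sample]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    predictions = model.predict([text for _, text in data])
    return sum(p == label for p, (label, _) in zip(predictions, data)) / len(data)


class MajorityBaseline:
    """Hypothetical cheap backend: always predicts the most common label."""

    def __init__(self) -> None:
        self.label = ""

    def train(self, data: List[Sample]) -> None:
        self.label = Counter(label for label, _ in data).most_common(1)[0][0]

    def predict(self, texts: List[str]) -> List[str]:
        return [self.label for _ in texts]

    def evaluate_model(self, data: List[Sample]) -> float:
        return accuracy(self, data)


class KeywordClassifier:
    """Hypothetical 'heavier' backend: scores labels by training word counts."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = {}
        self.fallback = ""

    def train(self, data: List[Sample]) -> None:
        self.fallback = Counter(label for label, _ in data).most_common(1)[0][0]
        self.word_counts = {}
        for label, text in data:
            self.word_counts.setdefault(label, Counter()).update(text.lower().split())

    def predict(self, texts: List[str]) -> List[str]:
        results = []
        for text in texts:
            words = text.lower().split()
            scores = {
                label: sum(counts[w] for w in words)
                for label, counts in self.word_counts.items()
            }
            best = max(scores, key=scores.get)
            # Fall back to the majority label when no known word appears.
            results.append(best if scores[best] > 0 else self.fallback)
        return results

    def evaluate_model(self, data: List[Sample]) -> float:
        return accuracy(self, data)


def benchmark(
    models: Dict[str, TextClassifier], train: List[Sample], test: List[Sample]
) -> Dict[str, float]:
    """Train and score every model with the same calls and the same data."""
    scores = {}
    for name, model in models.items():
        model.train(train)
        scores[name] = model.evaluate_model(test)
    return scores


# Toy data, invented purely for illustration.
TRAIN = [
    ("sports", "great football match"),
    ("sports", "football team wins"),
    ("tech", "new laptop released"),
]
TEST = [("sports", "football tonight"), ("tech", "laptop review")]
```

Calling `benchmark({"majority": MajorityBaseline(), "keyword": KeywordClassifier()}, TRAIN, TEST)` returns one accuracy per name. Because the loop only touches the shared methods, adding another backend means adding one dictionary entry rather than a new data pipeline.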
Key highlights
- Broad algorithm coverage: 11 classifiers from classical ML to deep learning and transformers (BERT, ALBERT, RoBERTa, XLNet)
- Unified interface: ClassicClassifier, FastTextClassifier, BertClassifier, etc. all expose the same core methods
- Chinese-first but bilingual: examples and stopword handling for both Chinese and English corpora
- Feature inspection: built-in eli5 integration to visualize feature weights for interpretable models
- Lazy model loading: models load on demand rather than at import time
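The lazy-loading highlight refers to a general pattern that can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's actual implementation: `LazyModel`, `load_toy_model`, and the `load_count` counter are hypothetical names introduced here.

```python
from typing import Callable, List, Optional

Predictor = Callable[[str], str]


class LazyModel:
    """Hypothetical wrapper that defers an expensive load until first use."""

    def __init__(self, loader: Callable[[], Predictor]) -> None:
        self._loader = loader
        self._model: Optional[Predictor] = None
        self.load_count = 0  # exposed only so the behavior is easy to observe

    def _ensure_loaded(self) -> Predictor:
        # Load once, on the first call that actually needs the model.
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model

    def predict(self, texts: List[str]) -> List[str]:
        model = self._ensure_loaded()
        return [model(text) for text in texts]


def load_toy_model() -> Predictor:
    """Stand-in for reading weights from disk; returns a trivial rule."""
    return lambda text: "question" if text.strip().endswith("?") else "statement"


# Constructing the wrapper is cheap; nothing has been loaded yet.
lazy = LazyModel(load_toy_model)
```

The practical effect is that importing a package or constructing a classifier stays fast, and the loading cost is paid only by code paths that actually predict.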
Caveats
- Documentation is sparse on performance numbers, hardware requirements, or how well each model scales; you’ll need to benchmark yourself
- The deep-learning examples show toy datasets with perfect accuracy—real-world behavior is unclear from the README
Verdict
Worth a look if you need to prototype text classifiers fast across multiple algorithm families, especially for Chinese text. Skip it if you want a single SOTA model with heavy optimization; this is a breadth-over-depth toolbox.