shibing624/pytextclassifier

A text-classifier buffet: from logistic regression to BERT

One Python toolkit that wraps classical ML, deep learning, and transformers behind a uniform API so you can swap algorithms without rewriting plumbing.

523 stars · Python · ML Frameworks · Language Models
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

pytextclassifier is a Python toolkit that trains and runs text classifiers across a wide range of algorithms (logistic regression, random forest, XGBoost, SVM, TextCNN, TextRNN, FastText, and BERT variants) through a consistent interface. It handles binary, multi-class, multi-label, and hierarchical classification, plus K-means clustering, for both Chinese and English text.

The interesting bit

The value is in the boring part: the API stays the same whether you’re calling a sklearn logistic regression or a GPU-hungry BERT model. The README shows identical train(), predict(), and evaluate_model() patterns across all backends, which means you can benchmark a cheap baseline against a transformer without rewriting data pipelines.
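To make the "same API, different backend" point concrete, here is a minimal sketch of the pattern in plain Python. It does not use pytextclassifier itself: only the method names train(), predict(), and evaluate_model() come from the README as described above. The `TextClassifier` protocol, the two stand-in models, the `(label, text)` pair order, and the `benchmark` helper are all hypothetical, invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Protocol, Sequence, Tuple

# (label, text) pairs; this ordering is an assumption made for the sketch.
Example = Tuple[str, str]


class TextClassifier(Protocol):
    """Hypothetical shared interface; only the method names mirror the README."""

    def train(self, data: Sequence[Example]) -> None: ...
    def predict(self, texts: Sequence[str]) -> List[str]: ...
    def evaluate_model(self, data: Sequence[Example]) -> float: ...


class _AccuracyMixin:
    def evaluate_model(self, data: Sequence[Example]) -> float:
        # Accuracy: fraction of examples whose predicted label matches.
        preds = self.predict([text for _, text in data])
        correct = sum(p == label for p, (label, _) in zip(preds, data))
        return correct / len(data)


class MajorityBaseline(_AccuracyMixin):
    """Stand-in for a cheap baseline: always predicts the most common label."""

    def train(self, data: Sequence[Example]) -> None:
        self.label = Counter(label for label, _ in data).most_common(1)[0][0]

    def predict(self, texts: Sequence[str]) -> List[str]:
        return [self.label for _ in texts]


class WordVoteClassifier(_AccuracyMixin):
    """Stand-in for a heavier model: each word votes for labels it was seen with."""

    def train(self, data: Sequence[Example]) -> None:
        self.word_labels: Dict[str, Counter] = {}
        for label, text in data:
            for word in text.lower().split():
                self.word_labels.setdefault(word, Counter())[label] += 1
        self.fallback = Counter(label for label, _ in data).most_common(1)[0][0]

    def predict(self, texts: Sequence[str]) -> List[str]:
        out = []
        for text in texts:
            votes: Counter = Counter()
            for word in text.lower().split():
                votes.update(self.word_labels.get(word, {}))
            out.append(votes.most_common(1)[0][0] if votes else self.fallback)
        return out


def benchmark(models: Dict[str, TextClassifier],
              train_data: Sequence[Example],
              test_data: Sequence[Example]) -> Dict[str, float]:
    # The loop never inspects which model it holds; that is the whole point.
    scores = {}
    for name, model in models.items():
        model.train(train_data)
        scores[name] = model.evaluate_model(test_data)
    return scores


TRAIN = [
    ("sports", "the team won the match"),
    ("sports", "a late goal won the game"),
    ("sports", "the coach praised the team"),
    ("tech", "the new phone has a fast chip"),
    ("tech", "the laptop chip runs cool"),
]
TEST = [
    ("sports", "the team scored a goal"),
    ("tech", "a fast laptop chip"),
]

scores = benchmark(
    {"majority": MajorityBaseline(), "word_vote": WordVoteClassifier()},
    TRAIN,
    TEST,
)
print(scores)
```

Because `benchmark` depends only on the three shared methods, adding another backend means adding one entry to the dictionary; the data preparation and scoring code stay untouched.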

Key highlights

  • Broad algorithm coverage: 11 classifiers from classical ML to deep learning and transformers (BERT, ALBERT, RoBERTa, XLNet)
  • Unified interface: ClassicClassifier, FastTextClassifier, BertClassifier, etc. all expose the same core methods
  • Chinese-first but bilingual: examples and stopword handling for both Chinese and English corpora
  • Feature inspection: built-in eli5 integration to visualize feature weights for interpretable models
  • Lazy model loading: models load on demand rather than at import time
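The lazy-loading highlight can be illustrated with a small sketch. How pytextclassifier implements it is not stated here, so this is only one common way to defer an expensive load until first use; `LazyModel`, `expensive_loader`, and `load_count` are hypothetical names, and the "model" is a toy function rather than real weights.

```python
from typing import Callable, List, Optional


class LazyModel:
    """Hypothetical wrapper: the loader runs on first use, not at construction."""

    def __init__(self, loader: Callable[[], Callable[[str], str]]) -> None:
        self._loader = loader
        self._model: Optional[Callable[[str], str]] = None
        self.load_count = 0  # counts how many times the loader actually ran

    @property
    def model(self) -> Callable[[str], str]:
        if self._model is None:
            # First access: pay the loading cost once and cache the result.
            self._model = self._loader()
            self.load_count += 1
        return self._model

    def predict(self, texts: List[str]) -> List[str]:
        return [self.model(text) for text in texts]


def expensive_loader() -> Callable[[str], str]:
    # Stand-in for reading weights from disk; returns a toy labeling function.
    return lambda text: "long" if len(text.split()) > 3 else "short"


clf = LazyModel(expensive_loader)
loads_before = clf.load_count          # nothing has been loaded yet
labels = clf.predict(["hi there", "this is a longer sentence"])
clf.predict(["again"])                 # reuses the cached model
loads_after = clf.load_count
```

The practical benefit is that importing the package or constructing a classifier stays cheap; the cost is paid only by code paths that actually call the model.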

Caveats

  • The documentation gives few performance numbers or hardware requirements and says little about how each model scales; you’ll need to benchmark yourself
  • The deep-learning examples run on toy datasets and report perfect accuracy, so the README says little about real-world behavior

Verdict

Worth a look if you need to prototype text classifiers fast across multiple algorithm families, especially for Chinese text. Skip it if you want a single SOTA model with heavy optimization; this is a breadth-over-depth toolbox.
