scikit-learn-contrib/lightning

Scikit-learn's speed-obsessed cousin for linear models

A focused library that drops faster solvers into familiar scikit-learn-shaped molds when your data is big but your model stays linear.

1.8k stars · Python · ML Frameworks
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

Lightning trains linear classifiers, regressors, and ranking models on large datasets. It speaks fluent scikit-learn (fit, predict, score) but swaps in specialized solvers like SDCA, SAGA, and SVRG, which are built to scale better than the stock options. Dense and sparse inputs are both handled natively.
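To make the "familiar mold, different solver" idea concrete, here is a minimal sketch of a SAGA-style solver wrapped in a fit/predict/score interface. It is a toy in plain NumPy, not Lightning's implementation: the class name `TinySAGAClassifier`, its parameters, the step-size rule, and the synthetic data are all assumptions made for illustration, and it only handles dense data with binary labels in {0, 1}.

```python
import numpy as np


class TinySAGAClassifier:
    """Hypothetical toy: L2-regularized logistic regression trained with SAGA."""

    def __init__(self, alpha=1e-3, max_iter=20, random_state=0):
        self.alpha = alpha              # L2 regularization strength (assumed name)
        self.max_iter = max_iter        # number of passes over the data
        self.random_state = random_state

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)  # labels expected in {0, 1}
        n, d = X.shape
        rng = np.random.default_rng(self.random_state)
        w = np.zeros(d)
        # SAGA memory: one stored loss derivative per sample.
        # At w = 0 the logistic derivative is sigmoid(0) - y = 0.5 - y.
        stored = 0.5 - y
        avg = X.T @ stored / n          # running average of stored gradients
        # Conservative step size from a bound on the smoothness constant.
        lipschitz = 0.25 * np.max(np.sum(X ** 2, axis=1)) + self.alpha
        step = 1.0 / (3.0 * lipschitz)
        for _ in range(self.max_iter):
            for i in rng.integers(0, n, size=n):
                z = np.clip(X[i] @ w, -30.0, 30.0)
                new = 1.0 / (1.0 + np.exp(-z)) - y[i]
                delta = new - stored[i]
                # Variance-reduced direction: fresh gradient minus the stored
                # one, plus the running average and the L2 term.
                w -= step * (delta * X[i] + avg + self.alpha * w)
                avg += delta * X[i] / n
                stored[i] = new
        self.coef_ = w
        return self

    def predict(self, X):
        return (np.asarray(X, dtype=float) @ self.coef_ > 0).astype(int)

    def score(self, X, y):
        return float(np.mean(self.predict(X) == np.asarray(y)))


# Synthetic, linearly separable data (made up for this sketch).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (X_demo @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) > 0).astype(int)
print(TinySAGAClassifier().fit(X_demo, y_demo).score(X_demo, y_demo))
```

The calling code never sees the solver: everything specific to SAGA lives inside `fit`, so swapping it for another optimizer leaves the caller untouched. That separation is what a scikit-learn-shaped library relies on.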

The interesting bit

The heavy lifting is Cython, not Python, which is the obvious-but-often-skipped step that actually makes “large-scale” claims credible. The group lasso multiclass example on News20 is a nice touch: it shows the library cares about structured sparsity, not just raw speed.
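For readers unfamiliar with what "structured sparsity" buys you, the core of a group lasso penalty can be sketched as block soft-thresholding. The snippet below is an illustration in plain NumPy, not code from Lightning: the function name `group_soft_threshold` and the weight layout (one row per feature, one column per class) are assumptions chosen for readability.

```python
import numpy as np


def group_soft_threshold(W, threshold):
    """Shrink each row of W toward zero by `threshold` in Euclidean norm.

    Rows whose norm is at most `threshold` become exactly zero, so a feature
    is dropped for every class at once. Layout (features x classes) is an
    assumption of this sketch.
    """
    W = np.asarray(W, dtype=float)
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Guard against division by zero for rows that are already all zeros.
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return W * scale


# Three features, two classes: one strong row, one weak row, one empty row.
W_demo = np.array([[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]])
W_shrunk = group_soft_threshold(W_demo, threshold=1.0)
kept = np.flatnonzero(np.linalg.norm(W_shrunk, axis=1) > 0)
print(W_shrunk)
print("features kept:", kept)
```

Plain L1 zeroes individual weights, whereas this operator zeroes whole rows. In a multiclass text problem that means a word is either used by the model or removed entirely, which is the kind of sparsity the News20 example points at.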

Key highlights

  • Solvers: primal/dual coordinate descent, SGD, AdaGrad, SAG, SAGA, SVRG, FISTA
  • Cython implementations for the expensive paths
  • Sparse and dense matrices without conversion drama
  • sklearn-contrib-lightning on pip/conda, precompiled binaries available
  • Python ≥ 3.7, depends on numpy, scipy, scikit-learn ≥ 0.19

Caveats

  • The README doesn’t mention GPU support; this is CPU-bound linear modeling
  • No benchmark numbers or speed comparisons against base scikit-learn — the “large-scale” claim is plausible but unquantified in the docs
  • Last Zenodo citation is 2016; check the commit history if you need to know how actively it is maintained

Verdict

Worth a look if you’re already in scikit-learn land and hitting walls with LogisticRegression or LinearSVC on big sparse data. Skip it if you need deep models, GPUs, or anything nonlinear — this is strictly linear methods, done fast.
