Scikit-learn's speed-obsessed cousin for linear models
A focused library that drops faster solvers into familiar scikit-learn-shaped molds when your data is big but your model stays linear.

What it does
Lightning trains linear classifiers, regressors, and ranking models on large datasets. It speaks fluent scikit-learn (fit, predict, score) but swaps in specialized solvers like SDCA, SAGA, and SVRG that target large-scale problems. Dense and sparse data are both handled natively.
The interesting bit
The heavy lifting is Cython, not Python, which is the obvious-but-often-skipped step that actually makes “large-scale” claims credible. The group lasso multiclass example on News20 is a nice touch: it shows the library cares about structured sparsity, not just raw speed.
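For readers unfamiliar with structured sparsity, the sketch below shows the textbook block soft-thresholding operator associated with an l1/l2 (group lasso) penalty. It is not lightning's Cython code, and the function and parameter names are hypothetical. It only illustrates the effect: a feature's weights are shrunk together across all classes, so a weak feature is dropped for every class at once.

```python
import math

def group_soft_threshold(rows, threshold):
    """Block soft-thresholding: shrink each row by its Euclidean norm.

    rows: list of lists, one row per feature, one column per class.
    threshold: non-negative shrinkage amount (illustrative name).
    """
    result = []
    for row in rows:
        norm = math.sqrt(sum(w * w for w in row))
        if norm <= threshold:
            # The whole feature is removed for every class.
            result.append([0.0] * len(row))
        else:
            # The row keeps its direction but loses `threshold` in norm.
            scale = 1.0 - threshold / norm
            result.append([scale * w for w in row])
    return result

coef = [
    [3.0, 4.0, 0.0],    # strong feature, norm 5
    [0.1, -0.2, 0.1],   # weak feature, norm below the threshold
    [0.0, 0.0, 2.0],    # useful for one class only, norm 2
]
shrunk = group_soft_threshold(coef, threshold=1.0)
selected = [i for i, row in enumerate(shrunk) if any(w != 0.0 for w in row)]
print(selected)  # [0, 2]
```

A plain l1 penalty would zero individual entries instead, which can leave a feature active for some classes and inactive for others.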
Key highlights
- Solvers: primal/dual coordinate descent, SGD, AdaGrad, SAG, SAGA, SVRG, FISTA
- Cython implementations for the expensive paths
- Sparse and dense matrices without conversion drama
- sklearn-contrib-lightning on pip/conda, precompiled binaries available
- Python ≥ 3.7, depends on numpy, scipy, scikit-learn ≥ 0.19
Caveats
- The README doesn’t mention GPU support; this is CPU-only linear modeling
- No benchmark numbers or speed comparisons against base scikit-learn — the “large-scale” claim is plausible but unquantified in the docs
- The Zenodo citation dates from 2016; check the commit history if you need an actively maintained dependency
Verdict
Worth a look if you’re already in scikit-learn land and hitting walls with LogisticRegression or LinearSVC on big sparse data. Skip it if you need deep models, GPUs, or anything nonlinear — this is strictly linear methods, done fast.