combust/mleap

Ditch the Spark cluster for model serving

MLeap serializes ML pipelines from Spark and Scikit-learn into a lightweight JVM runtime that runs without their heavy dependencies.

Velocity (7 days): +0.4 ★/day · Trend: steady

What it does

MLeap is a Scala-based execution engine and serialization format for machine learning pipelines. You train models in Spark, PySpark, or Scikit-learn, export them to the portable Bundle.ML format, then run inference with the MLeap runtime: no SparkContext, no numpy, no pandas required.
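To make the "export once, run without the training framework" idea concrete, here is a toy sketch in plain Python. It is not MLeap's Bundle.ML format or API: the JSON layout, the op names, and the functions `export_pipeline` and `run_pipeline` are hypothetical, chosen only to show a fitted pipeline reduced to plain data and then executed by a small interpreter that never imports the library that trained it.

```python
import json

# Hypothetical "training side": the fitted parameters of a two-stage pipeline
# (standard scaling followed by a linear model) captured as plain data.
# This is NOT the Bundle.ML schema; it only mimics a portable description.
def export_pipeline(mean, std, weights, bias):
    return json.dumps({
        "stages": [
            {"op": "standard_scaler", "mean": mean, "std": std},
            {"op": "linear_regression", "weights": weights, "bias": bias},
        ]
    })

# Hypothetical "serving side": a tiny interpreter that only needs the JSON,
# not the framework that produced it.
def run_pipeline(bundle_json, features):
    bundle = json.loads(bundle_json)
    x = list(features)
    for stage in bundle["stages"]:
        if stage["op"] == "standard_scaler":
            x = [(v - m) / s for v, m, s in zip(x, stage["mean"], stage["std"])]
        elif stage["op"] == "linear_regression":
            x = [sum(w * v for w, v in zip(stage["weights"], x)) + stage["bias"]]
        else:
            raise ValueError(f"unknown op: {stage['op']}")
    return x

bundle = export_pipeline(mean=[1.0, 2.0], std=[2.0, 4.0], weights=[0.5, -1.0], bias=3.0)
print(run_pipeline(bundle, [3.0, 6.0]))  # [2.5]
```

MLeap does this on the JVM with its own, much richer bundle schema, but the design choice is the same: once the fitted parameters are plain data, the serving process only needs an implementation of each op, not the training stack.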

The interesting bit

The project treats "training environment ≠ serving environment" as a first-class concern. It ships parity tests to ensure Spark and MLeap transformers produce identical outputs, and supports both JSON and Protobuf serialization. The Scikit-learn integration is more glue-like: it wraps sklearn components in MLeap-specific pipeline classes so they can be serialized.
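A parity test boils down to running the training-side and serving-side implementations of the same transformer on identical inputs and checking that their outputs match. The sketch below assumes a hypothetical min-max scaler implemented twice (a batch version, and a row-at-a-time version rebuilt from serialized JSON); the class names, `parity_check`, and the tolerances are illustrative and not taken from MLeap's test suite.

```python
import json
import math
import random

# Hypothetical training-side transformer: fits min/max on a whole column
# and transforms in batch, like a DataFrame-oriented library would.
class BatchMinMaxScaler:
    def fit(self, column):
        self.lo, self.hi = min(column), max(column)
        return self

    def transform(self, column):
        span = self.hi - self.lo
        return [(v - self.lo) / span for v in column]

    def to_json(self):
        return json.dumps({"op": "min_max_scaler", "min": self.lo, "max": self.hi})

# Hypothetical serving-side transformer: rebuilt from the serialized
# parameters and applied one row at a time, as a low-latency runtime would.
class RowMinMaxScaler:
    def __init__(self, params_json):
        params = json.loads(params_json)
        self.lo, self.hi = params["min"], params["max"]

    def transform_row(self, value):
        return (value - self.lo) / (self.hi - self.lo)

def parity_check(train_data, test_data, rel_tol=1e-9):
    """Return True if both implementations agree on every test value."""
    batch = BatchMinMaxScaler().fit(train_data)
    row = RowMinMaxScaler(batch.to_json())
    expected = batch.transform(test_data)
    actual = [row.transform_row(v) for v in test_data]
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=1e-12)
        for e, a in zip(expected, actual)
    )

rng = random.Random(42)
train = [rng.uniform(-10, 10) for _ in range(100)]
test = [rng.uniform(-10, 10) for _ in range(20)]
print(parity_check(train, test))  # True
```

Comparing with a tolerance rather than exact equality is a deliberate choice: two independent implementations may order floating-point operations differently, so a tolerance keeps the test about behavior rather than bit patterns.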

Key highlights

  • Core runtime is JVM/Scala; supports Spark 4.0.1 down to 2.4.5, Python 3.9–3.13
  • Serializes to JSON or Protobuf; executes without Spark or sklearn dependencies
  • Custom transformers and data types can be implemented for use across all supported frameworks
  • Optional Spark transformer extensions beyond the default MLlib offerings
  • Extensive test coverage with full parity tests between Spark and MLeap pipelines
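Supporting custom transformers across frameworks generally requires both sides to agree on a name for each op and on a way to rebuild it from serialized parameters. The sketch below shows that idea with a hypothetical name-to-class registry; `register_op`, `save`, `load`, `run`, and the `Clip`/`Affine` ops are all illustrative, and MLeap's actual extension mechanism is different and not reproduced here.

```python
import json

# Hypothetical registry: op name in the serialized file -> class that can
# rebuild and execute it. Not MLeap's API.
OP_REGISTRY = {}

def register_op(name):
    """Class decorator: make a transformer discoverable by its op name."""
    def decorator(cls):
        cls.op_name = name
        OP_REGISTRY[name] = cls
        return cls
    return decorator

@register_op("clip")
class Clip:
    def __init__(self, low, high):
        self.low, self.high = low, high

    def params(self):
        return {"low": self.low, "high": self.high}

    def apply(self, x):
        return min(max(x, self.low), self.high)

@register_op("affine")
class Affine:
    def __init__(self, scale, offset):
        self.scale, self.offset = scale, offset

    def params(self):
        return {"scale": self.scale, "offset": self.offset}

    def apply(self, x):
        return self.scale * x + self.offset

def save(stages):
    # Each stage is stored as its registered name plus its parameters.
    return json.dumps([{"op": s.op_name, "params": s.params()} for s in stages])

def load(text):
    # Look up each op by name; unknown ops fail loudly instead of silently.
    stages = []
    for entry in json.loads(text):
        cls = OP_REGISTRY.get(entry["op"])
        if cls is None:
            raise KeyError(f"no transformer registered for op {entry['op']!r}")
        stages.append(cls(**entry["params"]))
    return stages

def run(stages, x):
    for stage in stages:
        x = stage.apply(x)
    return x

pipeline = load(save([Affine(scale=2.0, offset=1.0), Clip(low=0.0, high=10.0)]))
print(run(pipeline, 3.0))  # 7.0
print(run(pipeline, 8.0))  # 10.0
```

The registry keeps the serialized file free of code: it only names ops, so any runtime that registers the same names can execute the pipeline.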

Caveats

  • The README claims "blazing fast speeds" but offers no benchmarks to back it up
  • Scikit-learn integration requires using MLeap’s wrapper classes (mleap.sklearn.pipeline.Pipeline, mlinit()) rather than native sklearn APIs
  • The version compatibility matrix is extensive but complex: Java 8 through 17, multiple Scala versions, and tight coupling to specific Spark versions

Verdict

Worth a look if you're serving Spark ML models and tired of dragging a cluster into production. Less compelling if you're already happy with ONNX, TensorFlow Serving, or a pure-Python stack.
