intel/BigDL

Intel's kitchen-sink deep-learning stack for Spark clusters

BigDL bundles half a dozen libraries that try to bridge single-node PyTorch/TensorFlow code with distributed Spark and Ray execution—plus a few side quests.

What it does

BigDL is a collection of Intel libraries that wrap TensorFlow, PyTorch, and Keras so they can run distributed across Spark, Flink, or Ray clusters. The main attraction is Orca, which scales single-node training scripts to YARN or Kubernetes with an init_orca_context() call and an Estimator wrapper. Nano auto-tunes inference and training on Intel CPUs/GPUs (IPEX, BF16, ONNX Runtime, OpenVINO) and spits out a latency-vs-accuracy comparison table. DLlib offers a Keras-style API inside Spark DataFrame/ML Pipeline programs for Scala holdouts. There are also specialized libraries for time series (Chronos), recommendations (Friesian), and confidential computing (PPML with SGX/TDX).
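
To make the Orca claim concrete, here is a rough sketch of what "an init_orca_context() call and an Estimator wrapper" looks like. The BigDL call names (init_orca_context, stop_orca_context, OrcaContext.get_spark_session, Estimator.from_keras, and fit with feature_cols/label_cols) follow the pattern in the project's README as best recalled. The helper orca_context_args, the toy model, the column names, and the resource sizes are invented for illustration. Arguments and defaults vary between releases, so treat this as a sketch and check the Orca docs before copying it.

```python
# Sketch only. BigDL and TensorFlow are imported lazily so that this file
# loads even on a machine where neither is installed.


def orca_context_args(on_cluster):
    """Keyword arguments for init_orca_context().

    Hypothetical helper: the resource sizes are illustrative guesses.
    """
    if on_cluster:
        return {"cluster_mode": "yarn-client", "cores": 4,
                "memory": "10g", "num_nodes": 2}
    return {"cluster_mode": "local", "cores": 4, "memory": "4g"}


def model_creator(config):
    """Build the Keras model on a worker. Toy two-input network (assumption)."""
    import tensorflow as tf

    user = tf.keras.layers.Input(shape=(1,))
    item = tf.keras.layers.Input(shape=(1,))
    merged = tf.keras.layers.concatenate([user, item])
    hidden = tf.keras.layers.Dense(16, activation="relu")(merged)
    output = tf.keras.layers.Dense(1, activation="sigmoid")(hidden)
    model = tf.keras.Model(inputs=[user, item], outputs=output)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model


def train(parquet_path, on_cluster=False):
    """Read a Spark DataFrame and fit the model through an Orca Estimator."""
    from bigdl.orca import OrcaContext, init_orca_context, stop_orca_context
    from bigdl.orca.learn.tf2.estimator import Estimator

    # The only line that knows whether we are on a laptop or on YARN.
    init_orca_context(**orca_context_args(on_cluster))
    try:
        spark = OrcaContext.get_spark_session()
        train_df = spark.read.parquet(parquet_path)
        est = Estimator.from_keras(model_creator=model_creator)
        # Column names are assumptions about the input data.
        est.fit(data=train_df,
                feature_cols=["user", "item"],
                label_cols=["label"],
                batch_size=64,
                epochs=2)
    finally:
        stop_orca_context()
```

Calling train() with a Parquet path would run locally; passing on_cluster=True swaps in the YARN settings while the training code stays the same, which is the whole pitch.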

The interesting bit

The RayOnSpark feature lets you mix Spark DataFrames and Ray Datasets in the same Python script, which is handy if your pipeline is stuck between two eras of big-data infrastructure. Nano’s InferenceOptimizer brute-forces its way through a dozen optimization methods and presents a leaderboard; the “optimization cost 60.8s” note in the README example is an unusually honest breadcrumb.
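
The try-everything-and-rank idea is easy to picture in miniature. The sketch below is not Nano's implementation and uses none of its API: it is a standard-library toy in which each "optimization method" is just a Python callable. Every name in it (leaderboard, pick_best, the candidates, the sample data) is hypothetical. It shows the general shape: run each candidate, record latency and accuracy, tolerate the ones that fail, and pick the fastest that is still accurate enough.

```python
import time


def accuracy(predict, samples):
    """Fraction of (input, label) pairs the candidate gets right."""
    correct = sum(1 for x, label in samples if predict(x) == label)
    return correct / len(samples)


def mean_latency_ms(predict, samples, repeats=20):
    """Average wall-clock time per prediction, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for x, _label in samples:
            predict(x)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(samples))


def leaderboard(candidates, samples):
    """Try every candidate; a failing one is recorded, not fatal."""
    rows = []
    for name, predict in candidates.items():
        try:
            rows.append({
                "method": name,
                "status": "ok",
                "latency_ms": mean_latency_ms(predict, samples),
                "accuracy": accuracy(predict, samples),
            })
        except Exception as err:
            rows.append({"method": name, "status": "failed: %s" % err,
                         "latency_ms": None, "accuracy": None})
    # Successful rows first, fastest at the top; failures sink to the bottom.
    rows.sort(key=lambda r: (r["status"] != "ok", r["latency_ms"] or 0.0))
    return rows


def pick_best(rows, min_accuracy):
    """Fastest successful candidate that meets the accuracy floor, or None."""
    for row in rows:
        if row["status"] == "ok" and row["accuracy"] >= min_accuracy:
            return row["method"]
    return None


def _unsupported(x):
    raise RuntimeError("backend not installed")


# Toy data and candidates, all invented for the example.
SAMPLES = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.52, 1)]
CANDIDATES = {
    "original": lambda x: int(x > 0.5),   # the reference model
    "coarse": lambda x: int(x > 0.7),     # stands in for a lossy optimization
    "unsupported": _unsupported,          # stands in for a missing backend
}

rows = leaderboard(CANDIDATES, SAMPLES)

if __name__ == "__main__":
    for row in rows:
        print(row)
    print("best:", pick_best(rows, min_accuracy=0.9))
```

The accuracy floor is the design choice that matters: without it, the fastest row wins even if it got there by being wrong.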

Key highlights

  • Orca: Distributed TF/PyTorch/OpenVINO on Spark or Ray clusters with minimal code changes.
  • Nano: Transparent Intel-specific acceleration; the README claims up to 10× speedup for TensorFlow and PyTorch programs.
  • DLlib: Deep learning inside Spark MLlib-style pipelines, with Scala and Python APIs.
  • PPML: Hardware-isolated execution via Intel SGX/TDX for regulated data.
  • Modular installs: pip install bigdl-chronos, bigdl-nano, etc.—you don’t need the whole monolith.

Caveats

  • The LLM library is deprecated; Intel has moved LLM work to the separate IPEX-LLM project, so don’t start new work there.
  • The README says “seamlessly” a lot, and the actual friction of mixing Spark, Ray, and framework-native code is left as an exercise for the reader.
  • Several links in the decision-tree diagram point to a BigDL-2.x repo, suggesting the documentation hasn’t fully caught up with the current release structure.

Verdict

Worth a look if you’re already committed to Spark or Ray clusters and want to avoid rewriting PyTorch/TensorFlow code for distributed execution. Skip it if you’re cloud-native on pure Kubernetes with GPUs, or if you need bleeding-edge LLM tooling: Intel has already waved you toward IPEX-LLM for that.
