Is BigDL open source?

Yes — intel/BigDL is open source, released under the Apache-2.0 license.

What language is BigDL written in?

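GitHub classifies most of intel/BigDL's repository as Jupyter Notebook (its many example notebooks); the libraries themselves are written mainly in Python and Scala.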

How popular is BigDL?

intel/BigDL has 2.7k stars on GitHub.

Where can I find BigDL?

intel/BigDL is on GitHub at https://github.com/intel/BigDL.


intel/BigDL

Intel's kitchen-sink deep-learning stack for Spark clusters

BigDL bundles half a dozen libraries that try to bridge single-node PyTorch/TensorFlow code with distributed Spark and Ray execution—plus a few side quests.

★2.7k stars · Jupyter Notebook · Tags: ML Frameworks; Inference · Serving; LLMOps · Eval




What it does

BigDL is a collection of Intel libraries that wrap TensorFlow, PyTorch, and Keras so they can run distributed across Spark, Flink, or Ray clusters. The main attraction is Orca, which scales single-node training scripts to YARN or Kubernetes with an init_orca_context() call and an Estimator wrapper. Nano auto-tunes inference and training on Intel CPUs/GPUs (IPEX, BF16, ONNX Runtime, OpenVINO) and spits out a latency-vs-accuracy comparison table. DLlib offers a Keras-style API inside Spark DataFrame/ML Pipeline programs for Scala holdouts. There are also specialized libraries for time series (Chronos), recommendations (Friesian), and confidential computing (PPML with SGX/TDX).
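
Orca's pitch is that the Estimator hides the distributed training loop. The sketch below is not Orca code and calls no BigDL API; it is a minimal, hypothetical illustration in plain Python (the names `shard`, `local_gradient`, and `fit` are made up) of the kind of synchronous data-parallel loop such a wrapper typically automates: split the data across workers, compute a gradient on each shard with the same weights, average the gradients, and apply one shared update. Threads stand in for cluster executors, and a one-feature linear regression stands in for a real model.

```python
# Hypothetical illustration of synchronous data-parallel SGD; not BigDL/Orca code.
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def shard(data: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Split the dataset round-robin into one shard per simulated worker."""
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w: float, b: float, part: List[Sample]) -> Tuple[float, float]:
    """Mean-squared-error gradient of the model y = w*x + b over one shard."""
    n = len(part)
    gw = sum(2 * (w * x + b - y) * x for x, y in part) / n
    gb = sum(2 * (w * x + b - y) for x, y in part) / n
    return gw, gb


def fit(data: List[Sample], num_workers: int = 4, epochs: int = 500,
        lr: float = 0.2) -> Tuple[float, float]:
    """Train w and b; each epoch is one synchronous gradient-averaging step."""
    shards = shard(data, num_workers)
    w, b = 0.0, 0.0
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for _ in range(epochs):
            # Every worker computes a gradient on its own shard using the same weights.
            futures = [pool.submit(local_gradient, w, b, part) for part in shards]
            grads = [f.result() for f in futures]
            # "All-reduce": average the per-shard gradients, then update once.
            gw = sum(g[0] for g in grads) / len(grads)
            gb = sum(g[1] for g in grads) / len(grads)
            w -= lr * gw
            b -= lr * gb
    return w, b


if __name__ == "__main__":
    data = [(i / 10, 3 * (i / 10) + 1) for i in range(20)]
    print(fit(data))                 # four simulated workers
    print(fit(data, num_workers=1))  # the same model trained on a single "node"
```

Because every shard has the same size here, the average of the shard gradients equals the full-batch gradient, so four workers and one worker converge to the same parameters (roughly w = 3, b = 1). With unequal shards you would weight each gradient by its shard size before averaging.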

The interesting bit

The RayOnSpark feature lets you mix Spark DataFrames and Ray Datasets in the same Python script, which is handy if your pipeline is stuck between two eras of big-data infrastructure. Nano’s InferenceOptimizer brute-forces through a dozen optimization methods and presents a leaderboard; the “optimization cost 60.8s” note in the README example is an unusually honest breadcrumb.
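
Nano's InferenceOptimizer does this on real models; the sketch below neither calls it nor mirrors its API. It is a hypothetical, plain-Python illustration of the brute-force-and-rank idea: run every candidate variant, time it, score its accuracy against the unoptimized baseline, flag variants that lose too much accuracy, and sort by latency. The names `Row`, `accuracy`, `latency_ms`, and `leaderboard`, the 1% accuracy budget, and the toy sine "model" with its Taylor-series and constant "variants" are all assumptions made for the example. The toy candidates only exercise the ranking logic; which one is fastest depends on the machine.

```python
# Hypothetical brute-force optimization leaderboard; not the BigDL Nano API.
import math
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Predictor = Callable[[float], float]
Samples = Sequence[Tuple[float, float]]  # (input, label) pairs


@dataclass
class Row:
    name: str
    latency_ms: float
    accuracy: float
    acceptable: bool  # True if the accuracy loss vs. the baseline is within budget


def accuracy(predict: Predictor, samples: Samples, tol: float = 0.05) -> float:
    """Fraction of samples whose prediction lands within `tol` of the label."""
    return sum(abs(predict(x) - y) <= tol for x, y in samples) / len(samples)


def latency_ms(predict: Predictor, samples: Samples, repeats: int = 5) -> float:
    """Median wall-clock time of one full pass over `samples`, in milliseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for x, _ in samples:
            predict(x)
        timings.append((time.perf_counter() - start) * 1000)
    return sorted(timings)[len(timings) // 2]


def leaderboard(candidates: Dict[str, Predictor], samples: Samples,
                baseline: str = "original", max_drop: float = 0.01) -> List[Row]:
    """Evaluate every candidate and return rows sorted by latency, fastest first."""
    base_acc = accuracy(candidates[baseline], samples)
    rows = []
    for name, predict in candidates.items():
        acc = accuracy(predict, samples)
        rows.append(Row(name, latency_ms(predict, samples), acc, base_acc - acc <= max_drop))
    return sorted(rows, key=lambda r: r.latency_ms)


if __name__ == "__main__":
    samples = [(i / 100, math.sin(i / 100)) for i in range(50)]
    candidates = {
        "original": math.sin,
        "taylor3": lambda x: x - x ** 3 / 6,  # approximates sin to ~2e-4 on [0, 0.5)
        "zero": lambda x: 0.0,                # constant output; cheap but inaccurate
    }
    rows = leaderboard(candidates, samples)
    for r in rows:
        print(f"{r.name:10s} {r.latency_ms:8.3f} ms  acc={r.accuracy:.2f}  ok={r.acceptable}")
    best = next(r for r in rows if r.acceptable)
    print("best acceptable:", best.name)
```

The `acceptable` flag acts as an accuracy budget: a variant that loses more than `max_drop` of the baseline's accuracy (here the constant `zero` predictor) still appears on the board but is never picked as the best option.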

Key highlights

Orca: Distributed TF/PyTorch/OpenVINO on Spark or Ray clusters with minimal code changes.
Nano: Transparent Intel-specific acceleration; claims up to 10× speedup on standard frameworks.
DLlib: Deep learning inside Spark MLlib-style pipelines, with Scala and Python APIs.
PPML: Hardware-isolated execution via Intel SGX/TDX for regulated data.
Modular installs: pip install bigdl-chronos, bigdl-nano, etc.—you don’t need the whole monolith.

Caveats

The LLM library is deprecated; Intel has moved LLM work to the separate IPEX-LLM project, so don’t start new work there.
The README’s “seamlessly” count is high, and the actual friction of mixing Spark, Ray, and framework-native code is left as an exercise to the reader.
Several links in the decision-tree diagram point to a BigDL-2.x repo, suggesting the documentation hasn’t fully caught up with the current release structure.

Verdict

Worth a look if you’re already committed to Spark or Ray clusters and want to avoid rewriting PyTorch/TensorFlow code for distributed execution. Skip it if you’re cloud-native on pure Kubernetes with GPUs, or if you need bleeding-edge LLM tooling: Intel has already waved you toward IPEX-LLM for that.
