yahoo/CaffeOnSpark

Yahoo's deep learning bridge to Hadoop is archived, not forgotten

A 2016 attempt to run Caffe on Spark clusters without building a separate GPU farm.

★ 1.3k stars · Jupyter Notebook · ML Frameworks

What it does

CaffeOnSpark wraps the Caffe deep learning framework into a Spark package, letting you train neural networks on Hadoop clusters using HDFS-stored data. It supports training, testing, and feature extraction across GPU and CPU servers, with a Scala API for Spark applications.

The interesting bit

The real architectural bet was direct server-to-server communication over Ethernet or InfiniBand: it aimed to dodge the “separate deep learning cluster” tax that most organizations faced. Yahoo ran this in production for image search and content classification on their private cloud.

Key highlights

Reuses existing Caffe LMDB datasets and prototxt configs with minor tweaks
Spark 1.x and 2.x support (default: Spark 2.0.0, Hadoop 2.7.1, Scala 2.11.7)
Incremental learning from prior models or snapshots
Deployable on AWS EC2 or private cloud
Per-device batch sizes in prototxt files
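
The per-device batch size point matters when scaling out: the number of samples that contribute to each training step grows with the number of devices. Here is a minimal Python sketch of that arithmetic, assuming synchronous data-parallel training in which every device processes one batch per step. The function name and the example cluster shape are illustrative and are not part of CaffeOnSpark.

```python
def effective_batch_size(per_device_batch: int, executors: int, devices_per_executor: int) -> int:
    """Samples consumed per synchronized training step across the cluster.

    Assumption: each device processes exactly one batch of
    `per_device_batch` samples per step (synchronous data parallelism).
    """
    if min(per_device_batch, executors, devices_per_executor) < 1:
        raise ValueError("all arguments must be positive integers")
    return per_device_batch * executors * devices_per_executor


# Hypothetical cluster: batch_size 64 in the prototxt, 4 executors, 2 GPUs each.
print(effective_batch_size(64, executors=4, devices_per_executor=2))  # 512
```

Under that assumption, a prototxt tuned on a single GPU may need its batch size or learning rate revisited before being moved to a multi-device cluster.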

Caveats

Archived and unsupported since 2016 — Yahoo explicitly notes they’re no longer maintaining it
Memory layers require "share_in_parallel: false" to avoid GPU sharing issues
Build versions are pinned in caffe-grid/pom.xml and likely stale
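
The share_in_parallel caveat is easy to miss when adapting an existing prototxt. Below is a rough Python sketch of a pre-flight check that counts memory data blocks lacking the setting. It assumes the flag lives inside a `memory_data_param { ... }` block with no nested braces. The helper name and the sample prototxt are made up for illustration and are not copied from the repository.

```python
import re

# Assumption: the flag is set inside a flat memory_data_param { ... } block.
MEMORY_PARAM = re.compile(r"memory_data_param\s*\{([^{}]*)\}")
SHARE_FALSE = re.compile(r"\bshare_in_parallel\s*:\s*false\b")


def memory_layers_missing_flag(prototxt: str) -> int:
    """Count memory_data_param blocks that do not set share_in_parallel: false."""
    blocks = MEMORY_PARAM.findall(prototxt)
    return sum(1 for body in blocks if not SHARE_FALSE.search(body))


# Illustrative fragment: the first layer sets the flag, the second does not.
SAMPLE = """
layer {
  name: "data"
  type: "MemoryData"
  memory_data_param {
    batch_size: 64
    share_in_parallel: false
  }
}
layer {
  name: "test_data"
  type: "MemoryData"
  memory_data_param {
    batch_size: 100
  }
}
"""

print(memory_layers_missing_flag(SAMPLE))  # 1
```

A regex is enough for a quick sanity check; a real validator would parse the prototxt with the protobuf text-format tooling instead.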

Verdict

Worth reading if you’re studying how big tech bridged pre-TensorFlow deep learning onto existing data infrastructure. Skip it if you need something that runs today: this is a fossil, not a foundation.

Frequently asked

What is yahoo/CaffeOnSpark? A 2016 attempt to run Caffe on Spark clusters without building a separate GPU farm.
Is CaffeOnSpark open source? Yes. yahoo/CaffeOnSpark is open source, released under the Apache-2.0 license.
What language is CaffeOnSpark written in? GitHub's language statistics list Jupyter Notebook as the primary language of yahoo/CaffeOnSpark; the package itself exposes a Scala API for Spark applications.
How popular is CaffeOnSpark? yahoo/CaffeOnSpark has 1.3k stars on GitHub.
Where can I find CaffeOnSpark? yahoo/CaffeOnSpark is on GitHub at https://github.com/yahoo/CaffeOnSpark.