Is dist-keras open source?

Yes — cerndb/dist-keras is open source, released under the GPL-3.0 license.

What language is dist-keras written in?

cerndb/dist-keras is primarily written in Python.

How popular is dist-keras?

cerndb/dist-keras has 622 stars on GitHub.

Where can I find dist-keras?

cerndb/dist-keras is on GitHub at https://github.com/cerndb/dist-keras.

cerndb/dist-keras

Keras on Spark: a physics lab's take on distributed training

CERN's dist-keras wraps Keras models in Apache Spark to run data-parallel deep learning across clusters, with a research-friendly focus on pluggable distributed optimizers.

★622 stars Python ML Frameworks Inference · Serving

What it does

dist-keras lets you train Keras models on Apache Spark clusters using data-parallel methods: multiple model replicas each train on a shard of the data and periodically synchronize their parameters. It bundles several distributed training strategies (DOWNPOUR, EASGD variants, model averaging, and ensemble training) and wraps each one in a Python trainer class that takes your Keras model and handles the Spark side of training.
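A minimal sketch of synchronous model averaging, the simplest of these schemes, makes the replica, shard and synchronize cycle concrete. It is not dist-keras code: a one-variable linear model stands in for a Keras model, a plain Python loop stands in for Spark workers, and the names `sgd_step` and `train_model_averaging` are hypothetical.

```python
import random


def sgd_step(w, b, x, y, lr):
    """One SGD step on squared error for the toy model y_hat = w * x + b."""
    err = (w * x + b) - y
    return w - lr * err * x, b - lr * err


def train_model_averaging(data, num_workers=4, rounds=50, lr=0.05):
    """Synchronous model averaging: local SGD on each shard, then average the replicas."""
    # Split the data round-robin into one shard per (simulated) worker
    shards = [data[i::num_workers] for i in range(num_workers)]
    w, b = 0.0, 0.0  # central model parameters
    for _ in range(rounds):
        replicas = []
        for shard in shards:
            lw, lb = w, b  # every replica starts from the current central model
            for x, y in shard:
                lw, lb = sgd_step(lw, lb, x, y, lr)
            replicas.append((lw, lb))
        # Synchronization point: the central model becomes the mean of the replicas
        w = sum(r[0] for r in replicas) / num_workers
        b = sum(r[1] for r in replicas) / num_workers
    return w, b


random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
data = [(x, 2.0 * x + 1.0) for x in xs]  # noiseless targets from y = 2x + 1
w, b = train_model_averaging(data)
print(f"w={w:.3f}, b={b:.3f}")
```

Synchronizing once per round rather than once per step is what keeps communication cheap on a cluster; the trade-off is that replicas drift apart between synchronizations.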

The interesting bit

The project treats distributed optimizers as swappable research primitives. The author implemented methods such as ADAG (a less hyperparameter-sensitive DOWNPOUR variant) and DynSGD (which adapts learning rates per worker based on parameter staleness), drawing directly on recent academic work. There is even a lightweight "Punchcard" job server that accepts remote cluster submissions over HTTP, which is handy when your development machine is not your compute cluster.
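The staleness idea behind DynSGD can be illustrated with a toy parameter server that damps updates computed from outdated parameters. This is a simplified sketch, not the project's implementation: the class `StalenessAwareServer` is hypothetical, and the scaling rule `1 / (staleness + 1)` is an assumption chosen for illustration.

```python
class StalenessAwareServer:
    """Hypothetical toy parameter server that damps stale updates (a DynSGD-style idea)."""

    def __init__(self, params):
        self.params = list(params)
        self.clock = 0  # number of updates applied so far

    def pull(self):
        # A worker gets a copy of the parameters and the clock value it read them at
        return list(self.params), self.clock

    def push(self, delta, read_clock):
        # Staleness = how many other updates landed since this worker pulled
        staleness = self.clock - read_clock
        scale = 1.0 / (staleness + 1)  # assumed damping rule, for illustration only
        self.params = [p + scale * d for p, d in zip(self.params, delta)]
        self.clock += 1
        return scale


server = StalenessAwareServer([0.0, 0.0])
_, clock_a = server.pull()  # worker A reads at clock 0
_, clock_b = server.pull()  # worker B also reads at clock 0
scale_a = server.push([1.0, -1.0], clock_a)  # fresh: applied at full strength
scale_b = server.push([1.0, -1.0], clock_b)  # one update stale: halved
print(scale_a, scale_b, server.params)  # 1.0 0.5 [1.5, -1.5]
```

Damping stale updates rather than discarding them keeps every worker's contribution while limiting the influence of the most outdated ones.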

Key highlights

Ships with 7+ distributed training strategies, from basic model averaging to asynchronous elastic averaging SGD
ADAG is flagged as “currently recommended” by the authors based on their own experiments
Includes remote job deployment through a secret-token-based Punchcard server
Ensemble training trains n full models in parallel, then averages predictions
CERN IT-DB origin; comes with a BibTeX citation block for academic use
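For the elastic averaging family, the core idea is that each worker is pulled toward a shared center variable by an elastic term, while the center moves toward the workers by the same amount. The sketch below uses the synchronous form of that update on toy objectives f_i(x) = (x - c_i)^2 / 2, one per worker, standing in for different data shards. The function name, targets and hyperparameters are hypothetical; asynchronous variants would instead apply the elastic exchange as each worker communicates rather than in lockstep.

```python
def easgd(targets, rounds=300, lr=0.1, alpha=0.1):
    """Synchronous elastic averaging on toy per-worker objectives f_i(x) = (x - c_i)^2 / 2."""
    workers = [0.0] * len(targets)  # local parameters x_i
    center = 0.0                    # shared center variable
    for _ in range(rounds):
        # Elastic terms are computed from the same (old) center for every worker
        diffs = [alpha * (x - center) for x in workers]
        # Worker: local gradient step on its own objective plus an elastic pull toward the center
        workers = [x - lr * (x - c) - d for x, c, d in zip(workers, targets, diffs)]
        # Center: moves toward the workers by the sum of the same elastic terms
        center += sum(diffs)
    return center, workers


center, workers = easgd([1.0, 2.0, 3.0, 6.0])
print(round(center, 4))  # 3.0
```

At convergence the center sits at the mean of the per-worker optima (3.0 here), while each worker settles partway between its own optimum and the center.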

Caveats

Python 3 compatibility is listed as a known issue
The README warns that adding more asynchronous workers can hurt statistical performance; it mentions the "implicit momentum" explanation but flags it as needing more research
Several TODOs remain open: HDFS model save/load, network compression, multi-parameter-server support
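One way to see why more asynchronous workers can hurt is to model staleness directly. Assuming workers commit in strict round-robin, each gradient is computed on parameters that are `num_workers - 1` updates old. The toy below (hypothetical `delayed_gd`, plain gradient descent on f(w) = w^2 / 2 with a fixed learning rate) is not a reproduction of the README's experiments, only an illustration of the mechanism.

```python
def delayed_gd(num_workers, steps=100, lr=0.5):
    """Gradient descent on f(w) = w^2 / 2 where each gradient is num_workers - 1 updates stale."""
    staleness = num_workers - 1
    history = [1.0] * (staleness + 1)  # w_{-staleness}, ..., w_0 all start at 1.0
    for _ in range(steps):
        stale_w = history[-1 - staleness]  # parameters the committing worker had read
        history.append(history[-1] - lr * stale_w)  # gradient of w^2 / 2 is w
    return history[staleness:]  # trajectory from w_0 onward


one = delayed_gd(num_workers=1)
eight = delayed_gd(num_workers=8)
print(abs(one[-1]), max(abs(w) for w in eight))
```

With one worker the iterate halves every step; with eight workers and the same learning rate, the stale gradients overshoot and the trajectory oscillates with growing amplitude, which is one reason asynchronous setups tend to need smaller learning rates or staleness-aware corrections.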

Verdict

Worth a look if you're already on Spark and want to experiment with distributed training algorithms without writing parameter servers from scratch. Skip it if you need Python 3, modern Keras/TensorFlow 2.x, or production-grade fault tolerance: the project appears research-oriented and somewhat dormant.

Frequently asked

What is cerndb/dist-keras?: CERN's dist-keras wraps Keras models in Apache Spark to run data-parallel deep learning across clusters, with a research-friendly focus on pluggable distributed optimizers.