Mocha.jl: a deep-learning framework that retired gracefully
An early Julia deep-learning framework, now deprecated, that helped prove the language could handle neural nets before modern autodiff existed.

What it does
Mocha.jl was a Caffe-inspired deep learning framework for Julia, offering modular layers, solvers, and multiple compute backends. It let you define convolutional nets from explicit layer objects (ConvolutionLayer, PoolingLayer, InnerProductLayer) and train them with SGD, saving snapshots to HDF5.
The interesting bit
The README opens with a deprecation notice, which is itself the story: Mocha launched in Julia’s pre-v1.0 era, before the language had native GPU kernels or general autodiff. The author acknowledges the codebase is now “excessively old and primitive” and points users to Flux.jl, Knet.jl, and wrappers like MXNet.jl. It’s a rare honest retirement rather than bit-rot.
Key highlights
- Three switchable backends: pure Julia (portable, JIT-compiled), a C++ native extension (~2–3× faster), and GPU via cuDNN/cuBLAS (a claimed 20–30× speedup over the CPU backends on large convolutional models).
- HDF5 for datasets and model snapshots, plus tools to import Caffe-trained weights.
- Extensive unit-test coverage across all backends, per the README.
- Training callbacks with quirky names: a “coffee lounge” and “coffee breaks” handle logging, snapshots, and validation.
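The "coffee break" idea from the last highlight is a periodic-callback pattern: each callback is registered with an interval and fires whenever the iteration count reaches a multiple of it. The sketch below illustrates that pattern in Python. It is not Mocha.jl's API; `Lounge`, `add_break`, and `check` are hypothetical names chosen for illustration, and the loss update is a stand-in for a real training step.

```python
# Hypothetical names throughout; this is not Mocha.jl's API.

class Lounge:
    """Holds callbacks ("breaks") and fires each one every n iterations."""

    def __init__(self):
        self.breaks = []  # list of (callback, every_n_iter) pairs

    def add_break(self, callback, every_n_iter):
        self.breaks.append((callback, every_n_iter))

    def check(self, iteration, state):
        # Run every callback whose interval divides the current iteration.
        for callback, every in self.breaks:
            if iteration % every == 0:
                callback(iteration, state)


log = []
lounge = Lounge()
lounge.add_break(lambda it, st: log.append(("summary", it, st["loss"])), every_n_iter=2)
lounge.add_break(lambda it, st: log.append(("snapshot", it)), every_n_iter=5)

state = {"loss": 1.0}
for it in range(1, 11):
    state["loss"] *= 0.9      # stand-in for one training step
    lounge.check(it, state)   # give each registered break a chance to fire
```

Keeping the schedule outside the training loop means logging, snapshotting, and validation can be added or removed without touching the solver itself.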
Caveats
- Deprecated since December 2018; last supported Julia version is 0.6, with only a community PR for v1.0 CPU support.
- No autodiff: gradients come from hand-written backward passes inside each layer type, so any custom layer needs its own forward and backward implementation.
- GPU backend relies on CUDA; no mention of AMD or Apple Silicon support.
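To make the no-autodiff caveat concrete, here is a minimal Python sketch of what a hand-written backward pass involves for a fully connected layer trained with plain SGD. It is a generic illustration and not Mocha.jl code; the class `InnerProduct`, its methods, and the helper `squared_error` are hypothetical names, and the squared-error loss is an assumption made to keep the example short.

```python
# Hypothetical illustration only: none of these names come from Mocha.jl.

class InnerProduct:
    """Fully connected layer y = W x + b with a hand-derived backward pass."""

    def __init__(self, weights, bias):
        self.W = [row[:] for row in weights]   # shape: out x in
        self.b = bias[:]                       # shape: out

    def forward(self, x):
        self.x = x[:]  # cache the input; backward needs it
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.W, self.b)]

    def backward(self, grad_out):
        # dL/dW[i][j] = grad_out[i] * x[j]
        self.dW = [[g * xj for xj in self.x] for g in grad_out]
        # dL/db[i] = grad_out[i]
        self.db = grad_out[:]
        # dL/dx[j] = sum_i W[i][j] * grad_out[i], passed to the previous layer
        return [sum(self.W[i][j] * grad_out[i] for i in range(len(grad_out)))
                for j in range(len(self.x))]

    def sgd_step(self, lr):
        # Plain SGD: move each parameter against its stored gradient.
        for i, row in enumerate(self.W):
            for j in range(len(row)):
                row[j] -= lr * self.dW[i][j]
            self.b[i] -= lr * self.db[i]


def squared_error(y, target):
    """Return the loss 0.5 * sum((y - t)^2) and its gradient with respect to y."""
    diff = [yi - ti for yi, ti in zip(y, target)]
    return 0.5 * sum(d * d for d in diff), diff


layer = InnerProduct([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.0])
x, target = [1.0, 2.0], [1.0, -1.0]
losses = []
for _ in range(50):
    loss, grad = squared_error(layer.forward(x), target)
    losses.append(loss)
    layer.backward(grad)
    layer.sgd_step(lr=0.1)
```

Every new layer type needs its derivative worked out and coded like this, which is the maintenance cost that autodiff-based frameworks remove.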
Verdict
Worth a look if you’re writing a history of Julia’s ML ecosystem or maintaining legacy Julia 0.6 code. Everyone else should follow the author’s own advice and use Flux.jl or Knet.jl instead.