Teaching neural networks to care about angles, not just distances
A 2017 CVPR paper that made face embeddings live on a hypersphere instead of spread across Euclidean space.

What it does
SphereFace is a complete face recognition pipeline (detection, alignment, and recognition) built around a simple geometric insight. Instead of letting a CNN spread embeddings anywhere in Euclidean space, it trains them to be discriminative on the surface of a hypersphere, where identities are compared by angle and separated by an angular margin. The repo provides a Caffe-based implementation with a 20-layer residual architecture, training scripts for CASIA-WebFace, and evaluation on LFW.
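To make the geometry concrete, here is a minimal NumPy sketch of comparing embeddings by angle. The helper names `to_hypersphere` and `angle_between` are illustrative assumptions, not functions from the SphereFace repo.

```python
import numpy as np

def to_hypersphere(x):
    """Project vectors onto the unit hypersphere (L2 normalization)."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def angle_between(a, b):
    """Angle in radians between embeddings; independent of their lengths."""
    cos = np.sum(to_hypersphere(a) * to_hypersphere(b), axis=-1)
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cos, -1.0, 1.0))

a = np.array([1.0, 0.0])
b = np.array([0.0, 3.0])
theta = angle_between(a, b)        # a right angle
theta_scaled = angle_between(10 * a, b)  # same angle after rescaling a
```

Once vectors are normalized, squared Euclidean distance and angle carry the same information, since ||a - b||^2 = 2 - 2 cos(theta) for unit vectors. That is why a purely angular comparison is enough at test time.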
The interesting bit
The trick is A-Softmax loss: it normalizes the classifier weights, drops the biases, and multiplies the angle to the target class by an integer margin m, which forces the network to separate faces by angle rather than raw distance. The authors also stripped out BatchNorm and replaced ReLU with PReLU, unusual choices that they found worked better for this particular geometric constraint. This setup once topped the MegaFace small-training-set leaderboard.
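A minimal NumPy sketch of the A-Softmax forward pass, following the formulation in the paper: the target-class logit ||x|| cos(theta) is replaced by ||x|| psi(theta), where psi(theta) = (-1)^k cos(m theta) - 2k on the interval [k pi/m, (k+1) pi/m]. The function names, array shapes, and default m=4 are assumptions for illustration. This is not the repo's Caffe layer and it leaves out the training details of that implementation.

```python
import numpy as np

def psi(theta, m=4):
    """Monotonically decreasing extension of cos(m*theta) over [0, pi]."""
    k = np.clip(np.floor(theta * m / np.pi), 0, m - 1)  # interval index
    sign = 1.0 - 2.0 * (k % 2)                          # (-1)**k
    return sign * np.cos(m * theta) - 2.0 * k

def a_softmax_loss(features, weights, labels, m=4):
    """features: (N, D), weights: (D, C), labels: (N,) integer class ids."""
    # Normalize each class weight vector; no bias term is used.
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    x_norm = np.linalg.norm(features, axis=1, keepdims=True)    # (N, 1)
    cos_theta = np.clip(features @ w / x_norm, -1.0, 1.0)       # (N, C)
    logits = x_norm * cos_theta
    rows = np.arange(len(labels))
    # Apply the angular margin to the target class only.
    theta_y = np.arccos(cos_theta[rows, labels])
    logits[rows, labels] = x_norm[:, 0] * psi(theta_y, m)
    # Numerically stable cross-entropy.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()
```

With m=1 the function reduces to a softmax loss with normalized weights. A larger m lowers the target logit for the same angle, so the network has to shrink that angle to reach the same loss.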
Key highlights
- Full preprocessing pipeline included: MTCNN detection, similarity-transform alignment, feature extraction
- Achieves ~99.3% mean accuracy on LFW (10-fold cross-validation) in the authors’ five runs
- Provides the 20-layer architecture; the 4-, 36-, and 64-layer variants are described in the paper and available as prototxt definitions
- Pre-trained model and LFW features available via Google Drive/Baidu
- Video demo shows open-set recognition on Friends characters frame-by-frame
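The 10-fold LFW figure in the list above comes from a verification protocol that can be sketched roughly as follows: score each face pair (for example by cosine similarity of the two embeddings), pick the best threshold on nine folds, and measure accuracy on the held-out fold. The function names and the contiguous fold split are assumptions for illustration. The repo's own evaluation is a MATLAB script and may differ in detail.

```python
import numpy as np

def best_threshold(scores, same):
    """Return the candidate threshold with the highest training accuracy."""
    candidates = np.unique(scores)
    accs = [np.mean((scores >= t) == same) for t in candidates]
    return candidates[int(np.argmax(accs))]

def ten_fold_accuracy(scores, same, n_folds=10):
    """scores: (P,) pair similarities; same: (P,) bool, True if same identity."""
    folds = np.array_split(np.arange(len(scores)), n_folds)
    accs = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # The threshold is chosen without looking at the held-out fold.
        t = best_threshold(scores[train_idx], same[train_idx])
        accs.append(np.mean((scores[test_idx] >= t) == same[test_idx]))
    return float(np.mean(accs))

# Toy, perfectly separable pairs (synthetic values, not LFW data).
same = np.tile([True, False], 50)
scores = np.where(same, 0.8, 0.1)
mean_acc = ten_fold_accuracy(scores, same)
```

Choosing the threshold on the training folds only is what keeps the reported mean accuracy from being tuned on the test pairs.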
Caveats
- Built for Caffe and MATLAB; the authors now point to a PyTorch reimplementation at opensphere.world for easier use
- Gradient computation deviates from the paper: the authors normalize the gradient scale for more stable convergence, which they describe as a practical hack rather than a theoretical match
Verdict
Worth studying if you’re implementing angular-margin losses or need a reference for geometric embedding constraints. Skip if you want a modern, plug-and-play face recognizer—this is a research artifact from the Caffe era, not a product.