mravanelli/SincNet

Neural networks that learn frequency, not filter coefficients

SincNet replaces the free-form filters in a CNN's first layer with tunable band-pass filters derived from sinc functions. This cuts the parameter count and keeps the filters interpretable.

1.2k stars · Python · Domain Apps
Velocity (7d): +0.4 ★/day · Trend: steady

What it does

SincNet is a CNN for raw audio. It replaces the first convolutional layer’s free-form filters with parametrized sinc functions, which act as learnable band-pass filters. Instead of learning every tap of a filter kernel, the network learns only each filter's low and high cutoff frequencies. The repo provides a full speaker-identification pipeline built on this idea, with a TIMIT example and training utilities.
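
The filter shape follows the classic windowed-sinc band-pass design: the difference of two ideal low-pass filters with cutoffs f1 < f2, tapered by a Hamming window. The sketch below builds one such kernel in plain Python so you can see that two numbers fully define it. The function names, the 16 kHz sample rate and the 251-tap length are assumptions for illustration. The repo itself does this in PyTorch, with the cutoffs stored as trainable tensors.

```python
import math

def sinc(x: float) -> float:
    """Unnormalized sinc, sin(x)/x, with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x

def sinc_bandpass(f1_hz: float, f2_hz: float, fs: int = 16000, length: int = 251) -> list:
    """Hypothetical helper: windowed band-pass kernel defined only by (f1, f2).

    g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n), times a Hamming window.
    """
    assert length % 2 == 1, "odd length keeps the kernel symmetric around n = 0"
    f1, f2 = f1_hz / fs, f2_hz / fs  # normalized cutoffs in cycles/sample
    half = length // 2
    taps = []
    for i, n in enumerate(range(-half, half + 1)):
        # Difference of two ideal low-pass responses = ideal band-pass response.
        ideal = 2 * f2 * sinc(2 * math.pi * f2 * n) - 2 * f1 * sinc(2 * math.pi * f1 * n)
        # Hamming window tapers the truncated kernel to reduce ripple.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        taps.append(ideal * window)
    return taps

def gain_at(taps: list, freq_hz: float, fs: int = 16000) -> float:
    """Magnitude of the kernel's frequency response at a single frequency."""
    w = 2 * math.pi * freq_hz / fs
    re = sum(t * math.cos(w * k) for k, t in enumerate(taps))
    im = sum(t * math.sin(w * k) for k, t in enumerate(taps))
    return math.hypot(re, im)

taps = sinc_bandpass(300, 1000)
print(round(gain_at(taps, 650), 3), round(gain_at(taps, 3000), 4))
```

Frequencies inside the 300 to 1000 Hz band pass with a gain close to 1. Frequencies well outside the band are strongly attenuated.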

The interesting bit

The insight is old-school signal processing dressed in deep-learning clothes. By constraining each filter to a sinc-based band-pass shape, you bake in the prior that audio analysis needs frequency-selective filters. The result is a compact, interpretable filter bank that the authors call “customized” to the task. It has fewer parameters, carries less risk of overfitting, and contains filters you can actually inspect.
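
Each filter is just a pair of frequencies, so inspecting the filter bank amounts to reading off two numbers per channel. The sketch below assumes a parametrization in the spirit of the repo's SincConv_fast. Unconstrained parameters are mapped to a valid band using absolute values plus a minimum low cutoff and a minimum bandwidth. The bank starts out with bands spaced evenly on the mel scale. The 50 Hz minimums, the 30 Hz starting frequency and all function names are assumptions for illustration, not the repo's exact code.

```python
import math

def hz_to_mel(hz: float) -> float:
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_init(n_filters, fs=16000, low_hz=30.0, min_low_hz=50.0, min_band_hz=50.0):
    """Hypothetical init: raw (low, band) parameters from mel-spaced band edges."""
    high_hz = fs / 2 - (min_low_hz + min_band_hz)
    lo_mel, hi_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    edges = [mel_to_hz(lo_mel + (hi_mel - lo_mel) * k / n_filters)
             for k in range(n_filters + 1)]
    low_raw = edges[:-1]
    band_raw = [b - a for a, b in zip(edges[:-1], edges[1:])]
    return low_raw, band_raw

def cutoffs(low_raw, band_raw, fs=16000, min_low_hz=50.0, min_band_hz=50.0):
    """Map raw parameters to readable (f1, f2) pairs in Hz.

    f2 is at least min_band_hz above f1 and is capped at the Nyquist rate fs/2.
    """
    pairs = []
    for lo, bw in zip(low_raw, band_raw):
        f1 = min_low_hz + abs(lo)
        f2 = min(f1 + min_band_hz + abs(bw), fs / 2)
        pairs.append((f1, f2))
    return pairs

low_raw, band_raw = mel_init(80)
pairs = cutoffs(low_raw, band_raw)
print(pairs[0], pairs[-1])  # lowest and highest learned band, in Hz
```

Mel spacing gives narrow bands at low frequencies and wide bands at high ones. After training, comparing these pairs with their initial values shows which frequency regions the network chose to emphasize.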

Key highlights

  • The first layer learns only 2 parameters per filter (the low and high cutoff frequencies). A standard CNN learns one weight per kernel tap, often hundreds per filter
  • Includes a complete TIMIT speaker-ID experiment with config-driven training
  • The SincConv_fast implementation is about 50% faster than the original
  • Also integrated into the broader SpeechBrain and PyTorch-Kaldi projects
  • A trained TIMIT model is available for download
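
The first highlight is easy to quantify. Take a first layer of 80 filters with 251 taps each. These numbers are chosen for illustration, and biases are ignored. A standard convolution learns one weight per tap, while the sinc layer learns two cutoffs per filter:

```python
def first_layer_params(n_filters: int, kernel_len: int) -> dict:
    """Count learnable first-layer weights (biases ignored) for both designs."""
    standard = n_filters * kernel_len  # one free weight per tap, per filter
    sinc = 2 * n_filters               # only (f1, f2) per filter
    return {"standard": standard, "sinc": sinc, "ratio": standard / sinc}

counts = first_layer_params(n_filters=80, kernel_len=251)
print(counts)  # {'standard': 20080, 'sinc': 160, 'ratio': 125.5}
```

The savings grow linearly with kernel length. Longer sinc kernels sharpen the band edges without adding any parameters.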

Caveats

  • The code is explicitly a “showcase”. The authors note that speed optimizations are missing and that I/O is not cluster-friendly unless the data is first copied locally
  • Training on a TITAN X took ~24 hours; after epoch 30, convergence slows and performance oscillates
  • Librispeech version used in the paper is “available upon request,” not bundled

Verdict

Worth studying if you care about interpretable inductive biases in audio networks or need a pedagogical example of hybrid DSP/deep learning. For production speaker recognition, the authors themselves point to SpeechBrain or PyTorch-Kaldi instead.
