marl/crepe

A neural network that listens for pitch, literally

CREPE runs a CNN directly on raw audio waveforms to estimate fundamental frequency, no spectrogram required.

1.4k stars Python Other AI

What it does

CREPE is a monophonic pitch tracker: feed it a WAV file, get back timestamps, predicted fundamental frequency in Hz, and a confidence score for whether any pitch is present at all. It ships as a command-line tool and a Python module with a pre-trained model ready to go.

The interesting bit

Instead of the usual spectral analysis, CREPE runs a deep convolutional network directly on the time-domain waveform. The authors claim it outperformed pYIN and SWIPE back in 2018. A neat post-paper tweak computes the final pitch as a weighted average taken only over the neighborhood around the peak activation (the argmax), which reportedly sharpens accuracy further.
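
The local weighted average can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code. The 360-bin activation size comes from the description on this page. The 20-cent bin spacing, the cents offset of bin 0, the 10 Hz reference, and the window of 4 bins on each side of the peak are assumptions chosen for the example. The names `local_average_cents` and `cents_to_hz` are hypothetical.

```python
import math

N_BINS = 360                       # size of one activation frame
CENTS_PER_BIN = 20.0               # assumed spacing between adjacent bins
CENTS_OFFSET = 1997.3794084376191  # assumed pitch of bin 0, in cents above REF_HZ
REF_HZ = 10.0                      # assumed reference frequency of the cents scale
HALF_WINDOW = 4                    # assumed number of bins kept on each side of the peak


def local_average_cents(activation):
    """Weighted average of bin pitches, restricted to the bins around the peak."""
    if len(activation) != N_BINS:
        raise ValueError(f"expected {N_BINS} bins, got {len(activation)}")
    # Index of the strongest bin (the argmax).
    peak = max(range(N_BINS), key=activation.__getitem__)
    start = max(0, peak - HALF_WINDOW)
    end = min(N_BINS, peak + HALF_WINDOW + 1)
    weights = activation[start:end]
    total = sum(weights)
    if total == 0:
        raise ValueError("activation is all zeros; no pitch estimate possible")
    cents = [CENTS_OFFSET + CENTS_PER_BIN * i for i in range(start, end)]
    return sum(w * c for w, c in zip(weights, cents)) / total


def cents_to_hz(cents):
    """Convert cents above REF_HZ to a frequency in Hz."""
    return REF_HZ * math.pow(2.0, cents / 1200.0)


# Synthetic frame: a symmetric bump centred on bin 180, plus a distant spurious
# bump at bin 300 that falls outside the window and is therefore ignored.
frame = [0.0] * N_BINS
frame[179], frame[180], frame[181] = 0.5, 1.0, 0.5
frame[300] = 0.4
estimate_hz = cents_to_hz(local_average_cents(frame))
```

Because the window is centred on the strongest bin, the stray activation at bin 300 has no effect on the estimate. An average over all 360 bins would be pulled toward it.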

Key highlights

  • Outputs CSV with 10 ms resolution by default; hop size is adjustable
  • Five model sizes from tiny to full for trading speed vs. accuracy
  • Optional Viterbi temporal smoothing
  • Can dump the full 360-bin activation matrix or a salience plot
  • Batch processing: point it at a folder of WAVs and walk away
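
One way to consume the CSV output is to keep only the frames whose confidence clears a threshold. The sketch below rests on several assumptions that are not taken from this page: the column names `time`, `frequency`, and `confidence`, the 0.5 threshold, and the helper name `voiced_frames`. The sample rows are made-up values at the default 10 ms spacing, not output from a real recording.

```python
import csv
import io


def voiced_frames(csv_text, min_confidence=0.5):
    """Return (time_s, frequency_hz) pairs for rows at or above min_confidence."""
    reader = csv.DictReader(io.StringIO(csv_text))
    frames = []
    for row in reader:
        if float(row["confidence"]) >= min_confidence:
            frames.append((float(row["time"]), float(row["frequency"])))
    return frames


# Made-up rows for illustration only.
sample = """time,frequency,confidence
0.00,220.0,0.91
0.01,221.5,0.88
0.02,310.2,0.12
0.03,222.1,0.79
"""

contour = voiced_frames(sample)
```

Here the third row is dropped because its confidence falls below the threshold. A sensible cutoff depends on the material, so treat 0.5 as a starting point rather than a recommendation.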

Caveats

  • WAV files only; anything else gets rejected at the door
  • Trained on 16 kHz vocal and instrumental data, so your mileage may vary on other sources
  • Keras with the TensorFlow backend is strongly recommended; the model was trained with TensorFlow 1.6.0 and Keras 2.1.5
  • GPU is “significantly faster” (the authors’ words, not a benchmark)
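
Given the WAV-only and 16 kHz caveats, a quick pre-flight check on input files can save a failed batch run. The sketch below uses only Python's standard `wave` module and is not part of CREPE. The helper name `inspect_wav` is hypothetical, and the stdlib reader only understands uncompressed PCM WAV files, so a valid WAV in another encoding may be rejected here too.

```python
import wave

MODEL_RATE_HZ = 16000  # sample rate of the training data, per the caveat above


def inspect_wav(path):
    """Return (sample_rate_hz, n_channels, matches_model_rate) for a PCM WAV file.

    Raises wave.Error (or EOFError for a truncated file) when the stdlib
    reader cannot parse the file as WAV.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return rate, channels, rate == MODEL_RATE_HZ
```

A file that reports a different rate is not necessarily unusable; the check only flags audio that differs from what the model saw during training.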

Verdict

Handy if you need pitch contours from monophonic audio and don’t want to hand-roll a pipeline. Skip it if you’re doing polyphonic transcription or need modern, actively maintained dependencies.
