A grad student, 1% of a million songs, and a CNN walk into a bar
A 2016 Tel Aviv University project that swaps Tao Feng's RBM for a TensorFlow CNN and scrapes 30-second previews to classify ten music genres.

What it does
Trains a convolutional neural network to classify 10-second audio clips into one of ten genres (blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock). The pipeline downloads 30-second song previews via the 7Digital API, converts them to mel-frequency spectrograms using librosa, and feeds them into a CNN with three hidden layers, max pooling, and a softmax output.
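The preprocessing half of that pipeline can be sketched in plain numpy: cut a preview into 10-second clips, then turn each clip into a log-power mel spectrogram. The project does this with librosa. This version re-implements the step so it runs without audio dependencies. The sample rate, FFT size, hop length, and 128 mel bands mirror librosa's defaults but are assumptions, not values taken from the repo. The HTK mel formula used here also differs slightly from librosa's default Slaney scale. `split_clips`, `mel_filterbank`, and `log_mel_spectrogram` are hypothetical helper names, and a random signal stands in for a decoded preview.

```python
import numpy as np

# Assumed parameters (librosa's defaults), not values confirmed by the project.
SR = 22050      # sample rate in Hz
N_FFT = 2048    # FFT window size in samples
HOP = 512       # hop between successive frames in samples
N_MELS = 128    # number of mel bands

def hz_to_mel(f):
    """HTK-style mel scale (librosa defaults to the slightly different Slaney scale)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(sr=SR, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular filters mapping the n_fft // 2 + 1 FFT bins onto n_mels mel bands."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(y, sr=SR, n_fft=N_FFT, hop=HOP, n_mels=N_MELS):
    """(n_mels, n_frames) log-power mel spectrogram; no centre padding, unlike librosa's default."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T       # (n_mels, n_frames)
    return 10.0 * np.log10(mel + 1e-10)                     # small floor avoids log(0)

def split_clips(y, sr=SR, clip_seconds=10):
    """Cut a signal into non-overlapping clips, dropping any short remainder."""
    clip_len = sr * clip_seconds
    return [y[i : i + clip_len] for i in range(0, len(y) - clip_len + 1, clip_len)]

# A random signal stands in for a decoded 30-second preview.
preview = np.random.default_rng(0).standard_normal(SR * 30)
clips = split_clips(preview)
specs = [log_mel_spectrogram(c) for c in clips]
print(len(specs), specs[0].shape)  # 3 (128, 427)
```

Each 30-second preview yields three training examples under these assumptions. That is one plausible way 10-second inputs relate to 30-second downloads, but the repo's exact slicing is not confirmed here.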
The interesting bit
The author couldn’t find a clean labeled dataset, so he stitched one together: the Million Song Dataset provides metadata and 7Digital IDs, and 7Digital’s “preview before you buy” feature becomes a free data spigot. He also found that raw mel-frequency features (step 2 of the MFCC pipeline) beat full MFCCs by what he calls “extremely better” margins, at the cost of longer training. His t-SNE visualizations support this by showing cleaner genre clustering for the mel features.
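What separates the two feature sets is essentially one extra step. MFCCs take a DCT-II across the mel axis of a log-mel spectrogram and keep only the first few coefficients. Feeding the network mel features skips that compression and keeps every band. The minimal numpy sketch below assumes a 128-band log-mel input and keeps 20 coefficients (librosa's default `n_mfcc`). The orthonormal DCT scaling mirrors what `librosa.feature.mfcc` uses by default. `dct_ii_matrix` and `mfcc_from_log_mel` are hypothetical names, and the input is random stand-in data.

```python
import numpy as np

def dct_ii_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]   # output coefficient index
    m = np.arange(n)[None, :]   # input mel-band index
    d = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    d[0] /= np.sqrt(2.0)        # rescale the DC row so the matrix is orthonormal
    return d

def mfcc_from_log_mel(log_mel, n_mfcc=20):
    """Final MFCC step (assumed n_mfcc=20): DCT over the mel axis, keep the first n_mfcc rows."""
    n_mels = log_mel.shape[0]
    return dct_ii_matrix(n_mels)[:n_mfcc] @ log_mel

# Random stand-in for a (n_mels, n_frames) log-mel spectrogram.
log_mel = np.random.default_rng(0).standard_normal((128, 431))
mfcc = mfcc_from_log_mel(log_mel)
print(log_mel.shape, "->", mfcc.shape)  # (128, 431) -> (20, 431)
```

Under these assumptions the CNN sees 128 rows per frame instead of 20. That larger input is consistent with the longer training the author reports, though the sketch does not establish why mel features classify better.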
Key highlights
- Built on TensorFlow in 2016, explicitly framed as a learning exercise for the then-new framework
- CNN architecture with 3 hidden layers and max pooling, inspired by Sander Dieleman’s Spotify blog post on deep content-based recommendation
- Dataset construction via previewDownloader.py, scraping 7Digital previews for ~1% of the Million Song Dataset (roughly 2.8GB due to laptop constraints)
- Preprocessing scripts for MFCC and mel-spectrogram extraction, t-SNE visualization, and input formatting all included
- Results published for full 10-class classification, compared against Tao Feng’s 2/3/4-class RBM results and a non-deep-learning benchmark
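The architecture in the highlights (three hidden layers with max pooling, then a softmax over ten genres) can be sketched as a numpy forward pass. Every concrete choice here is an assumption:

- 32 filters per layer
- kernel width 4
- pooling size 4
- convolution along time only, with mel bands as input channels (loosely in the spirit of Dieleman's post)
- random, untrained weights

The repo builds its network in TensorFlow. Numpy is used here only so the sketch runs without the 0.x-era API. `init_params` and `predict` are hypothetical names, and a random (128, 427) array stands in for one clip's log-mel spectrogram (roughly 10 seconds at 22,050 Hz with a 512-sample hop).

```python
import numpy as np

GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]

def conv1d_relu(x, w, b):
    """'Valid' convolution along time (cross-correlation, as in TensorFlow), then ReLU.
    x: (C_in, T), w: (C_out, C_in, K), b: (C_out,) -> (C_out, T - K + 1)."""
    windows = np.lib.stride_tricks.sliding_window_view(x, w.shape[2], axis=1)  # (C_in, T-K+1, K)
    return np.maximum(np.einsum("ctk,ock->ot", windows, w) + b[:, None], 0.0)

def max_pool1d(x, size):
    """Non-overlapping max pooling along time; frames that don't fill a window are dropped."""
    t = (x.shape[1] // size) * size
    return x[:, :t].reshape(x.shape[0], -1, size).max(axis=2)

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def init_params(n_mels, n_frames, channels=(32, 32, 32), kernel=4, pool=4, n_classes=10, seed=0):
    """Random He-style weights for three conv + max-pool blocks and a dense softmax layer.
    All sizes are hypothetical, not taken from the project."""
    rng = np.random.default_rng(seed)
    params, c_in, t = [], n_mels, n_frames
    for c_out in channels:
        w = rng.standard_normal((c_out, c_in, kernel)) * np.sqrt(2.0 / (c_in * kernel))
        params.append((w, np.zeros(c_out)))
        c_in, t = c_out, (t - kernel + 1) // pool  # track the time axis through conv + pool
    dense_w = rng.standard_normal((n_classes, c_in * t)) * 0.01
    return params, (dense_w, np.zeros(n_classes))

def predict(log_mel, params, dense, pool=4):
    """Forward pass: (n_mels, n_frames) spectrogram -> one probability per genre."""
    h = log_mel
    for w, b in params:
        h = max_pool1d(conv1d_relu(h, w, b), pool)
    dense_w, dense_b = dense
    return softmax(dense_w @ h.ravel() + dense_b)

spec = np.random.default_rng(1).standard_normal((128, 427))  # stand-in for one clip
conv_params, dense_params = init_params(*spec.shape)
probs = predict(spec, conv_params, dense_params)
print(GENRES[int(np.argmax(probs))], probs.shape)
```

The weights are random, so the predicted genre means nothing until the network is trained. The point is the shape bookkeeping: a 427-frame input shrinks to 106, 25, and then 5 frames, leaving 32 × 5 = 160 features for the dense softmax layer.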
Caveats
- The README contains no actual accuracy numbers in text; results are only visible in the results_mine.png image, so precise performance is unclear without viewing it
- Dataset download link is a Dropbox URL of uncertain longevity
- Misspellings in file and project names (“nural_network.png”, “GenereClassification”) suggest limited maintenance since original publication
Verdict
Worth a look if you’re teaching or learning classic audio CNN pipelines, or if you need a reference for scraping creative datasets from commercial APIs. Skip it if you want a maintained, production-ready classifier — this is academic coursework from the TensorFlow 0.x era, not a library.