Three neural nets walk into a bar (and classify it)
A 2016 tutorial repo that still runs on TensorFlow 2.x, comparing how feedforward, convolutional, and recurrent networks handle city noise.

What it does
This repo contains three Jupyter notebooks that train neural networks to classify urban sounds—sirens, jackhammers, dog barks, etc.—using the UrbanSound8K dataset. Each notebook implements a different architecture: a vanilla feedforward net, a CNN, and an RNN (LSTM). The code is paired with blog posts for the first two, making it a walkthrough-style resource rather than a production system.
The interesting bit
The value is in the side-by-side comparison. You can watch the same urban-sound clips get flattened, convolved, and sequenced by three networks with different inductive biases: the CNN notebook exploits local time-frequency patterns, and the RNN notebook exploits temporal order. It's a teaching tool for understanding why architecture choice matters for audio, not just that it does.
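Much of that difference comes down to the shape in which each network receives a clip. Below is a minimal NumPy sketch, assuming a single clip already converted to a 60-band by 41-frame log-mel spectrogram. The sizes and function names are hypothetical, and the notebooks may prepare features differently for each architecture; here one spectrogram stands in for all three.

```python
import numpy as np

def as_feedforward_input(spec: np.ndarray) -> np.ndarray:
    # Feedforward net: discard all layout; one flat vector per clip
    return spec.reshape(-1)

def as_cnn_input(spec: np.ndarray) -> np.ndarray:
    # CNN: keep the 2-D frequency x time grid and add a trailing channel axis,
    # so small kernels can share weights across neighbouring bands and frames
    return spec[..., np.newaxis]

def as_rnn_input(spec: np.ndarray) -> np.ndarray:
    # RNN/LSTM: transpose to (time steps, features) so the network reads
    # one frame's frequency profile per step, in order
    return spec.T

if __name__ == "__main__":
    # Hypothetical clip: a 60-band x 41-frame log-mel spectrogram (random values)
    spec = np.random.default_rng(0).standard_normal((60, 41)).astype(np.float32)
    print(as_feedforward_input(spec).shape)  # (2460,)
    print(as_cnn_input(spec).shape)          # (60, 41, 1)
    print(as_rnn_input(spec).shape)          # (41, 60)
```

Flattening forces the feedforward net to learn a separate weight for every band-frame cell. The CNN reuses its kernels across frequency and time, and the LSTM reuses its cell across time steps.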
Key highlights
- Three complete, runnable notebooks (feedforward → CNN → RNN progression)
- Accompanying blog posts explain the feedforward and CNN implementations step-by-step
- Updated to TensorFlow 2.x from the original 1.x codebase
- Uses librosa for standard audio preprocessing (spectrograms, MFCCs)
- Also points to Google’s AudioSet as a larger follow-up dataset
Caveats
- The RNN notebook has no linked blog post, so you’re on your own for explanation
- No training curves, accuracy numbers, or model comparison table in the README—you’ll need to run the notebooks to see performance
- Last substantive update appears to be the TF 2.x migration, so modern architectures (transformers, wav2vec-style approaches) are not covered
Verdict
Good for students or practitioners who want to feel the architectural differences in audio classification before reaching for pre-trained models. Skip it if you need a production pipeline or state-of-the-art results; the field has moved on.