← all repositories

cvondrick/soundnet

SoundNet learns sound representations from unlabeled video using a student-teacher training procedure that transfers visual model knowledge into the audio modality.

soundnet
Velocity · 7d
+0.1
★ / day
Trend
steady
star history

This research project from MIT/CSAIL learns rich natural sound representations by leveraging natural synchronization between vision and sound across two million unlabeled videos. The system uses a teacher-student training approach where discriminative visual knowledge from ImageNet and PlacesCNN models is transferred into sound representations. The 8-layer convolutional neural network is trained end-to-end to predict visual object and scene categories from audio, effectively learning audio features without explicit audio labels. Pre-trained models are provided for feature extraction and category recognition tasks.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.