Is audioset_tagging_cnn open source?

Yes — qiuqiangkong/audioset_tagging_cnn is open source, released under the MIT license.

What language is audioset_tagging_cnn written in?

qiuqiangkong/audioset_tagging_cnn is primarily written in Python.

How popular is audioset_tagging_cnn?

qiuqiangkong/audioset_tagging_cnn has 1.8k stars on GitHub.

Where can I find audioset_tagging_cnn?

qiuqiangkong/audioset_tagging_cnn is on GitHub at https://github.com/qiuqiangkong/audioset_tagging_cnn.

qiuqiangkong/audioset_tagging_cnn

Pretrained audio CNNs that actually beat Google's baseline

A set of CNNs trained on 5,000 hours of audio, ready to tag sounds or detect events without starting from scratch.

★ 1.8k stars · Python · Domain Apps · Image · Video · Audio

What it does

This is the research code behind PANNs (pretrained audio neural networks), a family of CNNs trained on Google’s AudioSet (527 sound classes, ~5,000 hours). You get pretrained weights for audio tagging and frame-wise sound event detection, plus scripts to fine-tune on your own task. The best model, Wavegram-Logmel-CNN, reaches a mean average precision (mAP) of 0.439 on AudioSet; the simpler Cnn14 reaches 0.431. Both outperform the Google baseline of 0.317.
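To make the mAP figures concrete: mean average precision averages, over all sound classes, the average precision of each class's ranked predictions. The sketch below is a simplified pure-Python illustration on made-up toy data. The function names are hypothetical, score ties are not handled specially, and this is not the repository's evaluation code.

```python
def average_precision(targets, scores):
    """AP for one class: mean of the precision at the rank of each positive clip."""
    # Rank clips from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if targets[idx] == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("class has no positive clips")
    return sum(precisions) / len(precisions)


def mean_average_precision(target_rows, score_rows):
    """Average the per-class AP. Each row is one clip, each column one class."""
    num_classes = len(target_rows[0])
    aps = []
    for c in range(num_classes):
        class_targets = [row[c] for row in target_rows]
        class_scores = [row[c] for row in score_rows]
        aps.append(average_precision(class_targets, class_scores))
    return sum(aps) / num_classes


# Toy data (invented for illustration): 4 clips, 2 classes.
targets = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.8, 0.7], [0.4, 0.6], [0.1, 0.3]]

print(round(mean_average_precision(targets, scores), 4))  # 0.9167
```

In this toy case the first class has AP (1/1 + 2/3) / 2 and the second class is ranked perfectly (AP 1.0), so the mean is 11/12.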

The interesting bit

The authors trained on the raw AudioSet recordings rather than relying on Google’s released embedding features, which was the common pattern at the time. Cnn14 computes log-mel spectrograms from the waveform, and Wavegram-Logmel-CNN adds a front end learned directly from the waveform samples. They also ship a separate panns_inference package for users who just want pretrained models without the full training pipeline, a rare case of research code acknowledging that most people only want the weights.
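For readers who only want the weights, a minimal tagging sketch with panns_inference might look like the following. It assumes `panns_inference` and `librosa` are installed and that the package fetches its default checkpoint when `checkpoint_path` is `None`, which is how its README describes it as far as I recall. The helper names `top_k_labels` and `tag_file` are hypothetical and not part of the package.

```python
def top_k_labels(clip_scores, label_names, k=5):
    """Return the k highest-scoring (label, score) pairs for one clip."""
    ranked = sorted(zip(label_names, clip_scores), key=lambda p: p[1], reverse=True)
    return [(name, float(score)) for name, score in ranked[:k]]


def tag_file(audio_path, k=5):
    """Tag one audio file with a pretrained PANN (hypothetical helper).

    Third-party imports are kept inside the function so that the pure-Python
    helper above can be used without them.
    """
    import librosa
    from panns_inference import AudioTagging, labels

    # Load as 32 kHz mono, the sample rate used in the package's own example.
    audio, _ = librosa.load(audio_path, sr=32000, mono=True)
    audio = audio[None, :]  # shape: (batch_size, samples)

    # Assumption: checkpoint_path=None makes the package use its default checkpoint.
    tagger = AudioTagging(checkpoint_path=None, device="cpu")
    clipwise_output, _embedding = tagger.inference(audio)
    return top_k_labels(clipwise_output[0], labels, k)


# The ranking helper on made-up scores:
print(top_k_labels([0.1, 0.9, 0.5], ["Speech", "Dog", "Music"], k=2))
```

Calling `tag_file("some_clip.wav")` (a placeholder path) would then return the top five AudioSet labels for that file, provided the dependencies and checkpoint are available.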

Key highlights

Pretrained models available on Zenodo; inference works out of the box with pytorch/inference.py
Supports both clip-level tagging and frame-level sound event detection (the Cnn14_DecisionLevelMax, Cnn14_DecisionLevelAvg, and Cnn14_DecisionLevelAtt variants)
Fine-tuning template included; demonstrated on GTZAN music classification
Training from scratch takes 3–7 days on a single V100 (works on 12 GB GPUs with reduced batch size)
Full reproducibility package: dataset download scripts, HDF5 packing for I/O speedup, and plotting utilities for all paper figures
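The difference between the decision-level variants listed above comes down to how frame-wise probabilities are pooled into one clip-level score. The sketch below illustrates max, average, and attention pooling for a single class in pure Python. It is an assumption about the general form, not code from the repository: the attention logits are supplied by hand here, whereas in a real model they would come from a learned layer, and all function names are hypothetical.

```python
import math


def pool_max(frame_probs):
    """Clip score is the most confident frame."""
    return max(frame_probs)


def pool_avg(frame_probs):
    """Clip score is the mean over frames."""
    return sum(frame_probs) / len(frame_probs)


def pool_att(frame_probs, att_logits):
    """Clip score is a weighted mean, with softmax-over-time attention weights."""
    peak = max(att_logits)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in att_logits]
    total = sum(exps)
    return sum(p * e / total for p, e in zip(frame_probs, exps))


# Made-up frame-wise probabilities for one class over four frames.
frame_probs = [0.1, 0.8, 0.9, 0.2]

print(pool_max(frame_probs))                          # 0.9
print(round(pool_avg(frame_probs), 3))                # 0.5
print(round(pool_att(frame_probs, [0.0] * 4), 3))     # 0.5
```

With uniform attention logits, attention pooling reduces to the average; a large logit on one frame pushes the clip score toward that frame's probability, which is how it can sit between the max and average behaviours.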

Caveats

The full AudioSet download is ~1.1 TB, and the clips are sourced from YouTube, so expect missing files (removed or private videos) and availability that varies by region
Baidu Cloud link provided for the authors’ exact downloaded copy; no direct HTTP mirror
README is thorough but reads like a lab notebook — you’ll need to dig for the high-level picture

Verdict

Worth a look if you need off-the-shelf sound classification or event detection and don’t want to train on AudioSet yourself. Skip if you’re after real-time streaming or edge deployment — these are research models, not optimized inference engines.
