Is audio-super-res open source?

Yes — kuleshov/audio-super-res is open source, released under the MIT license.

What language is audio-super-res written in?

kuleshov/audio-super-res is primarily written in Python.

How popular is audio-super-res?

kuleshov/audio-super-res has 1.3k stars on GitHub.

Where can I find audio-super-res?

kuleshov/audio-super-res is on GitHub at https://github.com/kuleshov/audio-super-res.

kuleshov/audio-super-res

Neural networks that hallucinate missing audio frequencies

A research implementation for upsampling low-resolution audio using temporal feature-wise modulation, with a Keras layer you can steal for other time-series work.

★ 1.3k stars | Python | Image · Video · Audio

What it does

This repo trains neural networks to reconstruct high-resolution audio from downsampled inputs—essentially teaching a model to guess the frequencies that were thrown away. It ships with data pipelines for the VCTK speech corpus, training scripts for single- or multi-speaker datasets, and a pre-trained checkpoint for speaker #1. The run.py script handles both training and inference, spitting out side-by-side low-res, high-res, and “predicted” WAV files.

The interesting bit

The authors’ Temporal FiLM (Feature-wise Linear Modulation) layer is the real takeaway: it captures long-range dependencies in sequences by modulating features across time, not just depth. They’ve packaged it as a standalone Keras layer (keras_layer.py), and the same architecture has been repurposed for EEG denoising and functional genomics imputation. The audio task is essentially a demo.
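
To make "modulating features across time" concrete, here is a minimal NumPy sketch of the general idea: split the sequence into blocks, summarize each block, run a small recurrence over the summaries, and rescale every block by the resulting per-channel factors. This is not the code in keras_layer.py. The function name `temporal_film_sketch`, the plain tanh recurrence, the random weights, and the multiplicative-only modulation are all assumptions chosen for illustration.

```python
import numpy as np

def temporal_film_sketch(x, block_size, w_in, w_rec):
    """Illustrative block-wise feature modulation (hypothetical helper).

    x:      array of shape (T, C), T time steps and C channels
    w_in:   (C, C) input weights of a stand-in recurrent cell
    w_rec:  (C, C) recurrent weights of the same cell
    """
    T, C = x.shape
    n_blocks = T // block_size
    # Drop any trailing samples that do not fill a whole block.
    blocks = x[: n_blocks * block_size].reshape(n_blocks, block_size, C)

    # Summarize each block with a max over time: (n_blocks, C).
    pooled = blocks.max(axis=1)

    # A plain tanh recurrence carries information from earlier blocks
    # forward. It is an assumption standing in for the repo's recurrent layer.
    h = np.zeros(C)
    scales = np.empty_like(pooled)
    for b in range(n_blocks):
        h = np.tanh(pooled[b] @ w_in + h @ w_rec)
        scales[b] = h

    # Every time step in a block is rescaled by that block's factors.
    modulated = blocks * scales[:, None, :]
    return modulated.reshape(n_blocks * block_size, C)

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8))           # 64 time steps, 8 channels
w_in = rng.standard_normal((8, 8)) * 0.1   # random weights, untrained
w_rec = rng.standard_normal((8, 8)) * 0.1
out = temporal_film_sketch(x, block_size=16, w_in=w_in, w_rec=w_rec)
print(out.shape)  # (64, 8)
```

Because the scale for block b depends on every earlier block through `h`, a feature late in the sequence can be influenced by content far behind it, which is the long-range behavior described above.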

Key highlights

Four model variants: audiounet, audiotfilm (the authors’ pick for best), dnn, and a cubic-spline baseline
Single-speaker training takes “a few hours”; multi-speaker needs “several days”
Pre-trained single-speaker model available via Google Drive link
Input length must be a multiple of 2**layers; the model will silently clip your audio if it isn’t
Includes a grocery-sales imputation experiment, because why not
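
The input-length constraint in the list above is easy to check before handing audio to the model. The sketch below shows the arithmetic only; `clip_to_multiple` and `pad_to_multiple` are hypothetical helpers, not functions from the repo, and zero-padding is an assumed workaround rather than something the authors recommend.

```python
import numpy as np

def clip_to_multiple(x, layers):
    """Trim a 1-D signal so its length is a multiple of 2**layers."""
    block = 2 ** layers
    usable = len(x) - (len(x) % block)
    return x[:usable]

def pad_to_multiple(x, layers):
    """Zero-pad a 1-D signal up to the next multiple of 2**layers.

    Returns the padded signal and the number of samples added, so the
    padding can be removed from the output afterwards.
    """
    block = 2 ** layers
    n_pad = (-len(x)) % block
    return np.pad(x, (0, n_pad)), n_pad

signal = np.arange(1000, dtype=np.float32)
clipped = clip_to_multiple(signal, layers=4)        # 1000 -> 992 samples
padded, n_pad = pad_to_multiple(signal, layers=4)   # 1000 -> 1008 samples
print(len(clipped), len(padded), n_pad)  # 992 1008 8
```

Padding and then trimming the output keeps the tail of the recording, whereas clipping discards it.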

Caveats

The authors explicitly warn that the codebase “has not been fully tested” after a recent TensorFlow/Keras upgrade
Performance is highly sensitive to how you generate low-res training data: the choice of low-pass filter (Butterworth vs. Chebyshev) matters, and aliased input (no filter) actually sounds better despite worse objective metrics
Applying this to your own voice requires collecting matching labeled examples; the pre-trained model is speaker-specific
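
The filter caveat above can be explored with a few lines of SciPy. This is a sketch under stated assumptions, not the repo's data pipeline: `make_low_res` is a hypothetical helper, and the filter order, the 0.05 dB Chebyshev ripple, and the cutoff at the new Nyquist frequency are arbitrary illustrative choices.

```python
import numpy as np
from scipy import signal

def make_low_res(x, factor, kind="butter", order=8):
    """Downsample x by an integer factor with a chosen anti-aliasing filter."""
    if kind == "none":
        # No filter: content above the new Nyquist folds back (aliasing).
        return x[::factor]
    cutoff = 1.0 / factor  # normalized so that 1.0 is the original Nyquist
    if kind == "butter":
        sos = signal.butter(order, cutoff, btype="low", output="sos")
    elif kind == "cheby":
        sos = signal.cheby1(order, 0.05, cutoff, btype="low", output="sos")
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    # Zero-phase filtering, then keep every factor-th sample.
    return signal.sosfiltfilt(sos, x)[::factor]

sr = 16000
t = np.arange(sr) / sr
# A 440 Hz tone plus a 5 kHz tone that cannot survive 4x downsampling.
x = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 5000 * t)

aliased = make_low_res(x, 4, kind="none")
butter = make_low_res(x, 4, kind="butter")
cheby = make_low_res(x, 4, kind="cheby")

for name, y in [("none", aliased), ("butter", butter), ("cheby", cheby)]:
    print(name, len(y), float(np.mean(y ** 2)))
```

The unfiltered version keeps the energy of the 5 kHz tone, folded down to a lower frequency, while both filtered versions remove it. A model trained on one kind of low-res input sees a different distribution from the others, which is one way to read the sensitivity noted above.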

Verdict

Worth a look if you need a proven time-series upsampling architecture you can adapt, or if you’re curious about FiLM-like conditioning for sequences. Skip it if you want a polished, drop-in audio enhancer—this is research code with sharp edges.
