Stable Diffusion, but make it a spectrogram: the web UI for AI music
A Next.js frontend that turns image-generation pipelines into real-time audio by treating spectrograms as pictures.

What it does
Riffusion is a web interface for generating music via Stable Diffusion. The trick: it treats audio spectrograms as images, so a standard image-diffusion model can iteratively denoise its way to a sound. This repo is the React/Next.js frontend, the pretty face that sends inference requests to a separate Flask backend running the actual model.
The interesting bit
The project treats a spectrogram as a picture: frequency runs along the vertical axis, time along the horizontal axis, and pixel intensity encodes how loud each frequency is at each moment. That lets an off-the-shelf latent diffusion model hallucinate music without ever seeing a MIDI file. The frontend adds three.js visualizations, presumably so your generative audio has generative eye candy to match.
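To make the spectrogram-as-picture idea concrete, here is a minimal sketch in TypeScript. It is not taken from the repo: the helper name spectrogramToPixels, the input layout, and the plain linear scaling are all assumptions for illustration, and the real pipeline may well use a different frequency scale and amplitude encoding.

```typescript
// Hypothetical helper (not from the repo): map a magnitude spectrogram
// to 8-bit grayscale pixels.
// Assumed input layout: mags[f][t], where f = 0 is the lowest frequency bin.
// Output layout: pixels[row][col], where row 0 is the top of the image.
function spectrogramToPixels(mags: number[][]): number[][] {
  const numBins = mags.length;
  const numFrames = numBins > 0 ? mags[0].length : 0;

  // Find the peak magnitude so every value can be scaled into 0..255.
  let peak = 0;
  for (const row of mags) {
    for (const m of row) {
      if (m > peak) peak = m;
    }
  }

  const pixels: number[][] = [];
  for (let row = 0; row < numBins; row++) {
    // Flip vertically: the top image row holds the highest frequency bin.
    const bin = numBins - 1 - row;
    const line: number[] = [];
    for (let col = 0; col < numFrames; col++) {
      const scaled = peak > 0 ? mags[bin][col] / peak : 0;
      line.push(Math.round(scaled * 255));
    }
    pixels.push(line);
  }
  return pixels;
}

// Two frequency bins, three time frames: a low tone that fades in,
// and a high tone that appears only in the last frame.
const demo = spectrogramToPixels([
  [0.0, 0.5, 1.0], // low bin
  [0.0, 0.0, 0.5], // high bin
]);
console.log(demo); // [[0, 0, 128], [0, 128, 255]]
```

Each column of the result is one slice of time and each row is one frequency band, which is exactly the shape an image model expects. Turning such a picture back into audio is a separate step that this sketch does not attempt.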
Key highlights
- Built with Next.js, TypeScript, Tailwind, and three.js — standard modern web stack
- Requires a separate inference server with a “large GPU” capable of sub-five-second stable diffusion runs
- Needs a .env.local file that sets RIFFUSION_FLASK_URL: no backend, no music
- Includes an about page and API routes in the usual Next.js pattern
- Published in 2022 with an academic citation ready to copy-paste
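The frontend-to-backend handoff described in the highlights can be sketched roughly as follows. This is an illustration under assumptions, not the repo's code: the function name prepareInferenceRequest, the PreparedRequest type, and the payload shape are hypothetical, and the README summary here does not say which fields the backend expects. Only the environment variable name RIFFUSION_FLASK_URL comes from the source.

```typescript
// Hypothetical payload type: the real request fields are not described here.
type InferencePayload = Record<string, unknown>;

// Hypothetical container for everything a fetch call would need.
interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the POST request for the inference backend.
// Throws early when the URL is missing, which is the
// "no backend, no music" case in practice.
function prepareInferenceRequest(
  flaskUrl: string | undefined,
  payload: InferencePayload
): PreparedRequest {
  if (!flaskUrl) {
    throw new Error("RIFFUSION_FLASK_URL is not set; add it to .env.local");
  }
  return {
    url: flaskUrl,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// Placeholder URL and payload, for illustration only.
const prepared = prepareInferenceRequest("http://localhost:5000/", {
  prompt: "placeholder",
});
console.log(prepared.url, prepared.init.method);
```

Inside a Next.js API route, one plausible use would be to pass process.env.RIFFUSION_FLASK_URL as the first argument and hand the returned url and init to fetch. Keeping the request-building step separate from the network call makes the missing-variable failure easy to spot and easy to test.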
Caveats
- No longer actively maintained; the README leads with this warning
- The README is thin on architecture details; it’s unclear how the spectrogram-to-audio conversion happens on the frontend, or if it happens at all there
- “Large GPU” requirement is vague — no specific VRAM or model checkpoint guidance
Verdict
Worth a look if you’re studying diffusion-for-audio interfaces or want a reference Next.js project that talks to a Python inference backend. Skip it if you need a turnkey music generator; this is a frontend-only repo for a discontinued experiment, and you’ll be doing backend surgery yourself.