Meta's audio lab: a full stack for AI-generated sound
A PyTorch toolkit that bundles neural audio codecs, text-to-music models, and training pipelines for researchers who want to generate or compress audio with deep learning.

What it does
AudioCraft is Meta’s PyTorch library for audio generation research. It packages inference and training code for several models: MusicGen and JASCO for text-to-music, AudioGen for text-to-sound effects, EnCodec as a neural audio codec, Multi Band Diffusion as a diffusion-based decoder for EnCodec tokens, MAGNeT for non-autoregressive generation, and AudioSeal for watermarking. The models share common components and configurable training pipelines.
The interesting bit
The library treats audio generation as a stack rather than a grab bag. EnCodec compresses audio into discrete tokens; MusicGen and its siblings generate those tokens from text (or chords, or melodies); a decoder, either EnCodec's own or Multi Band Diffusion, turns them back into waveforms. You can use the pieces separately or retrain the whole pipeline. The README notes training code is available for EnCodec, MusicGen, Multi Band Diffusion, and JASCO specifically.
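To make the stack concrete, here is a minimal sketch of text in, tokens in the middle, waveform out. It assumes the `audiocraft` package is installed and that the `facebook/musicgen-small` checkpoint can be downloaded. The function name `text_to_music` and its parameters are hypothetical; the AudioCraft calls follow the project's published demo usage as best I recall and may differ between versions.

```python
def text_to_music(prompt, duration_s=8.0, use_diffusion_decoder=False,
                  stem_name="sample"):
    """Generate one clip from a text prompt and write it to disk.

    Hypothetical helper for illustration; not part of AudioCraft itself.
    """
    # Imported lazily so the module can be loaded without audiocraft installed.
    from audiocraft.models import MusicGen, MultiBandDiffusion
    from audiocraft.data.audio import audio_write

    # The pretrained MusicGen bundle includes its EnCodec compression model.
    model = MusicGen.get_pretrained("facebook/musicgen-small")
    model.set_generation_params(duration=duration_s)

    # The language model samples discrete EnCodec tokens from the text prompt.
    # With return_tokens=True we get both the decoded audio and the tokens.
    wav, tokens = model.generate([prompt], return_tokens=True)

    # By default `wav` was decoded with EnCodec's own decoder. Optionally
    # re-decode the same tokens with Multi Band Diffusion instead.
    if use_diffusion_decoder:
        mbd = MultiBandDiffusion.get_mbd_musicgen()
        wav = mbd.tokens_to_wav(tokens)

    # wav has shape [batch, channels, samples]; write the first item.
    return audio_write(stem_name, wav[0].cpu(), model.sample_rate,
                       strategy="loudness")
```

Calling `text_to_music("lo-fi piano with soft drums")` should write `sample.wav` to the working directory. The first call downloads model weights, so expect it to be slow.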
Key highlights
- MusicGen: text-to-music with optional melodic conditioning (hum a tune, get an arrangement)
- JASCO: adds chord, melody, and drum track conditioning for finer control
- EnCodec: neural codec at the center, tokenizing audio for generation models
- AudioSeal: built-in watermarking for generated audio
- Training pipelines: not just inference; includes configs and grids for reproducing papers
Caveats
- Model weights are CC-BY-NC 4.0 (non-commercial), while the code itself is MIT — check your use case
- Requires Python 3.9 and PyTorch 2.1.0 exactly; the install instructions warn about xformers compatibility
- Depends on ffmpeg, with a specific <5 version constraint if installing via conda
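A quick pre-install check can catch the version caveats above before they turn into confusing import errors. The sketch below uses only the standard library and reads the pins strictly (Python 3.9, PyTorch 2.1.0). The helper names are hypothetical, and it only checks that ffmpeg is on the PATH, not that its version is below 5.

```python
import shutil
import sys
from importlib import metadata
from typing import List, Optional, Tuple

REQUIRED_PYTHON = (3, 9)   # taken from the caveats above
REQUIRED_TORCH = "2.1.0"   # taken from the caveats above


def find_problems(python_version: Tuple[int, int],
                  torch_version: Optional[str],
                  has_ffmpeg: bool) -> List[str]:
    """Return human-readable problems; an empty list means the pins look met."""
    problems = []
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        problems.append(
            "Python %d.%d found, 3.9 expected" % tuple(python_version[:2]))
    if torch_version is None:
        problems.append("PyTorch is not installed")
    # Drop local build tags such as '+cu121' before comparing.
    elif torch_version.split("+")[0] != REQUIRED_TORCH:
        problems.append("PyTorch %s found, 2.1.0 expected" % torch_version)
    if not has_ffmpeg:
        problems.append("ffmpeg not found on PATH")
    return problems


def check_current_environment() -> List[str]:
    """Inspect the running interpreter and report any mismatches."""
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None
    return find_problems(sys.version_info[:2], torch_version,
                         shutil.which("ffmpeg") is not None)
```

Running `print(check_current_environment())` in the target environment lists whatever needs fixing. Keeping `find_problems` free of side effects makes the rules easy to adjust if the project relaxes its pins.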
Verdict
Worth a look if you’re doing research in neural audio generation or need a controllable music LM to build on. Skip it if you want a simple API for casual music generation: the value here is in the training code and composable pieces, not a polished end-user product.