5,700 GPU hours to automate osu! mapping
An AI framework that generates complete, playable beatmaps from raw audio spectrograms for all four osu! gamemodes.

What it does
Mapperatorinator ingests a song’s spectrogram and outputs a complete osu! beatmap—hit objects, timing, sliders, hitsounds, the works—for any of the four gamemodes (standard, taiko, catch-the-beat, mania). It also doubles as an AI modding assistant called MaiMod that flags snapping errors, timing drift, and inconsistent slider shapes by comparing human maps against its own predictions.
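For readers who don't map: an object is "snapped" when its timestamp lands on a subdivision (1/2, 1/3, 1/4, and so on) of the beat defined by the active timing point. A snapping error is an object that sits between those ticks. The sketch below is a plain rule-based check written only to illustrate the idea. It is not MaiMod, which, per the description above, works by comparing human maps against the model's own predictions. The function names, the divisor list, and the 2 ms tolerance are assumptions.

```python
def snap_error_ms(time_ms, offset_ms, beat_length_ms,
                  divisors=(1, 2, 3, 4, 6, 8, 12, 16)):
    """Distance in ms from time_ms to the nearest tick of any allowed beat divisor."""
    beats = (time_ms - offset_ms) / beat_length_ms  # position measured in beats
    best = float("inf")
    for d in divisors:
        nearest_tick = round(beats * d) / d  # closest 1/d-beat tick, in beats
        best = min(best, abs(beats - nearest_tick) * beat_length_ms)
    return best


def find_unsnapped(times_ms, offset_ms, beat_length_ms, tolerance_ms=2.0):
    """Return the object times that sit farther than tolerance_ms from every tick."""
    return [t for t in times_ms
            if snap_error_ms(t, offset_ms, beat_length_ms) > tolerance_ms]


if __name__ == "__main__":
    # Hypothetical 120 BPM section: 500 ms per beat, first beat at 1000 ms.
    objects = [1000, 1125, 1167, 1250, 1015]
    print(find_unsnapped(objects, offset_ms=1000, beat_length_ms=500))  # [1015]
```

In this example, the object at 1015 ms is 15 ms away from every tick and gets flagged. The object at 1167 ms lands within half a millisecond of the 1/3 tick and passes.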
The interesting bit
The project fuses two prior models (osuT5 and osu-diffusion) into a multi-model pipeline, then adds a thick layer of controllability: you can dial in a target star rating, simulate a specific mapper’s style by user ID, pick an upload year to match period aesthetics, and even guide generation with positive or negative descriptor tags. The author trained this on roughly 5,700 hours of GPU time across a 4060 Ti and rented 4090s—serious hardware for a rhythm-game niche.
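Descriptor steering of this kind is usually implemented with classifier-free guidance, which the project lists among its controls (via negative_descriptors). In the common negative-prompt variant, the model is evaluated twice: once with the desired conditioning, and once with the negative conditioning standing in for the unconditional pass. The output is then extrapolated away from the negative. The sketch below shows that textbook combination on a single vector of next-token logits. It assumes a decoder that emits logits and is not taken from Mapperatorinator's source. The function names and guidance scale are hypothetical.

```python
import math


def guided_logits(cond_logits, neg_logits, guidance_scale=1.5):
    """Classifier-free guidance with a negative prompt.

    neg_logits stands in for the unconditional pass; a scale of 1.0 returns
    cond_logits unchanged, and larger scales push further away from neg_logits.
    """
    if len(cond_logits) != len(neg_logits):
        raise ValueError("logit vectors must have the same length")
    return [n + guidance_scale * (c - n) for c, n in zip(cond_logits, neg_logits)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    cond = [2.0, 1.0, 0.0]  # logits given the positive descriptors
    neg = [0.0, 1.0, 2.0]   # logits given the negative descriptors
    print(guided_logits(cond, neg, guidance_scale=2.0))  # [4.0, 1.0, -2.0]
    # Probability of token 0 before and after guidance (it increases).
    print(softmax(cond)[0], softmax(guided_logits(cond, neg, 2.0))[0])
```

A scale of 1.0 reproduces the conditional logits. Raising it generally tightens adherence to the descriptors at the cost of variety, and very large values tend to distort the output.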
Key highlights
- Generates full beatmaps from audio alone, or remixes existing maps via in-context arguments (timing, kiai sections, guest difficulties)
- Web UI and interactive CLI both ship with the repo; Colab notebooks for zero-install tries
- Style control via mapper_id, year, and descriptors, plus classifier-free guidance with negative_descriptors
- Partial remapping supported: replace a time range in an existing beatmap instead of generating from scratch
- MaiMod detects issues that rule-based tools miss, like weird slider shapes or inconsistent hitsound volumes
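Partial remapping, as described in the list above, comes down to keeping everything outside the chosen window and splicing generated objects into it. The following is a minimal sketch that assumes hit objects are reduced to (start time in ms, type) tuples and uses a half-open [start, end) window. The names are hypothetical, and this is not how Mapperatorinator represents beatmaps internally.

```python
from typing import List, Tuple

# Hypothetical simplified hit object: (start time in ms, object type)
HitObject = Tuple[int, str]


def splice_range(original: List[HitObject], generated: List[HitObject],
                 start_ms: int, end_ms: int) -> List[HitObject]:
    """Replace objects whose start time falls in [start_ms, end_ms) with generated ones."""
    if start_ms >= end_ms:
        raise ValueError("start_ms must be earlier than end_ms")
    # Keep every original object that starts outside the window.
    kept = [obj for obj in original if not (start_ms <= obj[0] < end_ms)]
    # Accept only generated objects that start inside the window.
    inserted = [obj for obj in generated if start_ms <= obj[0] < end_ms]
    return sorted(kept + inserted, key=lambda obj: obj[0])


if __name__ == "__main__":
    original = [(0, "circle"), (500, "circle"), (1000, "slider"), (1500, "circle")]
    generated = [(600, "circle"), (900, "slider"), (2000, "circle")]
    print(splice_range(original, generated, 500, 1200))
    # [(0, 'circle'), (600, 'circle'), (900, 'slider'), (1500, 'circle')]
```

Real osu! sliders and spinners have durations. An object that starts before the window but ends inside it would need extra handling, which this sketch ignores.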
Caveats
- Requires Python 3.10, CUDA 13.0 or ROCm, and careful PyTorch GPU setup—dependency stack is finicky
- Some features (guest difficulties, hitsound generation) are locked to the older V31 model
- The README warns that mismatched song style and requested difficulty can cause the model to ignore your directions
Verdict
Worth a look if you map osu! beatmaps, whether casually or prolifically, or if you’re researching conditional generation in structured sequence domains. Skip it if you just want a quick drag-and-drop tool: the setup takes real effort, and the model still needs human curation to produce rankable output.