Blog AI 9 min read

MusicGen (Meta) — Gamepad as a Live MusicGen Pilot

Drive Meta's MusicGen from a PS5 DualSense — face buttons swap prompts, sticks ride temperature and top-k, triggers fire generation. Live AI music pilot rig.

By Aidxn Design

MusicGen gamepad control sounds like a tech demo, but in practice it solves a real performance problem: how do you steer a slow generative model fast enough to feel like an instrument? The honest answer is that you cannot — MusicGen takes seconds, not milliseconds. What you can do is treat the gamepad as the pilot for a queue of prompts, parameters, and patches, so the human is always one step ahead of the model. This guide wires Meta's MusicGen into a PS5 DualSense via Universal Controller MIDI and a thirty-line script.

TL;DR
  • What you build: a gamepad-driven MusicGen rig where face buttons swap prompts, sticks ride temperature and top-k, triggers fire generation.
  • Stack: Universal Controller MIDI → small Node/Python script → Replicate or local Audiocraft → WAV into a DAW.
  • Latency reality: 6–12 seconds per generation on a 4090, 4–8 seconds via Replicate. Plan your set around the gap.
  • Best use: ambient interstitials, transitions, between-track filler. Not lead.

Why a gamepad for MusicGen, of all things?

MusicGen has no live control surface. You write a prompt, hit generate, wait, listen. The interaction model is fundamentally not real-time, which is fine when you are sketching but useless on a stage. The gamepad fixes the interaction problem without pretending to fix the latency one. Eight face buttons give you eight prompt slots. Two sticks give you four continuous knobs. Two triggers handle "fire" and "abort". Suddenly you are flying the model with your thumbs while it grinds, instead of typing prompts mid-set like a fool.

The architecture in one diagram

It looks heavier than it is. The gamepad fires MIDI into a small bridge script that calls MusicGen and writes the result to a watched folder. Ableton picks the WAV up via a hot-folder script and slots it on the next clip.

DualSense ──USB-C──► Universal Controller MIDI ──IAC bus──► Node script
                                                                │
                                                                ▼
                                                    Replicate (or local
                                                       Audiocraft GPU)
                                                                │
                                                                ▼
                                                  WAV → watched/incoming/
                                                                │
                                                                ▼
                                             Ableton Live → next clip slot

The prompt bank

The d-pad and face buttons map to eight prompts. Write them like patch names — short, punchy, unambiguous. MusicGen responds badly to flowery prose. "Dub techno 122 bpm sparse" works. "A swirling oceanic dream of broken hardware" does not.

ButtonMIDIPrompt
X (Cross)Note 60lofi hip hop 86 bpm warm rhodes vinyl crackle
CircleNote 61dub techno 122 bpm sparse low-end
SquareNote 62ambient drone shimmer no rhythm
TriangleNote 63uk garage 132 bpm syncopated
D-pad upNote 64orchestral strings cinematic build
D-pad rightNote 65hyperpop glitch 160 bpm
D-pad downNote 66jazz trio brushed drums upright bass
D-pad leftNote 67kosmische motorik 110 bpm

Wiring the gamepad to a generation script

The bridge sends standard MIDI; any language that speaks MIDI plus HTTP can drive it. Here is the loop in Node — short on purpose. It listens for note-on, looks up the prompt, reads the latest CC values, fires Replicate, writes the WAV.

// musicgen-pilot.mjs
import easymidi from 'easymidi';
import Replicate from 'replicate';
import fs from 'node:fs/promises';

const PROMPTS = [
  'lofi hip hop 86 bpm warm rhodes vinyl crackle',
  'dub techno 122 bpm sparse low-end',
  'ambient drone shimmer no rhythm',
  'uk garage 132 bpm syncopated',
  'orchestral strings cinematic build',
  'hyperpop glitch 160 bpm',
  'jazz trio brushed drums upright bass',
  'kosmische motorik 110 bpm',
];

const cc = { temperature: 64, topK: 64, duration: 64 };
const input = new easymidi.Input('Universal Controller MIDI');
const replicate = new Replicate();

input.on('cc', (m) => {
  if (m.controller === 1) cc.temperature = m.value;
  if (m.controller === 2) cc.topK = m.value;
  if (m.controller === 3) cc.duration = m.value;
});

input.on('noteon', async (m) => {
  const i = m.note - 60;
  if (i < 0 || i > 7) return;
  const out = await replicate.run('meta/musicgen', {
    input: {
      prompt: PROMPTS[i],
      duration: 4 + Math.round((cc.duration / 127) * 24),
      temperature: 0.5 + (cc.temperature / 127) * 1.0,
      top_k: 50 + Math.round((cc.topK / 127) * 200),
      model_version: 'stereo-melody-large',
    },
  });
  const buf = await (await fetch(out)).arrayBuffer();
  await fs.writeFile(`./watched/incoming/${Date.now()}.wav`, Buffer.from(buf));
});

What the sticks do

Sticks send CC 1, 2, and 3 — temperature, top-k, and duration respectively. Pull the left stick up to crank temperature (more chaotic outputs); push the right stick up for higher top-k (more diverse sampling). The right trigger sets duration from 4 to 28 seconds. Reading these as MIDI CC rather than wiring directly to the gamepad SDK means the same script works with any controller the bridge supports — DualSense, Xbox, Switch Pro, all the same.

Living with the latency

MusicGen will not generate in real time. There is no clever prompt that makes it faster. What you can do is overlap generations — fire prompt B while prompt A is rendering, then fire prompt C while B is rendering. By the time the audience hears prompt A drop in, you are already two prompts ahead. Triggers handle the fire-and-forget flow; the abort button (touchpad click in our default mapping) kills the in-flight generation if you want to bail.

Real performance use cases

  • Pre-roll while loading the next track. Fire a 12-second ambient pad while you cue up the next vinyl rip.
  • Interstitial swells between sections. Generate a short orchestral bed under a spoken-word interlude.
  • Sketching loops between songs at rehearsal. The gamepad-first workflow is much faster than typing prompts.
  • Failsafe ambient layer. Set the system to auto-fire a new ambient prompt every 90 seconds for a generative warm-up set.

The MusicGen limitations you should know

Meta's MusicGen release notes are upfront about this: the model is good at genre and texture, weak at structure and arrangement. It will not generate a four-minute song that has a verse, chorus, and bridge. It will generate 30 seconds of convincingly genre-consistent texture. Lean into that. Use it as a texture generator with a gamepad steering wheel, not as a songwriter.

The verdict

Gamepad piloting does not make MusicGen real-time — nothing does. But it turns "type a prompt and wait" into "fly a queue of prompts with your thumbs while the model grinds in the background", and that is the difference between a tech demo and a usable tool. Pair it with the ambient soundscapes rig and you have a complete generative warm-up set. Universal Controller MIDI handles the MIDI bit; the script above is the bridge to the model.

Keep reading

More setup walkthroughs