Magenta gamepad melody generation is the dumbest fun you can have with a recurrent neural network and a $70 controller. Google's Magenta project has been training small MIDI-friendly models — MelodyRNN, ImprovRNN, MusicVAE — for the better part of a decade, and the pre-trained checkpoints are all still hosted and still work. The trick is to make the seed phrase something you play on a gamepad, hand the RNN the four bars you just performed, and let it write the next four. A trigger pull commits the AI's reply into your timeline. A stick sweep controls how wild the model gets. The whole loop fits in a browser tab.
- What you build: a gamepad rig that records a 4-bar seed, sends it to Magenta MelodyRNN, plays back the AI's 4-bar continuation, and rerolls on a trigger.
- What you need: Universal Controller MIDI, Magenta.js in the browser (or Python Magenta locally), a synth on the receiving end.
- Generation time: 200–600 ms per call on a recent Mac. Snappy enough.
- Why it works: AI is great at "and then what?", lousy at intent. You bring the intent in the seed, the model brings the variation.
Why MelodyRNN is still the right model for this
Generative-music research has moved on to diffusion and large transformer models, but most of those are too slow to sit between a trigger pull and "hear it now." MelodyRNN was designed to continue a single monophonic line, runs in browser-side JS in under a second, and emits MIDI directly. There is no audio rendering step. You play notes, the model returns more notes, your synth plays them. The original Magenta blog covers the design intent well: small, fast, MIDI-native.
For a gamepad rig that's exactly the right shape. The model's latency budget is your hand's latency budget. Anything bigger and the trigger pull stops feeling like an instrument.
How the data flows
Four blocks, each doing one job:
┌─────────────┐    MIDI    ┌──────────────┐
│   Gamepad   │ ─────────→ │  4-bar seed  │
│  → bridge   │   notes    │    buffer    │
└─────────────┘            └──────┬───────┘
                                  │ R2 press
                                  ▼
                           ┌──────────────┐
                           │  MelodyRNN   │
                           │  temp = R-Y  │
                           └──────┬───────┘
                                  │ 4 bars MIDI
                                  ▼
                           ┌──────────────┐
                           │ Synth / DAW  │
                           │ plays reply  │
                           └──────────────┘
The seed buffer is just a ring of NoteOn events with timestamps. The temperature value is a 0–1 CC mapped to a 0.4–1.5 RNN parameter on the way through. Everything else is plumbing.
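As a rough sketch of that plumbing: the block below assumes the bridge hands you note events with millisecond on/off timestamps, that the session runs at a fixed tempo, and that temperature arrives as a standard 0–127 CC value. `SeedBuffer`, `ccToTemperature`, the capacity, and the 120 BPM default are illustrative names and values, not part of Magenta.js or the bridge.

```javascript
// Hypothetical seed buffer: keeps the most recent notes and quantizes them
// to 16th-note steps, in the shape of a quantized note sequence.
const STEPS_PER_QUARTER = 4;
const SEED_STEPS = 64; // 4 bars of 4/4 at 16th notes

class SeedBuffer {
  constructor(capacity = 128, bpm = 120) {
    this.capacity = capacity;
    this.msPerStep = 60000 / bpm / STEPS_PER_QUARTER; // 125 ms at 120 BPM
    this.events = []; // { pitch, onMs, offMs }
  }

  push(pitch, onMs, offMs) {
    this.events.push({ pitch, onMs, offMs });
    // Ring behaviour: once full, drop the oldest event.
    if (this.events.length > this.capacity) this.events.shift();
  }

  clear() {
    this.events = [];
  }

  toQuantizedSequence() {
    const t0 = this.events.length ? this.events[0].onMs : 0;
    const notes = this.events
      .map(e => {
        const start = Math.round((e.onMs - t0) / this.msPerStep);
        const end = Math.round((e.offMs - t0) / this.msPerStep);
        // Every note lasts at least one step.
        return { pitch: e.pitch, quantizedStartStep: start, quantizedEndStep: Math.max(start + 1, end) };
      })
      .filter(n => n.quantizedStartStep < SEED_STEPS) // ignore anything past bar 4
      .map(n => ({ ...n, quantizedEndStep: Math.min(n.quantizedEndStep, SEED_STEPS) }));
    return {
      notes,
      totalQuantizedSteps: SEED_STEPS,
      quantizationInfo: { stepsPerQuarter: STEPS_PER_QUARTER },
    };
  }
}

// Map a 0–127 CC value onto the 0.4–1.5 temperature range.
function ccToTemperature(cc) {
  const norm = Math.min(127, Math.max(0, cc)) / 127;
  return 0.4 + norm * 1.1;
}
```

Quantizing at capture time keeps the seed in step units, so the model call never has to know about wall-clock time.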
The Magenta.js side — under 60 lines
Magenta.js can be dropped into a vanilla HTML page. The snippet below loads MelodyRNN, accepts a seed sequence, generates a continuation at a given temperature, and emits the result as MIDI through WebMIDI. Wire the bridge's WebMIDI sink to it and you've closed the loop.
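<script type="module">
  // The npm package is @magenta/music; adjust the URL if your CDN needs a
  // different ES-module entry point.
  import * as mm from 'https://cdn.jsdelivr.net/npm/@magenta/music@^1';

  const rnn = new mm.MusicRNN(
    'https://storage.googleapis.com/magentadata/js/checkpoints/music_rnn/basic_rnn'
  );
  await rnn.initialize();

  const midi = await navigator.requestMIDIAccess();
  const out = [...midi.outputs.values()].find(o => o.name.includes('Magenta'));
  if (!out) throw new Error('No MIDI output with "Magenta" in its name');

  // Seed buffer, refilled from the gamepad bridge over WebMIDI.
  // continueSequence expects a quantized sequence, so notes carry step indices.
  let seed = {
    notes: [],                // { pitch, quantizedStartStep, quantizedEndStep }
    totalQuantizedSteps: 64,  // 4 bars at 16th notes
    quantizationInfo: { stepsPerQuarter: 4 }
  };

  const STEP_MS = 125; // one 16th note at 120 BPM (assumed tempo)

  // Send each generated note to the MIDI output at its step time.
  function scheduleToMidiOut(seq, output) {
    const t0 = performance.now();
    for (const n of seq.notes) {
      output.send([0x90, n.pitch, 100], t0 + n.quantizedStartStep * STEP_MS); // note on
      output.send([0x80, n.pitch, 0], t0 + n.quantizedEndStep * STEP_MS);     // note off
    }
  }

  async function generateContinuation(temperature) {
    const continuation = await rnn.continueSequence(
      seed,        // your gamepad-played seed
      64,          // steps to generate (4 bars at 16ths)
      temperature  // 0.4 safe, 1.5 chaotic
    );
    scheduleToMidiOut(continuation, out);
  }

  // Trigger: R2 on the gamepad fires this
  function onRtTrigger(rightStickY) {
    const temp = 0.4 + rightStickY * 1.1; // map 0–1 → 0.4–1.5
    generateContinuation(temp);
  }
</script>

The bridge handles seed capture and trigger detection. The browser handles inference and playback scheduling. You sit in the middle with a controller in your lap and a synth on the desk. The Universal Controller MIDI bridge ships with an opt-in "Magenta Sink" feature in the Pro tier that wires this up without HTML.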
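If you wire it up by hand instead, the browser still has to sort incoming MIDI into notes, steering, and trigger pulls. A minimal sketch, assuming the bridge sends R2 as a continuous CC; the CC numbers, thresholds, and function names here are hypothetical, not the bridge's documented mapping:

```javascript
// Hypothetical CC assignments; the real bridge mapping may differ.
const CC_TEMPERATURE = 1; // right stick Y
const CC_R2 = 2;          // R2 trigger position

// Fires once per pull: fires when the value crosses `high`, then stays
// disarmed until the trigger is released below `low`.
function makeTriggerDetector(onFire, high = 100, low = 40) {
  let armed = true;
  return value => {
    if (armed && value >= high) {
      armed = false;
      onFire();
    } else if (!armed && value <= low) {
      armed = true;
    }
  };
}

// Routes one raw MIDI message (status byte plus data bytes) to a handler.
function routeMidi(data, handlers) {
  const [status, d1, d2] = data;
  const type = status & 0xf0; // strip the channel nibble
  if (type === 0x90 && d2 > 0) {
    handlers.noteOn(d1, d2);
  } else if (type === 0x80 || (type === 0x90 && d2 === 0)) {
    handlers.noteOff(d1); // note-on with velocity 0 counts as note-off
  } else if (type === 0xb0) {
    handlers.cc(d1, d2);
  }
}
```

In the browser this would hang off a WebMIDI input, for example `input.onmidimessage = e => routeMidi(e.data, handlers)`, with the `cc` handler passing `CC_R2` values to the detector. The gap between the two thresholds keeps a half-held trigger from firing repeatedly.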
Gamepad layout — playing vs steering
The face buttons and d-pad become the keyboard for the seed phrase. The sticks and triggers steer the model. Keep the two jobs separate or it gets confusing fast.
| Gamepad input | Job | What it does |
|---|---|---|
| Cross / Circle / Square / Triangle | Play seed | Four notes of the current scale (root, 3rd, 5th, 7th) |
| D-pad ↑↓ | Play seed | Octave up / down |
| D-pad ←→ | Steer | Switch scale (minor, dorian, phrygian, etc.) |
| Left stick X | Steer | Octave bias for the RNN sampling |
| Right stick Y | Steer | Temperature (0.4 → 1.5) |
| R2 trigger | Steer | Generate continuation |
| L2 trigger | Steer | Commit current continuation to score |
| Touchpad click | Steer | Clear seed buffer + start over |
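The "play seed" rows of the table can be sketched as a lookup from button to scale degree. This assumes the four face buttons map to root, 3rd, 5th, and 7th in the order listed, and that the default root is middle C; the table and function names are illustrative.

```javascript
// Semitone offsets of each seven-note scale from its root.
const SCALES = {
  minor:    [0, 2, 3, 5, 7, 8, 10],
  dorian:   [0, 2, 3, 5, 7, 9, 10],
  phrygian: [0, 1, 3, 5, 7, 8, 10],
};

// Face button → index into the scale (root, 3rd, 5th, 7th).
// The button order is an assumption taken from the table's row order.
const BUTTON_DEGREE = { cross: 0, circle: 2, square: 4, triangle: 6 };

// Returns a MIDI pitch, or null for an unknown button or scale.
function buttonToPitch(button, scaleName, rootPitch = 60, octaveShift = 0) {
  const scale = SCALES[scaleName];
  const degree = BUTTON_DEGREE[button];
  if (!scale || degree === undefined) return null;
  return rootPitch + scale[degree] + 12 * octaveShift;
}
```

D-pad up and down would adjust `octaveShift`, and D-pad left and right would cycle `scaleName`, so the same four buttons always land on chord tones of whatever scale is active.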
The performance loop
A typical session goes:
- Tap the touchpad to clear. The seed buffer empties.
- Play four bars on the face buttons. They get recorded as the seed.
- Set temperature with the right stick — start around the middle for sensible variation.
- Pull R2. The model thinks for ~400 ms. The continuation plays.
- Don't like it? Sweep the stick, pull R2 again. New continuation.
- Found one you like? Hold L2 to commit it — the bridge writes the continuation back into the seed buffer so the next R2 press continues from there. Now you're improvising with the RNN turn by turn.
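One way to model the loop above is as a small state reducer. This is a sketch of one reading of those steps: it assumes a committed continuation replaces the seed outright (the description only says it is written back into the seed buffer), and `createSession`, `reduce`, and the action names are illustrative.

```javascript
// Hypothetical session state. `seed` and `pending` hold arrays of notes;
// `turns` counts how many continuations have been committed since the last clear.
function createSession() {
  return { seed: [], pending: null, turns: 0 };
}

function reduce(state, action) {
  switch (action.type) {
    case 'clear': // touchpad: start over
      return createSession();
    case 'record': // face button: append a played note to the seed
      return { ...state, seed: [...state.seed, action.note] };
    case 'generated': // R2: a new continuation replaces any earlier one
      return { ...state, pending: action.notes };
    case 'commit': // L2: the continuation becomes the next seed
      if (!state.pending) return state;
      return { seed: state.pending, pending: null, turns: state.turns + 1 };
    default:
      return state;
  }
}
```

Rerolling is just another `generated` action, so an unwanted continuation never touches the seed until L2 commits it.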
Where the limits are
MelodyRNN is monophonic — one note at a time. It also has no concept of phrasing beyond the four-bar window. The continuations will drift if you keep rolling them turn after turn for too long. The sweet spot is two or three turns then a manual seed reset.
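Both limits can be enforced in code before the model is called. A minimal sketch, assuming notes are stored with quantized step indices; the function names and the three-turn default are illustrative choices, not anything Magenta prescribes.

```javascript
// Reduce a note list to one note at a time: when two notes start on the same
// step the first is kept, and an earlier note is cut short where the next begins.
function forceMonophonic(notes) {
  const sorted = [...notes].sort((a, b) => a.quantizedStartStep - b.quantizedStartStep);
  const out = [];
  for (const n of sorted) {
    const prev = out[out.length - 1];
    if (prev && prev.quantizedStartStep === n.quantizedStartStep) continue;
    if (prev && prev.quantizedEndStep > n.quantizedStartStep) {
      prev.quantizedEndStep = n.quantizedStartStep;
    }
    out.push({ ...n }); // copy so the caller's notes are not mutated
  }
  return out;
}

// True once enough turns have been chained that a fresh seed is due.
function shouldResetSeed(turns, maxTurns = 3) {
  return turns >= maxTurns;
}
```

Running the seed through `forceMonophonic` covers the case where two face buttons are held at once.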
Magenta also ships ImprovRNN (chord-aware) and MusicVAE (latent-space interpolation), both of which are worth wiring into the same gamepad rig once you've got the basic MelodyRNN flow working. MusicVAE in particular lets you put the left stick X/Y on a 2D latent slice and morph between two seed phrases: a different beast, equally good.
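The stick side of that idea can be sketched without touching the model. The block below assumes you have already produced an n×n grid of morphed sequences stored in row-major order (for instance from MusicVAE's interpolation, whose exact call is not shown here) and that each stick axis reports values in [-1, 1]; `stickToGridIndex` is an illustrative name.

```javascript
// Map a stick position (each axis in [-1, 1]) to an index into an
// n×n grid stored in row-major order.
function stickToGridIndex(x, y, n) {
  const toCell = v => {
    const clamped = Math.min(1, Math.max(-1, v));
    // Scale [-1, 1] to [0, n) and keep the top edge inside the last cell.
    return Math.min(n - 1, Math.floor(((clamped + 1) / 2) * n));
  };
  const col = toCell(x);
  const row = toCell(y);
  return row * n + col;
}
```

With a 4×4 grid, a centred stick picks index 10 and the corners pick 0, 3, 12, and 15, so sweeping the stick walks through the precomputed morphs.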
The honest take
AI melody completion isn't going to write your album. It's going to surprise you about every fifth pull of the trigger, and that fifth surprise is sometimes the lick you've been chasing for an hour. The gamepad makes that fifth-pull rate manageable because the trigger is faster than a mouse click and the temperature steering is continuous. Pair it with a Csound orchestra on the receiving end for sound design, or a film-scoring articulation rig if you want the AI to write counterlines underneath a written cue. Either way, grab Universal Controller MIDI, fire up the Magenta tab, and start re-rolling.