Music generation transformer
Personal project · 2024
Idea
A from-scratch implementation of a decoder-only generative transformer in PyTorch — small enough to be readable, big enough to actually learn something interesting. The model is dual-modality: with a text tokenizer it behaves as a small language model; with a MIDI tokenizer it generates music.
Implementation
Standard decoder-only transformer with multi-head self-attention, positional encoding, and a final language-modeling head — roughly the structure popularized by GPT, but at a scale that fits on a single consumer GPU. The tokenizer is pluggable: BPE for text, a custom MIDI event tokenizer for music.
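The architecture described above can be sketched in a few dozen lines of PyTorch. The dimensions, layer counts, and class names below are illustrative defaults, not the project's actual hyperparameters:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm decoder block: causal self-attention + MLP, both residual."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean causal mask: True above the diagonal = position is masked out.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        return x + self.mlp(self.ln2(x))

class TinyGPT(nn.Module):
    """Decoder-only LM: token + learned positional embeddings, blocks, LM head."""
    def __init__(self, vocab_size, d_model=128, n_heads=4, n_layers=2, max_len=256):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList(Block(d_model, n_heads) for _ in range(n_layers))
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)  # LM head

    def forward(self, idx):  # idx: (batch, seq) of token ids
        T = idx.size(1)
        x = self.tok(idx) + self.pos(torch.arange(T, device=idx.device))
        for blk in self.blocks:
            x = blk(x)
        return self.head(self.ln_f(x))  # (batch, seq, vocab) next-token logits
```

Because the model only sees token ids, swapping the text tokenizer for a MIDI one requires no architectural change — only `vocab_size` differs.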
Trained on Bach’s Goldberg Variations and The Well-Tempered Clavier for the music generation experiments.
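The exact vocabulary of the custom MIDI tokenizer isn't specified here; a common event-based scheme (NOTE_ON / NOTE_OFF / TIME_SHIFT, in the style of Performance RNN-like encodings) looks roughly like this — all constants and the `encode` helper are hypothetical:

```python
# Hypothetical event vocabulary: 128 note-on tokens, 128 note-off tokens,
# and 100 time-shift tokens covering 10 ms..1000 ms in 10 ms steps.
NOTE_ON_BASE = 0      # tokens 0..127: note-on for pitch p
NOTE_OFF_BASE = 128   # tokens 128..255: note-off for pitch p
TIME_SHIFT_BASE = 256 # tokens 256..355: advance time by (k+1)*10 ms
VOCAB_SIZE = 356

def encode(notes, step_ms=10, max_shift_ms=1000):
    """notes: list of (onset_ms, offset_ms, pitch). Returns a flat token list."""
    events = sorted(
        [(on, NOTE_ON_BASE + p) for on, off, p in notes]
        + [(off, NOTE_OFF_BASE + p) for on, off, p in notes]
    )
    tokens, now = [], 0
    for t, tok in events:
        gap = max(0, round((t - now) / step_ms))  # gap in 10 ms steps
        while gap > 0:  # long gaps become several capped time-shift tokens
            k = min(gap, max_shift_ms // step_ms)
            tokens.append(TIME_SHIFT_BASE + k - 1)
            gap -= k
        now = t
        tokens.append(tok)
    return tokens
```

For example, a single middle C held for 100 ms, `encode([(0, 100, 60)])`, becomes three tokens: note-on 60, a 100 ms time shift, note-off 60.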
Demo
Three generated samples below — the first two trained on the Goldberg Variations, the third on The Well-Tempered Clavier.
<audio controls preload="none" style="width:100%;">
<source src="../assets/audio/gen_gold_overfit_2.mp3" type="audio/mpeg">
</audio>
<div style="font-family:'JetBrains Mono', monospace; font-size:0.72rem; letter-spacing:0.12em; text-transform:uppercase; color:#8b949e; margin-top:0.4rem;">Goldberg Variations · 1</div>
<audio controls preload="none" style="width:100%;">
<source src="../assets/audio/gen_gold_overfit_3.mp3" type="audio/mpeg">
</audio>
<div style="font-family:'JetBrains Mono', monospace; font-size:0.72rem; letter-spacing:0.12em; text-transform:uppercase; color:#8b949e; margin-top:0.4rem;">Goldberg Variations · 2</div>
<audio controls preload="none" style="width:100%;">
<source src="../assets/audio/gen_clavier_chicken.mp3" type="audio/mpeg">
</audio>
<div style="font-family:'JetBrains Mono', monospace; font-size:0.72rem; letter-spacing:0.12em; text-transform:uppercase; color:#8b949e; margin-top:0.4rem;">Well-Tempered Clavier</div>
Stack
PyTorch, custom MIDI tokenizer, MIDI synthesis tooling for playback.