Music generation transformer
Personal project · 2024
Idea
A from-scratch implementation of a decoder-only generative transformer in PyTorch — small enough to be readable, big enough to actually learn something interesting. The model is dual-modality: with a text tokenizer it behaves as a small language model; with a MIDI tokenizer it generates music.
Implementation
Standard decoder-only transformer with multi-head self-attention, positional encoding, and a final language-modeling head — roughly the structure popularized by GPT, but at a scale that fits on a single consumer GPU. The tokenizer is pluggable: BPE for text, a custom MIDI event tokenizer for music.
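The architecture described above can be sketched in a few dozen lines of PyTorch. The dimensions, layer counts, and class names below are illustrative defaults, not the project's actual hyperparameters:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm decoder block: causal self-attention + MLP, both residual."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean causal mask: True above the diagonal = position is masked out.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        return x + self.mlp(self.ln2(x))

class TinyGPT(nn.Module):
    """Decoder-only LM: token + learned positional embeddings, blocks, LM head."""
    def __init__(self, vocab_size, d_model=128, n_heads=4, n_layers=2, max_len=256):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList(Block(d_model, n_heads) for _ in range(n_layers))
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)  # LM head

    def forward(self, idx):  # idx: (batch, seq) of token ids
        T = idx.size(1)
        x = self.tok(idx) + self.pos(torch.arange(T, device=idx.device))
        for blk in self.blocks:
            x = blk(x)
        return self.head(self.ln_f(x))  # (batch, seq, vocab) next-token logits
```

Because the model only sees token ids, swapping the text tokenizer for a MIDI one requires no architectural change — only `vocab_size` differs.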
Trained on Bach’s Goldberg Variations and The Well-Tempered Clavier for the music generation experiments.
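The exact vocabulary of the custom MIDI tokenizer isn't specified here; a common event-based scheme (NOTE_ON / NOTE_OFF / TIME_SHIFT, in the style of Performance RNN-like encodings) looks roughly like this — all constants and the `encode` helper are hypothetical:

```python
# Hypothetical event vocabulary: 128 note-on tokens, 128 note-off tokens,
# and 100 time-shift tokens covering 10 ms..1000 ms in 10 ms steps.
NOTE_ON_BASE = 0      # tokens 0..127: note-on for pitch p
NOTE_OFF_BASE = 128   # tokens 128..255: note-off for pitch p
TIME_SHIFT_BASE = 256 # tokens 256..355: advance time by (k+1)*10 ms
VOCAB_SIZE = 356

def encode(notes, step_ms=10, max_shift_ms=1000):
    """notes: list of (onset_ms, offset_ms, pitch). Returns a flat token list."""
    events = sorted(
        [(on, NOTE_ON_BASE + p) for on, off, p in notes]
        + [(off, NOTE_OFF_BASE + p) for on, off, p in notes]
    )
    tokens, now = [], 0
    for t, tok in events:
        gap = max(0, round((t - now) / step_ms))  # gap in 10 ms steps
        while gap > 0:  # long gaps become several capped time-shift tokens
            k = min(gap, max_shift_ms // step_ms)
            tokens.append(TIME_SHIFT_BASE + k - 1)
            gap -= k
        now = t
        tokens.append(tok)
    return tokens
```

For example, a single middle C held for 100 ms, `encode([(0, 100, 60)])`, becomes three tokens: note-on 60, a 100 ms time shift, note-off 60.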
Demo
Three generated samples below — the first two trained on the Goldberg Variations, the third on The Well-Tempered Clavier.
<audio controls preload="none" style="width:100%;">
<source src="../assets/audio/gen_gold_overfit_2.mp3" type="audio/mpeg">
</audio>
<div style="font-family:'JetBrains Mono', monospace; font-size:0.72rem; letter-spacing:0.12em; text-transform:uppercase; color:#8b949e; margin-top:0.4rem;">Goldberg Variations · 1</div>
<audio controls preload="none" style="width:100%;">
<source src="../assets/audio/gen_gold_overfit_3.mp3" type="audio/mpeg">
</audio>
<div style="font-family:'JetBrains Mono', monospace; font-size:0.72rem; letter-spacing:0.12em; text-transform:uppercase; color:#8b949e; margin-top:0.4rem;">Goldberg Variations · 2</div>
<audio controls preload="none" style="width:100%;">
<source src="../assets/audio/gen_clavier_chicken.mp3" type="audio/mpeg">
</audio>
<div style="font-family:'JetBrains Mono', monospace; font-size:0.72rem; letter-spacing:0.12em; text-transform:uppercase; color:#8b949e; margin-top:0.4rem;">Well-Tempered Clavier</div>
Stack
PyTorch, custom MIDI tokenizer, MIDI synthesis tooling for playback.