OpenAI Jukebox: generating whole songs with vocals

In one sentence: OpenAI releases Jukebox, a generative model that produces songs as raw audio (music, vocals, and lyrics), conditioned on artist and genre and built on a stack of VQ-VAE and autoregressive transformers.

Until now, music-generation models mostly produced MIDI sequences or short instrumental clips. OpenAI is trying something more ambitious: generating raw audio, complete with sung vocals and lyrics.

It's called Jukebox. You give it an artist name (Frank Sinatra, Elvis, Kanye West), a genre, and a few lines of lyrics, and it returns a two-minute song. It's not perfect: the voice sounds "wet" and some words come out wrong, but nothing like it existed before.

For musicians, it's a first taste of a future where generative tools enter the studio. For OpenAI, it's a demonstration that transformers work well outside text too.