Stable Audio Open: first open-weight model for music generation

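In one sentence: Stable Audio Open is the first open-weight model for generating music and sound effects from text prompts, built on latent diffusion with timing conditioning and released under the Stability AI Community License (not CC-BY; the Creative Commons licenses apply to its training data), which permits commercial use under its terms.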



Until 2024, generating high-quality music from a text description largely meant using commercial services such as Suno or Udio: the strongest models were proprietary and available only through subscriptions or paid APIs.

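Stability AI's Stable Audio Open changes this: it is the first quality music generation model that anyone can download, run locally, modify, and use commercially. The weights are released under the Stability AI Community License rather than CC-BY; the Creative Commons licenses (CC0, CC-BY, CC Sampling+) cover the audio it was trained on, so commercial use is governed by the Community License's terms rather than by attribution alone.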

With a description like "aggressive drum and bass with deep bass, 140 BPM" or "relaxing ambient with piano and rain," the model generates stereo audio clips of up to about 47 seconds that respect both the musical content and the requested timing.
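The clip length of roughly 47 seconds follows from the model's fixed output window. A minimal sketch of the arithmetic, assuming a 44.1 kHz sample rate and a window of 2,097,152 samples per channel; neither number appears in this article, so treat both as assumptions to check against the published model configuration.

```python
# Assumed values (not stated in the article); verify against the model configuration.
SAMPLE_RATE_HZ = 44_100
WINDOW_SAMPLES = 2_097_152  # samples per channel in one generated window


def window_seconds(samples: int = WINDOW_SAMPLES, rate: int = SAMPLE_RATE_HZ) -> float:
    """Return the duration in seconds of `samples` samples at `rate` Hz."""
    return samples / rate


def samples_for(seconds: float, rate: int = SAMPLE_RATE_HZ) -> int:
    """Return how many samples per channel cover `seconds` of audio."""
    return round(seconds * rate)


if __name__ == "__main__":
    print(f"full window: {window_seconds():.2f} s")
    print(f"30 s clip:   {samples_for(30)} samples per channel")
```

Under these assumptions, a shorter request occupies only part of the window, which is why the duration has to be passed to the model as a separate input.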

Timing conditioning is particularly useful: alongside the prompt you specify both the clip's duration and its starting point within the musical structure, which lets you generate intros, bridges, or outros coherently.
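The timing inputs described above can be sketched as a small conditioning record passed to the model together with the prompt. This is a hedged illustration, not the official pipeline: the keys `prompt`, `seconds_start`, and `seconds_total` and the `stable_audio_tools` calls follow the example on the model card as recalled here, and `build_conditioning` and `generate` are hypothetical helper names introduced for this sketch. How well a non-zero start is honored in practice is not established by this article.

```python
from typing import Dict, List, Union

Conditioning = Dict[str, Union[str, float]]


def build_conditioning(prompt: str, seconds_start: float,
                       seconds_total: float) -> List[Conditioning]:
    """Validate the timing values and return a one-item conditioning list."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if seconds_total <= 0:
        raise ValueError("seconds_total must be positive")
    if not 0 <= seconds_start < seconds_total:
        raise ValueError("seconds_start must satisfy 0 <= start < seconds_total")
    return [{
        "prompt": prompt,
        "seconds_start": seconds_start,
        "seconds_total": seconds_total,
    }]


def generate(conditioning: List[Conditioning], steps: int = 100,
             cfg_scale: float = 7.0):
    """Run the model (hypothetical wrapper; needs stable-audio-tools and the weights).

    Heavy imports are deferred so build_conditioning works without them.
    The calls below are assumed from the model card example.
    """
    import torch
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
    model = model.to(device)
    return generate_diffusion_cond(
        model,
        steps=steps,
        cfg_scale=cfg_scale,
        conditioning=conditioning,
        sample_size=model_config["sample_size"],
        device=device,
    )


if __name__ == "__main__":
    # Build the conditioning only; call generate(cond) once the model is installed.
    cond = build_conditioning("relaxing ambient with piano and rain", 0, 30)
    print(cond)
```

Keeping validation separate from generation means a bad duration or start offset fails immediately, before any weights are downloaded or loaded.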

It is also the first open model of this type to support the generation of sound effects and non-musical audio textures.