Stable Audio 2.0: stereo music up to 3 minutes with structure control

In one sentence: Stability AI launches Stable Audio 2.0, which generates stereo audio up to 3 minutes long at 44.1 kHz with explicit control over intros, outros, and instruments, lifting the limits of the previous version.



Generating music with AI was already possible, but clips were often short, mono, and lacking coherent structure. Stable Audio 2.0 raises the bar on all three fronts at once.

The model produces stereo audio at the CD-quality sample rate (44.1 kHz) for up to three full minutes. The real novelty, though, is control over musical structure: the prompt can specify an instrumental intro, a fade-out, and which instruments should be present and when.
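
To give a sense of scale, the figures above can be turned into a rough size estimate for one full-length clip. This is a minimal sketch, not anything from Stability AI: the function name is hypothetical, and the 16-bit sample depth is an assumption borrowed from the CD standard, since the article states only the sample rate, channel count, and duration.

```python
def raw_pcm_size(duration_s, sample_rate_hz=44_100, channels=2, bytes_per_sample=2):
    """Return (frames, samples, bytes) for uncompressed PCM audio.

    bytes_per_sample=2 (16-bit) is an assumption taken from the CD standard;
    the article does not state the model's output bit depth.
    """
    frames = duration_s * sample_rate_hz   # one frame = one sample per channel
    samples = frames * channels            # stereo doubles the sample count
    size_bytes = samples * bytes_per_sample
    return frames, samples, size_bytes


# A three-minute stereo clip at 44.1 kHz
frames, samples, size_bytes = raw_pcm_size(3 * 60)
print(f"{frames:,} frames, {samples:,} samples, {size_bytes / 1e6:.2f} MB uncompressed")
```

Under these assumptions, a three-minute clip comes to roughly 15.9 million individual sample values, which hints at why long, coherent stereo output is harder than short mono clips.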

Stability AI is making the model accessible via web and API, following the same democratization philosophy that made Stable Diffusion a success in image generation.