Cartesia Sonic: 50ms TTS for voice agents in production
In one sentence: Cartesia launches Sonic, a text-to-speech (TTS) model with ultra-low 50 ms latency, token-by-token streaming, and voice cloning without fine-tuning, designed specifically for AI voice agents in production environments.
Building an AI voice agent that responds as fluidly as a human requires a synthetic voice that starts speaking almost instantly, not after one or two seconds of waiting. Cartesia Sonic is designed for exactly this use case.
Sonic's stated latency is just 50 milliseconds, measured from the moment the LLM starts generating text, so the first audio is produced almost in real time. Because it streams token by token, it can begin speaking before the LLM has finished its reply. Voice cloning requires no fine-tuning: a few seconds of reference audio are enough to clone a voice and use it immediately.
The product targets developers building voice agents in production (AI call centers, voice assistants, automated response systems), where latency makes the difference between a fluid interaction and a frustrating one.
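To make the latency claim concrete, here is a minimal sketch of how time to first audio could be measured in a token-by-token pipeline. It does not use Cartesia's API: `llm_tokens` and `stub_streaming_tts` are hypothetical stand-ins that only mimic the streaming shape, and the 10 ms per-token delay is an arbitrary assumption.

```python
import time
from typing import Iterable, Iterator, Tuple


def llm_tokens() -> Iterator[str]:
    """Hypothetical stand-in for an LLM that streams its reply token by token."""
    for token in ["Hello", ", ", "how ", "can ", "I ", "help", "?"]:
        time.sleep(0.01)  # assumed per-token generation delay
        yield token


def stub_streaming_tts(tokens: Iterable[str]) -> Iterator[bytes]:
    """Hypothetical streaming TTS: emits one audio chunk per incoming token.

    This is NOT Cartesia's API; it only imitates the streaming behaviour.
    """
    for token in tokens:
        # Fake audio: 10 bytes of silence per character of text.
        yield b"\x00" * (10 * len(token))


def time_to_first_audio(tokens: Iterable[str]) -> Tuple[float, int]:
    """Return (seconds until the first audio chunk, total audio bytes)."""
    start = time.perf_counter()
    first_chunk_at = None
    total_bytes = 0
    for chunk in stub_streaming_tts(tokens):
        if first_chunk_at is None:
            # The first chunk arrives after one token, not after the full reply.
            first_chunk_at = time.perf_counter() - start
        total_bytes += len(chunk)
    if first_chunk_at is None:
        raise ValueError("no audio produced")
    return first_chunk_at, total_bytes


if __name__ == "__main__":
    ttfa, total = time_to_first_audio(llm_tokens())
    print(f"time to first audio: {ttfa * 1000:.1f} ms, {total} bytes of audio")
```

In a real agent, the same measurement would wrap the actual LLM stream and the TTS client, and time to first audio is the figure to compare against a target such as 50 ms.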
Companies
Cartesia
Tools
Sonic, Cartesia TTS