Dia 1.6B: open-source dialogic TTS with laughter, breathing and human naturalness

In one sentence: Dia, by Nari Labs, is the first open-source TTS model to generate natural dialogues with non-verbal cues such as laughter, breathing pauses and emotional emphasis, matching ElevenLabs' quality on multi-speaker dialogue under the Apache 2.0 license.

High-quality synthetic voices have improved enormously in recent years, but something has always been missing: human spontaneity. When two people talk, they don't just produce words; they laugh, breathe, pause, shift tone abruptly and interrupt each other.

Traditional TTS systems couldn't do any of this. They read text cleanly, but without the layer of naturalness that gives a real conversation its texture.

Nari Labs' Dia is the first open-source model to bridge this gap: it generates two-voice dialogues from a transcript in which each turn is tagged with its speaker ([S1] or [S2]) and non-verbal actions are written inline as parenthesized tags such as (laughs), (sighs), (clears throat) or (inhales). The model produces the corresponding sounds as part of the speech flow.
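Dia's README examples mark each speaker turn with [S1] or [S2] and write non-verbal cues in parentheses, e.g. "[S1] Wow. Amazing. (laughs)". As a minimal sketch of what such a transcript looks like when assembled programmatically, the helper below joins (speaker, text, cue) turns into one prompt string. The function `build_transcript` and the `KNOWN_CUES` set are hypothetical illustrations, not part of Dia's API, and the cue set is only a small subset; placing the cue at the end of each turn is a simplification, since cues can also appear mid-sentence.

```python
# Hypothetical helper (not part of Dia's API): assembles a two-speaker
# transcript in the [S1]/[S2] + (cue) style shown in Dia's README examples.

# Assumed subset of non-verbal cues; check the project docs for the full list.
KNOWN_CUES = {"laughs", "sighs", "clears throat", "inhales", "exhales", "coughs", "gasps"}


def build_transcript(turns):
    """Join (speaker, text, cue) turns into one Dia-style prompt string.

    speaker must be 1 or 2; cue is None or one of KNOWN_CUES and is
    appended after that turn's text.
    """
    parts = []
    for speaker, text, cue in turns:
        if speaker not in (1, 2):
            raise ValueError(f"speaker must be 1 or 2, got {speaker!r}")
        if cue is not None and cue not in KNOWN_CUES:
            raise ValueError(f"unsupported cue: {cue!r}")
        segment = f"[S{speaker}] {text.strip()}"
        if cue is not None:
            segment += f" ({cue})"
        parts.append(segment)
    # Turns are separated by single spaces, as in the README examples.
    return " ".join(parts)


if __name__ == "__main__":
    script = [
        (1, "Did you hear the results came in?", None),
        (2, "I did, and I still can't believe it.", "laughs"),
        (1, "Okay, let's go through them slowly.", "sighs"),
    ]
    print(build_transcript(script))
```

The resulting string is the kind of input passed to Dia's generation call; consult the project README for the exact loading and generation API rather than relying on this sketch.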

With 1.6 billion parameters and an Apache 2.0 license, it rivals the dialogue quality of ElevenLabs (a commercial service that can cost hundreds of euros per month under heavy use) while running entirely on local hardware.
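The parameter count gives a quick first estimate of whether the model fits on local hardware: weight memory is simply parameters times bytes per parameter. A back-of-envelope sketch, assuming 1.6e9 parameters and counting weights only (activations, the audio codec and framework overhead are ignored, so real usage will be noticeably higher):

```python
# Back-of-envelope estimate of memory needed just to store model weights.
# Assumption: 1.6e9 parameters; activations, audio codec and runtime
# overhead are NOT included, so actual VRAM use will be higher.

PARAMS = 1.6e9
BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2}


def weight_memory_gb(params, bytes_per_param):
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>17}: {weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

At 16-bit precision the weights alone come to roughly 3.2 GB, which puts a model of this size within reach of a single consumer GPU; the figure is a lower bound, not a measured requirement.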

For synthetic podcasts, video game narration, educational content or virtual assistants, this is a qualitative leap that narrows the gap between synthetic speech and human recordings.