Parler TTS: Hugging Face releases the first text-controllable open-source TTS

In one sentence: Parler TTS generates voices from natural-language descriptions such as 'slow, low male voice with echo'; it was trained on 45k hours of audio, is released under Apache 2.0, and is billed as the first fully controllable open-source TTS.



With Parler TTS you do not pick a voice from a menu or upload a reference audio clip. You describe the voice you want in natural language, for example "man with a deep, calm voice, slow speech, slight echo", and the system generates speech in a voice matching that description. It is presented as the first open-source TTS system to offer this level of textual control: previous systems required either reference audio or a preset voice. Hugging Face trained it on 45,000 hours of audio annotated with textual descriptions, which lets the model learn the link between descriptive language and vocal characteristics. It is released under Apache 2.0, so anyone can use, modify, and distribute it freely.
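To make the "describe the voice" workflow concrete, here is a minimal Python sketch. It follows the usage pattern published with the `parler_tts` package, but several things are assumptions not stated in this article: the checkpoint name `parler-tts/parler-tts-mini-v1`, the output file name, and the `VoiceSpec` and `build_description` helpers, which are hypothetical conveniences for assembling a description and not part of the library.

```python
from dataclasses import dataclass


@dataclass
class VoiceSpec:
    """Hypothetical helper (not part of parler_tts): a few voice attributes."""
    speaker: str = "man"
    pitch: str = "deep"
    tone: str = "calm"
    pace: str = "slow"
    effect: str = "slight echo"


def build_description(spec: VoiceSpec) -> str:
    """Compose a plain-English voice description from the attributes."""
    return (
        f"A {spec.speaker} with a {spec.pitch}, {spec.tone} voice speaks at a "
        f"{spec.pace} pace, with {spec.effect}."
    )


def synthesize(text: str, description: str, out_path: str = "out.wav",
               model_id: str = "parler-tts/parler-tts-mini-v1") -> str:
    """Generate speech for `text` in the described voice and save it as WAV.

    The model_id default is an assumption; substitute the checkpoint you use.
    """
    # Heavy dependencies are imported here so build_description works without them.
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(model_id).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # The description conditions the voice; the prompt is the text to be spoken.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path
```

Calling `synthesize("Hello there.", build_description(VoiceSpec()))` would download the model weights on first use and write a WAV file. Because the voice is controlled only through the description string, changing a single attribute such as `VoiceSpec(pace="fast")` is enough to request a different delivery. How closely the output matches the description depends on the model.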