MeloTTS: real-time multilingual TTS on CPU at 50MB
In one sentence: MeloTTS is the first production-quality multilingual TTS system to run in real time on CPU. It weighs just 50 MB and supports English, Chinese, Japanese, Korean, Spanish, and French.
Most high-quality TTS systems need a GPU to work in real time. This makes them expensive to deploy in the cloud and impractical on inexpensive devices such as a Raspberry Pi, or on servers without GPUs.
MyShell AI's MeloTTS solves this problem with an unusual approach: instead of aiming for maximum absolute quality, it optimizes for perceived quality relative to available resources. The result is a model of just 50 megabytes that generates speech on CPU 15 times faster than real time.
"15x real-time on CPU" means that the model needs about 67 milliseconds (1/15 of a second) to produce one second of audio. This enables near-instant speech synthesis on modest hardware, from an old laptop to an inexpensive ARM server.
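The arithmetic behind that figure can be checked in a few lines. This is only an illustrative sketch: the function names are hypothetical and are not part of MeloTTS, and the 15x speedup is the figure quoted above, not a measurement.

```python
def seconds_per_audio_second(speedup: float) -> float:
    """Compute time needed to synthesize one second of audio at a given speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


def synthesis_time(audio_seconds: float, speedup: float) -> float:
    """Compute time needed to synthesize a clip of the given length."""
    return audio_seconds * seconds_per_audio_second(speedup)


def measured_speedup(audio_seconds: float, wall_seconds: float) -> float:
    """Speedup over real time, from a measured run (audio length / wall-clock time)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


if __name__ == "__main__":
    # 15x real time is the figure quoted in the text, not a benchmark result.
    per_second_ms = seconds_per_audio_second(15) * 1000
    print(f"{per_second_ms:.1f} ms of compute per second of audio")  # 66.7 ms
```

At this rate, a one-minute clip would take about four seconds to generate. To check the claim on your own hardware, time a synthesis call and pass the clip length and elapsed time to `measured_speedup`.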
It supports six languages: English (with American, British, Indian, Australian and default accents), Mandarin Chinese, Japanese, Korean, Spanish and French. For such a small model, the coverage is surprising.
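As a rough idea of what choosing a language and accent looks like in practice, here is a minimal sketch. It assumes the Python API shown in the MeloTTS README (`melo.api.TTS`, `model.hps.data.spk2id`, `tts_to_file`) and the language codes and speaker keys listed there. The `synthesize` wrapper and the two lookup tables are hypothetical helpers written for this example, so check the names against the version you install.

```python
from pathlib import Path

# Language codes as given in the MeloTTS README (assumed; verify locally).
LANGUAGES = {
    "English": "EN",
    "Spanish": "ES",
    "French": "FR",
    "Chinese": "ZH",
    "Japanese": "JP",
    "Korean": "KR",
}

# English speaker keys as given in the MeloTTS README (assumed; verify locally).
EN_SPEAKERS = {
    "American": "EN-US",
    "British": "EN-BR",
    "Indian": "EN_INDIA",
    "Australian": "EN-AU",
    "default": "EN-Default",
}


def synthesize(text: str, out_path: str, language: str = "EN",
               speaker: str = "EN-US", speed: float = 1.0) -> Path:
    """Hypothetical wrapper: synthesize `text` to a WAV file on CPU."""
    # Imported lazily so this module loads even where MeloTTS is not installed.
    from melo.api import TTS

    model = TTS(language=language, device="cpu")
    speaker_ids = model.hps.data.spk2id  # maps speaker key -> numeric id
    model.tts_to_file(text, speaker_ids[speaker], str(out_path), speed=speed)
    return Path(out_path)


if __name__ == "__main__":
    # Requires MeloTTS to be installed; downloads the model on first use.
    synthesize("Hello from a CPU-only machine.", "hello.wav",
               speaker=EN_SPEAKERS["British"])
```

For the other languages, the README examples use the language code itself as the speaker key, for example `synthesize(text, "out.wav", language="ES", speaker="ES")`.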
It is particularly useful for IoT applications, local assistants, embedded systems, or any scenario where you don't want to depend on cloud APIs for speech synthesis.
Companies
MyShell AI