AImpact
Medium Voice & Audio · 1 min read

MeloTTS: real-time multilingual TTS on CPU at 50MB

In one sentence: MeloTTS is the first production-quality multilingual TTS to run in real time on CPU, weighing just 50 MB and supporting English, Chinese, Japanese, Korean, Spanish and French.

Most high-quality TTS systems need a graphics card to work in real time. This makes them expensive to deploy in the cloud and impractical on inexpensive devices such as a Raspberry Pi, or on servers without GPUs.

MyShell AI's MeloTTS addresses this problem with an unusual approach: instead of aiming for maximum absolute quality, it optimizes for perceived quality relative to available resources. The result is a model of just 50 megabytes that runs on CPU about 15 times faster than real time.

"15x real-time on CPU" means that producing one second of audio takes the model about 67 milliseconds (1/15 of a second). This enables near-instant speech synthesis on modest hardware, from an old laptop to an inexpensive ARM server.
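The arithmetic behind that figure can be sketched in a few lines. This is only an illustration of the speedup calculation: the function names are hypothetical, and the 15x value is the figure quoted in this article, not a measurement.

```python
def synthesis_time_ms(audio_seconds: float, speedup: float = 15.0) -> float:
    """Time in milliseconds needed to synthesize `audio_seconds` of speech
    when the model runs `speedup` times faster than real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup * 1000.0


def real_time_factor(speedup: float) -> float:
    """Real-time factor (RTF): compute time divided by audio duration.
    Values below 1.0 mean faster than real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


# Illustrative durations; 15x is the speedup quoted in the article.
for seconds in (1, 10, 60):
    print(f"{seconds:>3} s of audio -> {synthesis_time_ms(seconds):.1f} ms of compute")
```

At 15x, one second of audio costs roughly 66.7 ms of compute and a full minute costs about 4 seconds, which is why the article describes synthesis as effectively instant.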

It supports six languages: English (with American, British, Indian, Australian and default accents), Mandarin Chinese, Japanese, Korean, Spanish and French. For such a small model, the coverage is surprising.

It is particularly useful for IoT applications, local assistants, embedded systems, or any scenario where you don't want to depend on cloud APIs for speech synthesis.

Companies

MyShell AI

Tags

MeloTTS, multilingual, real-time, CPU inference, edge, MyShell AI, compact model
