EnCodec: Meta AI compresses audio with neural networks and beats Opus

In one sentence: EnCodec compresses 24 kHz mono audio to 1.5–24 kbps (and 48 kHz stereo audio to 3–24 kbps) at quality surpassing Opus, and its discrete tokens have become a common audio representation for token-based neural TTS.
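
The bitrates follow from simple arithmetic. The sketch below assumes the parameters of the publicly released 24 kHz EnCodec model: a 320-sample encoder stride, which gives 75 latent frames per second, and 1024-entry codebooks, which give 10 bits per code. The function names are illustrative and are not part of any library.

```python
import math

# Assumed parameters of the released 24 kHz EnCodec model (illustrative constants):
# 24 000 samples/s with an encoder stride of 320 samples -> 75 frames/s,
# and each RVQ codebook has 1024 entries -> log2(1024) = 10 bits per code.
SAMPLE_RATE = 24_000
HOP_LENGTH = 320
CODEBOOK_SIZE = 1024


def frame_rate(sample_rate: int = SAMPLE_RATE, hop: int = HOP_LENGTH) -> float:
    """Number of latent frames (time steps) the encoder emits per second."""
    return sample_rate / hop


def bitrate_kbps(num_codebooks: int) -> float:
    """Bitrate when every frame is coded with `num_codebooks` RVQ indices."""
    bits_per_frame = num_codebooks * math.log2(CODEBOOK_SIZE)
    return frame_rate() * bits_per_frame / 1000


def tokens_per_second(num_codebooks: int) -> float:
    """Discrete tokens a language model has to handle per second of audio."""
    return frame_rate() * num_codebooks


for n_q in (2, 4, 8, 16, 32):
    print(f"{n_q:2d} codebooks -> {bitrate_kbps(n_q):5.1f} kbps, "
          f"{tokens_per_second(n_q):6.0f} tokens/s")
```

Under these assumptions, 2 codebooks give 1.5 kbps (150 tokens per second), 16 give 12 kbps, and 32 give 24 kbps. Choosing a bitrate therefore comes down to choosing how many RVQ codebooks to keep.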

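EnCodec is an audio codec that replaces hand-designed signal processing with neural networks. An encoder compresses the waveform into a compact latent representation, a quantizer turns that representation into discrete codes, and a decoder reconstructs the audio with high fidelity. It works at very low bitrates, as low as 1.5 kbps for 24 kHz mono audio (roughly 190 bytes per second), while delivering better perceptual quality than classical codecs such as Opus and EVS at low bitrates in the authors' listening tests. For generative AI, the key ingredient is Residual Vector Quantization (RVQ). RVQ uses a stack of codebooks in which each one encodes the error left by the previous ones, so audio becomes several parallel streams of discrete tokens that a language model can predict much like text. That is why EnCodec, together with Google's closely related SoundStream codec, became the de facto audio tokenizer for token-based generation: VALL-E models EnCodec tokens directly, while AudioLM and SoundStorm build on SoundStream tokens.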
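
To make RVQ concrete, here is a toy residual vector quantizer in pure Python. It is a minimal sketch under stated assumptions, not EnCodec's implementation. The data are random 4-dimensional vectors standing in for encoder frames, there are 4 codebooks of 8 entries each, and the codebooks are fit offline with plain k-means. All function names are hypothetical. Each stage quantizes the residual left by the stages before it, and the per-stage indices are the discrete tokens.

```python
import random

Vector = list[float]

# Hypothetical toy setup, not EnCodec's code: small random vectors stand in
# for encoder frames, and codebooks are fit greedily with plain k-means.


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: list[Vector], x: Vector) -> int:
    """Index of the codeword closest to x (the 'token' for this stage)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], x))


def kmeans(points: list[Vector], k: int, iters: int, rng: random.Random) -> list[Vector]:
    """Plain Lloyd's k-means, used here only to fit a toy codebook."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: list[list[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def train_rvq(points: list[Vector], num_stages: int, k: int, iters: int,
              rng: random.Random) -> list[list[Vector]]:
    """Greedy RVQ training: each codebook is fit to the previous stage's residuals."""
    codebooks = []
    residuals = [list(p) for p in points]
    for _ in range(num_stages):
        cb = kmeans(residuals, k, iters, rng)
        codebooks.append(cb)
        residuals = [[r - c for r, c in zip(res, cb[nearest(cb, res)])]
                     for res in residuals]
    return codebooks


def rvq_encode(x: Vector, codebooks: list[list[Vector]]) -> list[int]:
    """Quantize x stage by stage; each stage codes what earlier stages missed."""
    residual = list(x)
    codes = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


def rvq_decode(codes: list[int], codebooks: list[list[Vector]], dim: int) -> Vector:
    """Reconstruct by summing the chosen codeword of every stage that was kept."""
    out = [0.0] * dim
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


rng = random.Random(0)
dim = 4
data = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]
books = train_rvq(data, num_stages=4, k=8, iters=10, rng=rng)

codes = rvq_encode(data[0], books)
print("tokens for one frame:", codes)  # four indices, each in 0..7

# Keeping only the first n codes is a valid lower-bitrate encoding.
for n in range(len(books) + 1):
    mse = sum(sq_dist(x, rvq_decode(rvq_encode(x, books)[:n], books, dim))
              for x in data) / len(data)
    print(f"{n} codebooks -> mean squared error {mse:.3f}")
```

Because every stage only refines the output of the earlier ones, a prefix of the codes still decodes to a usable approximation, and the reported error falls as codebooks are added. In EnCodec the codebooks are much larger (1024 entries) and are learned during end-to-end training rather than fit afterwards.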