Mistral Voxtral Transcribe 2: open-source speech-to-text that runs on a laptop

In one sentence: Mistral releases Voxtral Transcribe 2, two open-source STT models (Batch and Realtime, 4B parameters) with latency configurable down to 200 ms, Apache 2.0 licensed, covering 13 languages.

Until recently, turning voice into text (automatic speech recognition, or ASR, also called speech-to-text) was something you could do really well only through hosted cloud services: OpenAI's hosted Whisper API, ElevenLabs' Scribe, AssemblyAI. They cost money, and you have to ship your audio to their servers.

Mistral releases Voxtral Transcribe 2: two open models (one for batch, one for real-time) with 4 billion parameters, small enough to run on a MacBook or a modern smartphone. They are licensed under Apache 2.0, so you can download and use them with no license fees.
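To see why a 4-billion-parameter model can plausibly fit on a laptop, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. The precisions listed are common choices assumed for illustration, not formats confirmed by Mistral, and the estimate ignores activations, caches, and runtime overhead.

```python
# Rough weight-only memory estimate for a model of a given size.
# Assumption: the precisions below are typical options, not formats
# that Mistral has confirmed for these models.

PARAMS = 4_000_000_000  # "4 billion parameters" from the announcement

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_memory_gb(PARAMS, size):.1f} GB")
```

At 16-bit precision the weights come to about 8 GB, and at 4-bit about 2 GB, which is the range where laptops and high-end phones become realistic targets.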

Claimed numbers: batch transcription at $0.003 per minute (about $0.18 per hour of audio), billed as 80% cheaper than ElevenLabs Scribe v2; a real-time version with latency configurable down to 200 milliseconds; and a 4% Word Error Rate (WER) on the FLEURS benchmark, which Mistral says beats GPT-4o Mini Transcribe (lower WER is better).
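For readers unfamiliar with the metric, Word Error Rate is the number of word-level substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition; the lowercasing and whitespace splitting are simplifying assumptions, and benchmark evaluations such as FLEURS typically apply their own text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: naive normalization (lowercase + whitespace split).
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical 25-word reference with a single wrong word in the transcript.
    reference = " ".join(f"word{i}" for i in range(25))
    hypothesis = reference.replace("word3 ", "wrong ")
    print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

In other words, a 4% WER corresponds to roughly one wrong, missing, or extra word for every 25 words spoken.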

For anyone building voice apps, call-center software, or podcast tooling, this means you can now host ASR in-house without shipping audio to a third party.