Paris: Mistral AI, the high-performance AI company, has introduced Voxtral Transcribe 2, its second-generation speech-to-text models aimed at delivering faster, more accurate, and lower-cost transcription. The release includes two variants: Voxtral Mini Transcribe V2 for batch processing and Voxtral Realtime for live audio applications, with latency configurable down to sub-200 milliseconds.

In a post on X, the company described the release as "next-gen speech-to-text," offering state-of-the-art transcription, speaker diarization, and sub-200ms real-time latency.
The Paris-based AI firm said the new models are designed to compete directly with leading transcription services while significantly reducing costs. Voxtral Mini Transcribe V2 is priced at $0.003 per minute for batch jobs, which the company says is roughly one-fifth the cost of competing offerings such as ElevenLabs’ Scribe v2.
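
To put the quoted rate in perspective, here is a back-of-the-envelope cost calculation at $0.003 per audio minute; the volume of audio used below is purely illustrative, not a figure from the announcement.

```python
# Rough cost estimate at the announced batch rate of $0.003 per audio minute.
PRICE_PER_MINUTE = 0.003  # USD, from Mistral's quoted batch pricing

hours_of_audio = 1_000  # illustrative workload, not from the announcement
cost = hours_of_audio * 60 * PRICE_PER_MINUTE
print(f"{hours_of_audio} hours of audio -> ${cost:,.2f}")  # prints: 1000 hours of audio -> $180.00
```
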
According to Mistral’s internal benchmarks, the models deliver roughly a 4% word error rate on the FLEURS dataset, outperforming several well-known transcription systems while processing audio up to three times faster than some rivals. The company added that the real-time model can match batch-level accuracy at higher latency settings suitable for live subtitling, while lower-latency modes introduce only a small increase in error rates.

Voxtral Mini Transcribe V2 includes features such as speaker diarization, word-level timestamps, and context biasing, which lets users supply up to 100 domain-specific terms for improved accuracy. Voxtral Realtime, meanwhile, is built for voice agents, live captioning, and call-center automation.
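
As a rough illustration of how such a batch request could be wired up, the Python sketch below sends an audio file for transcription with diarization, word-level timestamps, and a short context-bias list. The endpoint path, model identifier, and field names (`diarize`, `timestamp_granularities`, `context_bias`) are assumptions for illustration only, not confirmed parameters of Mistral's API.

```python
# Hypothetical sketch of a batch transcription request to Voxtral Mini Transcribe V2.
# Endpoint, model name, and field names are illustrative assumptions, not a documented
# API contract -- consult Mistral's API reference for the actual fields.
import os
import requests

API_KEY = os.environ["MISTRAL_API_KEY"]

with open("earnings_call.wav", "rb") as audio:
    response = requests.post(
        "https://api.mistral.ai/v1/audio/transcriptions",  # assumed endpoint
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": audio},
        data={
            "model": "voxtral-mini-transcribe-v2",    # assumed model identifier
            "diarize": "true",                         # speaker diarization (assumed flag)
            "timestamp_granularities": "word",         # word-level timestamps (assumed flag)
            # Context biasing: up to 100 domain-specific terms (field name assumed).
            "context_bias": "Voxtral,Mistral,EBITDA,diarization",
        },
        timeout=300,
    )

response.raise_for_status()
print(response.json().get("text", ""))
```
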
Notably, Voxtral Realtime is released under the Apache 2.0 license, allowing organizations to deploy it on-premises without relying on external APIs. With a 4-billion-parameter footprint small enough to run on edge devices, the model is positioned for industries with strict data-privacy requirements, including healthcare and finance.
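
For teams weighing the on-premises route, the minimal sketch below shows the general shape of pulling openly licensed weights into local infrastructure so that audio never leaves the organization's network. The Hugging Face repository id is hypothetical; check Mistral's official model listings for the actual Voxtral Realtime repository.

```python
# Sketch: downloading Apache-2.0-licensed weights for fully on-premises deployment.
# The repo_id below is a hypothetical placeholder, not a confirmed repository name.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="mistralai/Voxtral-Realtime",        # hypothetical repo id
    local_dir="/opt/models/voxtral-realtime",    # local storage inside the data boundary
)
print(f"Weights cached at {local_path}; inference can now run without external API calls.")
```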
