DeepL, known for text translation, now wants to translate your voice

DeepL, a translation company best known for its text tools, released a voice-to-voice translation suite today that covers use cases like meetings, mobile and web conversations, and group conversations for frontline workers through custom apps. The company is also releasing an API that lets outside developers and businesses build on top of DeepL’s tech for custom use cases, such as call centers.

“After spending so many years in text translation, voice was a natural step for us,” DeepL CEO Jarek Kutylowski told TechCrunch in an interview. “We have come a long way when it comes to text translation and document translation. But we thought there wasn’t a great product for real-time voice translation.”

Kutylowski said the main challenge in building a real-time translation product is balancing low latency (the delay between someone speaking and the translated audio playing back) against accurate results.

DeepL is releasing add-ons for platforms like Zoom and Microsoft Teams, where listeners can either hear real-time translation while others speak in their native languages or follow real-time translated text on screen. This program is currently in early access, and the company is inviting organizations to join a waitlist. The company also has a product for mobile and web-based conversations that can take place in person or remotely.

DeepL also lets users take part in group conversations in settings like training sessions or workshops, with participants joining through a QR code.

DeepL noted that its voice-to-voice tech can also learn and adapt to custom vocabulary, such as industry-specific terms and business and personal names.

Kutylowski said AI is reshaping what customer service will look like in the coming years. He noted that a translation layer helps companies provide support in languages where qualified staff are scarce and expensive to hire.

The company said it controls the entire voice-to-voice stack. The current system converts speech to text, translates that text, then converts the result back to speech. DeepL believes its years of work on text translation give it an edge in translation quality. Going forward, the company wants to develop an end-to-end voice translation model that skips the text step entirely.
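
For readers who want a concrete picture of the three-stage design described above, here is a minimal sketch in Python. It is not DeepL's code or API: every name in it (`CascadedVoiceTranslator`, the `fake_*` functions, the toy dictionary) is a hypothetical placeholder, and each stage is a trivial stand-in for what would be a real speech recognition, translation, or speech synthesis model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; real systems would wrap actual models.
SpeechToText = Callable[[bytes], str]
Translate = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class CascadedVoiceTranslator:
    """Chains the three stages: speech -> text -> translated text -> speech."""

    transcribe: SpeechToText
    translate: Translate
    synthesize: TextToSpeech

    def run(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)        # stage 1: speech to text
        translated = self.translate(text)    # stage 2: text translation
        return self.synthesize(translated)   # stage 3: text back to speech


# Toy stand-ins so the sketch runs: "audio" is just UTF-8 encoded text here.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


TOY_DICTIONARY = {"hello": "hallo", "world": "welt"}


def fake_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.lower().split())


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedVoiceTranslator(fake_transcribe, fake_translate, fake_synthesize)
result = pipeline.run(b"Hello world")
print(result)
```

In a cascade like this, each stage waits on the one before it, so the delays add up. An end-to-end model of the kind the company says it wants to build would replace the three calls inside `run` with a single model call that never produces intermediate text.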

DeepL faces competition from several well-funded startups working in adjacent corners of the space. Sanas, which last year raised $65 million from Quadrille Capital and Teleperformance, uses AI to modify a speaker’s accent in real time — a tool aimed primarily at call center agents.

Dubai-based Camb.AI focuses on speech synthesis and translation for media and entertainment companies, helping them dub and localize video content at scale.

Palabra, backed by Reddit co-founder Alexis Ohanian’s firm Seven Seven Six, is building a real-time speech translation engine designed to preserve both the meaning and the speaker’s original voice, putting it in more direct competition with what DeepL is now building.

AI Disclosure: This article has been generated and curated using advanced AI technology. While we strive for absolute accuracy, some details may be summarized or translated by autonomous systems. Please cross-reference critical financial data with official sources.