Gemini 3.5 Live Translate: Continuous Modeling for Low-Latency Speech-to-Speech Conversion

Twenty years ago, translation at Google began as a machine learning experiment. We wanted to turn the science of language into human connection. That project has grown significantly. We now translate over a trillion words for billions of users every month. Today, we are taking our next step. We are releasing Gemini 3.5 Live Translate, our latest audio model built for live speech-to-speech translation.

Gemini 3.5 Live Translate brings continuous speech translation to our products. It works very differently from older turn-by-turn systems, which required the speaker to finish a sentence before translation could even begin. Instead of waiting, the new model processes audio as a continuous stream and generates translated speech while the original speaker is still talking. By staying just a few seconds behind the speaker, it balances the need for enough context to translate accurately against the need to translate immediately. This removes awkward pauses and significantly reduces conversational delays.
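
The latency difference between the two approaches can be illustrated with a toy calculation. The sketch below is a simplification built on stated assumptions and is not Gemini's actual decoding policy. It treats each source word as available once the speaker finishes it. A hypothetical fixed-lag rule (`fixed_lag_emit_times`, parameter `lag_s`) stands in for "staying a few seconds behind the speaker". The sketch ignores model compute time and the duration of the synthesized output speech. The names `Word`, `turn_based_emit_times`, and `mean_delay` are also illustrative.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str     # hypothetical source-language word
    end_s: float  # time (seconds) at which the speaker finishes this word


def turn_based_emit_times(words: list[Word]) -> list[float]:
    """Turn-by-turn baseline: nothing is translated until the utterance
    ends, so every word becomes available at the final timestamp."""
    if not words:
        return []
    utterance_end = words[-1].end_s
    return [utterance_end for _ in words]


def fixed_lag_emit_times(words: list[Word], lag_s: float) -> list[float]:
    """Hypothetical continuous policy (an assumption, not Gemini's method):
    a word is released once the stream has run lag_s seconds past it, or
    when the utterance ends, whichever comes first."""
    if not words:
        return []
    utterance_end = words[-1].end_s
    return [min(w.end_s + lag_s, utterance_end) for w in words]


def mean_delay(words: list[Word], emit_times: list[float]) -> float:
    """Average gap between a word being spoken and its release for translation."""
    if not words:
        return 0.0
    return sum(t - w.end_s for w, t in zip(words, emit_times)) / len(words)


if __name__ == "__main__":
    # 20 words spoken at a steady two words per second: ends at 0.5 s, 1.0 s, ..., 10.0 s.
    words = [Word(f"w{k}", 0.5 * k) for k in range(1, 21)]
    print(mean_delay(words, turn_based_emit_times(words)))      # 4.75
    print(mean_delay(words, fixed_lag_emit_times(words, 2.0)))  # 1.75
```

Under these assumptions, a 2-second lag cuts the mean delay from 4.75 s to 1.75 s. The first word also becomes available 2 s after it is spoken instead of 9.5 s. Raising `lag_s` gives a translator more following context, which helps when word order differs between languages, but it adds delay. That is the tradeoff between context and immediacy described above.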

The model automatically detects and translates more than 70 languages, so users do not need to change language settings manually. We also built the system to preserve the original speaker's tone, pacing, and pitch, so the translated voice sounds natural and reflects how the person actually speaks. To make the model work well in the real world, we included noise filtering, which lets the system process speech accurately in loud and unpredictable environments.

We prioritize safety and responsibility in our generated audio. All audio created by our models carries a digital watermark called SynthID. This imperceptible marker is woven directly into the audio output, so AI-generated content remains detectable by automated tools. This helps curb the spread of misinformation.

Global App Integration and Listening Mode

We are rolling out this technology globally on the Google Translate app starting June 9, 2026. This update is available for Android and iOS devices. In the past, hardware requirements limited live audio translation to specific devices like Pixel Buds. We have completely removed that limitation. The feature now works seamlessly with any standard pair of headphones equipped with a built-in microphone.

For Android users, we are introducing a new feature called listening mode. It lets you hear translations privately through your phone's earpiece. You hold the phone to your ear as you would for a regular call, and the translated audio streams straight to you. We designed this experience for moments when you want quick translations without broadcasting the audio to everyone nearby, and it works well when you do not have headphones with you. For example, you can listen to a near real-time English translation of a Spanish guided tour through the earpiece.

For business customers, the model powers a new tool called AI voice interpretation in Google Meet. It helps participants speaking different languages like English, Mandarin, and Swedish understand each other during video calls. We are launching this update in a private preview for select Google Workspace business customers this month. A broader release will follow later this year. This feature requires paid accounts with Gemini add-ons and depends on the meeting host's subscription plan.

Developer API and External Partner Testing

Developers can build their own voice translation applications on the model through the Gemini Live API. The API handles the complex real-time media streaming infrastructure, so developers can focus on the user experience. Platforms including Agora, Fishjam, LiveKit, Pipecat, and Vision Agents have already integrated the API to help other companies deploy voice translation apps easily.

Our partners at Grab are testing the model to enable multilingual communication between drivers and travelers at pickup locations. These users make over 10 million voice calls per month through the Grab platform, so fast and accurate translation is vital to their daily operations.

Partners have also shared feedback from their testing. Philipp Kandal said his team valued the model's ability to auto-detect multiple languages and translate speech accurately with minimal delay. Bella Baek of CJ ENM said her company is excited to partner with us. She added that early tests show promising quality for a more authentic experience for global and Korean viewers. Jesse Hall built a demonstration on LiveKit Agents in which everyone speaks their own language and still understands each other live, showing how the system makes multilingual voice interactions effortless.

Nash Ramdial reported that his team tested the Gemini 3.5 Live Translate model across several languages and was highly impressed by its speed, accuracy, and liveliness. Maciej Rys said the model, paired with Fishjam's streaming protocol, sets a new standard for real-time multimedia streaming. Mason Adams said the team at Agora tested the model and found it highly accurate, with a low delay that sets a new performance level for real-time translation.
