Google has updated its Voice Search models to be powered by Speech-to-Retrieval (S2R). Google said S2R "gets answers straight from your spoken query without having to convert it to text first, resulting in a faster, more reliable search for everyone."
Google's earlier voice search used automatic speech recognition (ASR) to turn the voice input into a text query, and then searched for documents matching that text query. Google said "a challenge with this cascade modeling approach is that any slight errors in the speech recognition phase can significantly alter the meaning of the query, producing the wrong results."
Speech-to-Retrieval (S2R) solved this issue. Google said, "At its core, S2R is a technology that directly interprets and retrieves information from a spoken query without the intermediate, and potentially flawed, step of having to create a perfect text transcript. It represents a fundamental architectural and philosophical shift in how machines process human speech."
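To make the difference between the two approaches concrete, here is a minimal Python sketch of the data flow in each. It is not Google's implementation: the function names (`cascade_search`, `s2r_search`, `toy_transcribe`, `toy_encode_audio`), the two-document index, and the hand-picked embedding vectors are all made up for illustration, and the "ASR error" is simulated. The only point is to show where a transcription mistake enters the cascade path and why a path that ranks documents directly against an audio embedding has no transcript to get wrong.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cascade_search(audio, transcribe, text_index):
    """Cascade approach: audio -> text transcript -> text lookup."""
    transcript = transcribe(audio)
    return text_index.get(transcript, [])


def s2r_search(audio, encode_audio, doc_embeddings, top_k=1):
    """Direct approach: audio -> embedding -> nearest documents, no transcript."""
    query_vec = encode_audio(audio)
    ranked = sorted(
        doc_embeddings,
        key=lambda doc: cosine(query_vec, doc_embeddings[doc]),
        reverse=True,
    )
    return ranked[:top_k]


# --- Toy data; every value below is an assumption for illustration only ---
AUDIO = "clip-001"  # stand-in for raw audio of someone asking about a painting


def toy_transcribe(audio):
    # Simulated ASR slip: "scream" is misheard as "screen".
    return "the screen painting"


TEXT_INDEX = {
    "the scream painting": ["Munch: The Scream"],
    "the screen painting": ["How to paint a window screen"],
}


def toy_encode_audio(audio):
    # Hypothetical audio encoder output; a real encoder would be a trained model.
    return [0.9, 0.1]


DOC_EMBEDDINGS = {
    "Munch: The Scream": [1.0, 0.0],
    "How to paint a window screen": [0.0, 1.0],
}

cascade_result = cascade_search(AUDIO, toy_transcribe, TEXT_INDEX)
s2r_result = s2r_search(AUDIO, toy_encode_audio, DOC_EMBEDDINGS)

print("cascade:", cascade_result)
print("s2r:", s2r_result)
```

In this toy setup the cascade path inherits the simulated transcription error and returns the wrong document, while the direct path ranks documents by similarity to the audio embedding. Because the vectors are hand-picked, the sketch illustrates the architecture only and says nothing about real-world accuracy.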
The announcement was posted on the Google Research blog, but S2R is already in use in the real world. Google wrote, "The move to S2R-powered voice search isn’t a theoretical exercise; it’s a live reality. In a close collaboration between Google Research and Search, these advanced models are now serving users in multiple languages, delivering a significant leap in accuracy beyond conventional cascade systems."
Hat tip to Gagan:
🆕 Huge update for Voice Search -> now its powered by Speech-to-Retrieval engine and this new process don't convert speech to a text transcript & then do a web search rather this new technique uses an audio encoder for converting sound into audio embeddings which then is used to… https://t.co/iv2q4Kp0Qt pic.twitter.com/bCGwIfKNEh
— Gagan Ghotra (@gaganghotra_) October 8, 2025
Forum discussion at X.