Amazon Alexa’s Natural Voice: The Science Behind It
Alexa now has a more natural voice that makes interacting with her more enjoyable. The improvement comes from a new neural text-to-speech (NTTS) technology.
How Alexa’s Speech Output Works
Alexa’s speech output comes from text-to-speech technology that takes a sequence of words and turns it into natural-sounding, intelligible audio. Machine learning (ML) algorithms determine which speech sounds to pick and how to put them together into the most natural response.
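To make "picking speech sounds and putting them together" more concrete, here is a toy sketch of unit selection, a classic text-to-speech technique. It is not Amazon's implementation: the inventory, the pitch values, and the `join_cost` function are all hypothetical, chosen only to show how a system might pick the recordings that fit together most smoothly.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory: each sound has candidate recordings,
# described here only by (unit_id, pitch_in_hz).
INVENTORY: Dict[str, List[Tuple[str, float]]] = {
    "h": [("h_1", 110.0), ("h_2", 140.0)],
    "e": [("e_1", 115.0), ("e_2", 180.0)],
    "l": [("l_1", 120.0), ("l_2", 90.0)],
    "o": [("o_1", 118.0), ("o_2", 200.0)],
}


def join_cost(prev_pitch: float, next_pitch: float) -> float:
    """Toy join cost: big pitch jumps between neighbouring units sound unnatural."""
    return abs(prev_pitch - next_pitch)


def select_units(sounds: List[str]) -> List[str]:
    """Pick one recording per sound so the summed join cost is minimal (Viterbi search)."""
    # best[i][j] = (cheapest total cost ending in candidate j of sound i, back-pointer)
    best = [[(0.0, -1) for _ in INVENTORY[sounds[0]]]]
    for i in range(1, len(sounds)):
        row = []
        for _, pitch in INVENTORY[sounds[i]]:
            options = [
                (best[i - 1][k][0] + join_cost(prev_pitch, pitch), k)
                for k, (_, prev_pitch) in enumerate(INVENTORY[sounds[i - 1]])
            ]
            row.append(min(options))
        best.append(row)

    # Start from the cheapest final candidate and follow back-pointers.
    j = min(range(len(best[-1])), key=lambda idx: best[-1][idx][0])
    path = []
    for i in range(len(sounds) - 1, -1, -1):
        path.append(INVENTORY[sounds[i]][j][0])
        j = best[i][j][1]
    return path[::-1]


print(select_units(["h", "e", "l", "o"]))
```

In this toy inventory the search settles on the recordings whose pitches sit closest together, which stands in for the "most natural" combination.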
How Alexa’s Voice Changed
Alexa’s new voice now responds to users in an even more natural-sounding way. Alexa can also adapt her speaking style to the context of a user’s request. For example, NTTS technology allows Alexa to deliver the definition of a word or a piece of historical information in a different speaking style than the day’s news or weather.
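As a rough illustration of matching speaking style to context, the sketch below wraps a response in SSML markup depending on the type of request. It assumes the `amazon:domain` tag and the `news` and `long-form` style names from the Alexa Skills Kit SSML reference; the request types, the mapping, and the `to_ssml` helper are hypothetical, and this is not how Alexa chooses styles internally.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from a request type to an SSML speaking-style name.
# The style names are assumed from the Alexa Skills Kit SSML reference.
STYLE_FOR_REQUEST = {
    "news": "news",
    "history": "long-form",
}


def to_ssml(text: str, request_type: str) -> str:
    """Wrap text in SSML, adding a speaking-style tag when one is mapped."""
    body = escape(text)  # escape &, < and > so the markup stays well-formed
    style = STYLE_FOR_REQUEST.get(request_type)
    if style is not None:
        body = f'<amazon:domain name="{style}">{body}</amazon:domain>'
    return f"<speak>{body}</speak>"


print(to_ssml("Rain & wind are expected today.", "news"))
print(to_ssml("A noun is a word that names something.", "definition"))
```

Request types without a mapped style fall back to the default voice, so unknown contexts still produce valid output.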
The Science Behind Her Natural Voice
Scientists at Amazon took an entirely new approach to speech synthesis when crafting Alexa’s more natural, higher-quality voice. The approach, called direct waveform modeling, applies deep learning to produce the speech signal. As a result, compared to the previous TTS technology, NTTS-generated speech has better intonation, emphasizes the right words in a sentence, and has improved segmental quality (how cleanly individual speech sounds are rendered).