Disembodied voices tend to be alarming, but the sound of these agents has become familiar and usually welcome as they provide answers or weather updates, buy tickets, or set a timer for the bread in the oven.
Essentially, the assistants are software agents connected to massive amounts of data in the cloud, and they get better at natural language processing every day thanks to AI and machine learning. They’re code, data, and powerful processing with human names and your choice of gender and speaking style.
Phase one of the voice assistant era began when Siri was embedded in the iPhone 4S in 2011. The growth spurts along the way have been spectacular. According to NPR, in one year, from December 2017 to December 2018, the number of smart speakers in U.S. households grew by 78%, from 66.7 million to 118.5 million. Today, the voice assistant installed base by provider is: Apple Siri devices, 500+ million; Google Assistant devices, 500+ million; Microsoft Cortana devices, 400+ million; and Amazon Alexa devices, 100+ million (totals from voicebot.ai). We’ve now entered phase two of the voice assistant era, as digital assistant producers peddle their products alongside computing industries that are pursuing ambient computing: digital everywhere.
The most common homes for assistants have been the smart speaker and the smartphone. The smartphone accounts for Apple’s head start over a company like Amazon in installed base numbers. In 2017, the installed base shares were: Siri, 48.4%; Google Assistant, 28.7%; Alexa, 10.1%; and Cortana, 3.5% (from the 2018 Voicebot Smart Speaker Consumer Adoption Report analysis). Google can also leverage the Android operating system on a wide variety of smartphones. In an effort to catch up, Amazon is investing heavily in research and is taking the ambient route forward.
In September 2019, Amazon launched a number of Alexa-enabled products at a single press event: the Echo Dot Clock (voice-controlled smart alarm clock); a new, cloth-covered Echo smart speaker; the Echo Show 8 with a new screen size; the Echo Flex, a very small plug-in speaker with a USB charging port that will put Alexa in every room and function as a nightlight as well; the Echo Loop, a ring for your finger with two microphones and haptic vibration for Alexa notifications and incoming calls; Echo Studio, a high-end (now the biggest) smart speaker with Alexa; Echo Buds, wireless Alexa hands-free earbuds; Echo Frames, smart prescription eyeglass frames with a microphone to contact Alexa; and an Alexa-enabled microwave oven for the kitchen. And that’s what Alexa-branded ambient computing will look like this year.
Along with the current ambient strategy for voice assistants, there’s an effort to convert the devices into sentient gadgets, not just passive responders. This year, Alexa is learning from context clues when it fails to offer a correct or acceptable response. Lauren Goode wrote in Wired that Alexa can now whisper responses when it’s questioned in a whisper, thus letting context shape its response. Alexa is also now able to hear and respond to multiple languages in multilingual homes without users going into settings each time to change languages. Google has extended its Assistant’s memory so you can ask follow-up questions, and it will reply without your having to say “Hey Google” to get its attention. It does this by listening for an additional eight seconds for a follow-up.
EVEN SMARTER TOMORROW
There’s a new Turing test called the Alexa Prize Socialbot Grand Challenge. The third annual competition began in September, and winners will be announced in June 2020. The rules are simple. “Competing teams will create socialbots that can converse coherently and engagingly for 20 minutes with humans on a range of current events and popular topics such as entertainment, sports, politics, technology, and fashion while earning a rating of 4.0 out of 5.0.” The 2018 Alexa Prize winner was the team from the University of California, Davis. The team achieved an average score of 3.1 and an average duration of 9 minutes and 59 seconds.
It could be that we’re just about halfway to assistants who sound even more like us.