Back in March, shortly after the coronavirus had been deemed a national crisis, I walked into a 7-Eleven to pick up a late-night snack. Taped to the door was a makeshift sign warning not to touch the Slurpee machine to limit the risk of viral spread. I glanced at the coolers full of soft drinks, the touchscreen payment reader beside the cash register, the plastic surface of the smartphone case I’d touched perhaps 2,600 times that day.
Maybe the Slurpee machine was riskier by some small degree than these other surfaces, but by how much? The whole strange episode led me to wonder whether people would recall this time as the death knell of public touchscreens, or at least the dawn of a new era in conversational interfaces, an age in which our voices, not our hands, would help us navigate the world.
Just a few weeks later, I found some modest confirmation of the theory. A Colorado-based company, Valyant AI, was pilot-testing a new way of ordering food at quick-serve restaurants, using a voice-controlled kiosk supported by artificial intelligence.
Two thirds of U.S. customers are using self-checkout and 87 percent would prefer to shop in stores with “touchless or robust self-checkout options.”
It should be said that there is limited, if any, research to indicate how common transmission by surface contact is; experts believe close person-to-person contact is the main culprit. But several studies suggest it is possible, including an analysis in the New England Journal of Medicine that found SARS-CoV-2 could remain viable on copper for up to four hours, on cardboard for up to 24 hours, and on plastic and stainless steel for up to 72 hours.
Even if we don’t fully understand how it spreads, it is clear the risk of contact transmission is influencing consumer behavior. A survey from Shekel found two thirds of U.S. customers are using self-checkout, and that 87 percent would prefer to shop in stores with “touchless or robust self-checkout options.”
The idea for a voice-enabled drive-thru seems well timed, if only to cut down on wait times. The snaking lines of cars extending from Starbucks and McDonald's drive-thru windows seem a temporary hallmark of the era, and one that technology is equipped to address.
Order Speed Helps With Social Distancing
Rob Carpenter, CEO and founder of Valyant AI, said the ordering process, in many ways, mirrors a traditional one. A sensor detects the car as it approaches. The customer pushes a button on the kiosk to initiate the conversation, and a human-sounding voice greets them and takes their order. Microphone arrays help triangulate the speaker’s location, and speech recognition software identifies contextually relevant phrases: “a number two,” for instance, or a “medium diet Pepsi.”
Once the order is completed, the customer scans a QR code or swipes to pay with a credit card, and the transaction is pushed to the restaurant’s point-of-sale system. The whole input process, depending on the extent of the order, takes about 15 seconds.
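For readers curious about the mechanics, the flow Carpenter describes (detect the car, greet the customer, match menu phrases in the transcribed speech, total the order, hand off to payment and the point-of-sale system) can be sketched roughly as follows. All names, menu items, and prices here are invented for illustration; Valyant AI's actual system is proprietary and not public.

```python
# Hypothetical sketch of a voice drive-thru ordering loop.
# Menu phrases and prices are invented for illustration.
MENU_PHRASES = {
    "a number two": ("Combo #2", 7.49),
    "medium diet pepsi": ("Medium Diet Pepsi", 1.99),
}

def recognize(utterance: str):
    """Match contextually relevant menu phrases in a transcribed utterance."""
    text = utterance.lower()
    return [item for phrase, item in MENU_PHRASES.items() if phrase in text]

def take_order(utterances):
    """Simulate the ~15-second input loop: listen, match, total, hand off."""
    order = []
    for u in utterances:
        order.extend(recognize(u))
    total = round(sum(price for _, price in order), 2)
    # In the real system, payment (QR code or card swipe) would complete
    # here, and the ticket would be pushed to the restaurant's POS.
    return order, total
```

The phrase-matching step stands in for the speech recognition and microphone-array work the article describes; in practice that is the hard part, and the loop around it is the easy part.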
“Convenience is part of it, but it’s also paramount with the pandemic to make people feel more comfortable.”
From a user experience standpoint, Carpenter told me, the goal is not only to improve the speed and efficiency of ordering but to limit the risk for viral transmission. “Convenience is part of it, but it’s also paramount with the pandemic to make people feel more comfortable,” he said.
Valyant AI, in partnership with another Colorado firm, KIOSK Information Systems, plans to roll out the technology at restaurant drive-thrus over the next two months. Over time, eye-tracking software could be used to make the drive-thru exchange completely contactless, but the technology’s real value, Carpenter said, could emerge once it’s introduced at kiosks in dining rooms, hospitals and retail stores.
“You have fewer people milling around and standing in lines, while people swipe through menus,” he said. “The quick processing helps with social distancing.”
Human Contact Just Got Really Expensive
Experts like Mark Webster, director of product (voice UI/UX) at Adobe, say Valyant AI’s talking kiosk is just one example of the ways in which conversational AI is becoming more pervasive. While personal assistants like Amazon Alexa and Google Home are making their way into people’s homes, companies like Salesforce and Adobe are incorporating voice-enabled interfaces into their business platforms to streamline data input and retrieval.
Meanwhile, Apple Pay and Square are allowing for cashless payment at checkout, and Amazon, which already offers cashierless checkout at its Go stores, recently applied for a patent for palm recognition technology that, according to Vox, would identify people “by characteristics associated with the palms of their hands, including wrinkles and veins.”
Arnobio Morelix, chief innovation officer at Startup Genome in San Francisco, who is at work on a book about post-pandemic economic trends called The Great Reboot, said the surge of interest and investment in touchless interfaces is not just a short-term blip. Citing data from the Office for National Statistics, he said internet sales in the United Kingdom now represent 30 percent of all sales, up from 20 percent a year ago, and the curve has seen a dramatic uptick since the start of the coronavirus.
“So I think the big thing we’re seeing happening now, and we’ll continue to see happening, is that something that was really cheap and easy to do, human-to-human contact, got really expensive, right?” Morelix said. “And risky. As this human touch gets riskier, people are switching to anything that you can do contactless. I think some of the first waves of this has been in this bridge from in-person to online retail.”
Later waves, he said, could touch on anything from mass transit terminals to offices and manufacturing plants.
“Not just a smartphone. But how my AirPods interact with my watch and my phone, in order to communicate digitally and through voice.”
Experts like Webster and Morelix believe we are on the verge of a massive shift in user experience design and product development. The shift, they say, will open new opportunities for UX and UI designers, as well as product managers, whose jobs will focus more and more on voice interfaces, not as standalone platforms, but as part of interconnected mobile ecosystems.
“Not just a smartphone,” Webster said. “But how my AirPods interact with my watch and my phone, in order to communicate digitally and through voice.”
As this shift takes place, said Dwayne Samuels, a product manager turned CEO at Samelogic, new UX and UI design roles will emerge in three critical areas.
- Making human-computer interactions more personalized and pleasant.
- Replacing or augmenting menu-based touchscreen interfaces with conversational inputs.
- Tracking engagement as swipes and clicks are phased out and systems become more adaptive through machine learning.
Voice Interfaces Are Still the Wild West
The field of voice interface development is so young and rife with questions, Webster said, that best practices have yet to be established. Designers will need to decide how systems will take voice commands as input, how automated speech or visual cues can deliver responses, and how personified they want interfaces to be.
“Whether or not you make a voice interface conversational is itself a design decision,” he said. “You could borrow the metaphor of having a conversation with an entity, a person or a voice assistant. Or it could be something like the Spotify mobile app, where you’re using voice as the form of input, but all the results are visual.”
Much work has already been done, Webster said. Speech recognition software has become adept at deciphering and transcribing voice inputs; computer-generated voices can faithfully replicate the timbre and intonation of human speech; conceptual models for structuring user flow already exist.
“I can close my eyes and I can think of what Gmail looks like, and what happens when I click ‘New,’” Webster said. “There’s just a whole bunch of mental models that designers have to latch onto and a bunch of those come from non-voice interfaces.”
But Webster sees a big gray area when it comes to natural language processing: how artificial intelligence “extracts the intent of what somebody is trying to do and then has a response to it.”
That’s the harder part to solve. But putting more work into the design of the voice experience itself — how pleasant and human-sounding it can be — might alleviate some of the engineering pain of trying to improve deep learning, Webster said. Even tiny verbal signals can lead to beneficial feedback loops.
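The intent-extraction problem Webster points to can be illustrated with a deliberately naive sketch: a pattern maps a free-form utterance to an intent plus slots. Production systems use trained natural language models; this keyword toy, with invented intent and slot names, only shows the shape of the task.

```python
import re

# Hypothetical intent patterns; real systems learn these from data.
INTENTS = {
    "play_music": re.compile(
        r"\bplay\b(?:\s+(?P<track>.+?))?(?:\s+on\s+(?P<service>\w+))?$"
    ),
}

def extract_intent(utterance: str):
    """Map an utterance to (intent, slots); 'unknown' if nothing matches."""
    text = utterance.lower().strip()
    for intent, pattern in INTENTS.items():
        m = pattern.search(text)
        if m:
            # Keep only the slots that were actually spoken.
            slots = {k: v for k, v in m.groupdict().items() if v}
            return intent, slots
    return "unknown", {}
```

Note how the optional "on &lt;service&gt;" slot mirrors the Alexa anecdote that follows: the system's own feedback teaches users which slots exist.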
“Whether or not you make a voice interface conversational is itself a design decision.”
“When Alexa first came out, before you could pick a default music player, I remember my two-year-old daughter would ask it to play something ‘on Spotify’ because she would hear the feedback of saying ‘X is playing on Spotify,’” Webster said. “She knew ‘on Spotify’ was a thing you need to add to the request in order to play it properly.”
Webster admits that onboarding to voice interfaces takes time, but he said concerns over the coronavirus could make people more willing to invest it.
“Once you onboard somebody, and they’re really fast at it and never have to fumble through TV screens and manually enter a Salesforce record, they see the value,” Webster said. “So the idea is voice could be way more ubiquitous than it is right now, with today’s technology, if there was a greater focus on the design of these things.”
Contactless Payments Are Here to Stay
If touchless interfaces gained momentum in 2015 after Google Head of Design Strategy Golden Krishna waged war on shallow, time-sucking apps in The Best Interface Is No Interface, their more recent progress boils down to free-market economics.
Morelix, in an excerpt of his forthcoming book published in Inc., points out that the coronavirus pandemic has triggered an economic destabilization that has made it more expensive to deliver goods and services around the world — especially those involving touch.
“The biggest cost involved is risk of infection. Right?” Morelix said. “People are avoiding getting out of the house as much as they can and they are avoiding touching things once they do get out.”
While he forecasts the economic recovery happening in stages over 18 months, the early effects will leave a lasting impact, as substitute goods and services replace those that existed before, much like lowered computing cost ushered in a switch from “chemistry-based to digital photography” and spurred increased demand for complementary goods, “such as nice monitors.”
“A supermarket or a restaurant that started offering contactless payments, for example. They’re not going to delete that alternative just because social distancing is no more, right?”
Thus, as we emerge from social distancing, Morelix explained, the demand for contactless payment via apps such as Apple Pay and Square, now accepted at supermarket chains like Publix, may decline, but it won’t go away entirely.
“A lot of things that got implemented will continue on,” Morelix said. “You know a supermarket or a restaurant that started offering contactless payments, for example. They’re not going to delete that alternative just because social distancing is no more, right?”
Sentiment Analysis and Empathy
One of the people keenly interested in how voice interfaces evolve is Samuels, who was living in Jamaica and had recently launched a now-dissolved startup when he first met Krishna, then working at Zappos.
Crediting Krishna’s influential book as the basis for much of his thinking, Samuels talks about something called “sentiment analysis,” a term to describe a system that uses cameras and machine learning to track people’s movement through stores and detect their baseline emotional reactions to particular products.
As Amazon describes it in a blog post on its website, “it is possible to derive insights from customer behavior (i.e. which area of the store is frequently visited), demographic segmentation of store traffic (i.e. such as gender or approximate age) while also analyzing patterns of customer sentiment.... [F]or example, to get insights into how customers respond to brand content and signage, end cap displays or advertising campaigns.”
Or as Samuels summed it up: “There’s a lot of image recognition and emotional recognition happening. If you pick up a product and if there is a slight degree of displeasure or delight, [the system] can actually detect it,” he said.
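The aggregation side of what Amazon and Samuels describe (turning per-shopper detections into per-zone insights) can be sketched simply. The detections themselves would come from real computer-vision models; here they are plain input data, and every field name is invented for illustration.

```python
from collections import defaultdict
from statistics import mean

def summarize(observations):
    """Aggregate per-zone visit counts and average sentiment.

    Each observation is a hypothetical detection event:
    {"zone": <store area>, "sentiment": <score in [-1, 1]>}.
    """
    visits = defaultdict(int)
    sentiments = defaultdict(list)
    for obs in observations:
        visits[obs["zone"]] += 1
        sentiments[obs["zone"]].append(obs["sentiment"])
    return {
        zone: {"visits": visits[zone], "avg_sentiment": round(mean(scores), 2)}
        for zone, scores in sentiments.items()
    }
```

A retailer could run a summary like this per end cap display or signage zone to see which areas draw traffic and which draw displeasure, which is exactly the kind of insight the Amazon excerpt describes.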
Where user experience designers could play a pivotal role in the development of such technology, Samuels said, is in ensuring the software caters to human needs, not the other way around. He points to a 13-step car-unlocking app, which Krishna pans in his book, as a design failure that could have been avoided with more empathy for the user.
If you own a Model X and have the Tesla app, on the other hand, “you walk up to the car and the door opens by itself,” Samuels said.
“What digital experiences does it impact? I think it impacts everything.”
But even given the benefits of frictionless experiences, there are still barriers to wide-scale adoption of touchless interfaces, one of which is authentication. Automated speech recognition (ASR) has to be highly precise to secure transactions at a bank or verify the driver of a car, and it is not there yet, Samuels said. He estimates that ASR works with 92 to 93 percent accuracy. Facial recognition, by comparison, is around 98 percent accurate, but “if you’re twins, the data itself is very fungible.” And with no taps or clicks to track, and potentially shorter end-to-end purchase times, designers will have fewer data points by which to evaluate the customer experience.
The designers who succeed in the coming era will be those who can solve problems by talking to people and understanding their needs. Ultimately, Webster said, voice is just another interface, “a form of interaction, just like gestures, just like taps, just like clicks, just like swipes. When you think about it that way, what digital experiences does it impact? I think it impacts everything.”