With the increasing popularity of smart speakers such as Amazon Echo and Google Home, the ability to design well for voice interfaces has become more relevant than ever before. According to reports from Edison Research and NPR, one out of six Americans currently owns one of these voice-controlled smart devices.
Not only is that an impressive number (approximately 40 million people); the adoption rate is just as striking. The same report reveals that smart-speaker ownership jumped 128 percent in only one year, from January 2017 to January 2018.
User experience (UX) designers should pay attention to this trend because it represents more career opportunities for them. Designers who know how to design for voice interfaces demonstrate a skill set that's increasingly attractive to tech companies.
Here are some best practices to follow when designing for voice interfaces in English.
Know the difference between voice and graphical UI
Just because you can design well for a visual, graphical user interface (UI) doesn’t mean you’ll automatically succeed once you transfer over to voice-UI design. While they may be similar, they’re hardly the same.
One of the key differences is the absence of affordances and signifiers, which help users figure out a UI with greater clarity. Because both are visual elements, neither is available in a voice interface, and that's where the challenge begins.
Affordances are what an object (say, a UI button) can do, while signifiers make those affordances clearer by closing the gap between what an object can really do and what it's perceived to do. If a button looks clickable on a graphical interface, a signifier such as a drop shadow or 3D effect confirms to the user that the affordance (that it can be clicked) is real.
Needless to say, that advantage isn’t present in voice UI. Another challenge is users’ preconceived notion that voice communication is mainly between people and not between people and technology.
As a designer, therefore, you should help users get around this visual-cue deficit by always having the voice UI explicitly tell them what function or feature they’re using. For example, if they’re asking for today’s weather report, ensure the voice UI tells them:
“Today’s weather will be rainy and windy,” instead of simply “Rainy and windy.”
This may seem like a negligible difference, but it’s not.
By beginning the response with “Today’s weather,” the voice UI reassures users that they’ve accessed the weather app. Unlike a graphical UI, where users can actively see that they’re using the weather app (thanks to affordances like weather icons), a voice UI needs to be as explicit as possible to avoid any misunderstandings.
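To make the point concrete, here's a minimal Python sketch of that behavior. The function names and the weather lookup are hypothetical stand-ins rather than any particular assistant's API; the point is simply that the spoken response leads with the feature name instead of the bare answer.

```python
# A minimal sketch; get_forecast is a hypothetical stand-in for a real lookup.

def get_forecast(location: str) -> str:
    return "rainy and windy"

def weather_response(location: str) -> str:
    forecast = get_forecast(location)
    # Lead with "Today's weather" so the listener knows which feature answered,
    # rather than returning the bare forecast ("Rainy and windy.").
    return f"Today's weather will be {forecast}."

print(weather_response("Seattle"))  # Today's weather will be rainy and windy.
```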
Understand your users’ notions of voice communication
People often throw in an idiom or two when communicating with others (like my use of “throw in” there). They don’t explicitly state what they mean, but others in the conversation will know what they mean because they understand the context from past experience and a command of the language.
Designing for voice interfaces, on the other hand, relies on basic, stripped-down, crystal-clear language. In other words, the UI has to be designed around users explicitly telling the device what they want from it.
For example, from the current list of Alexa commands, we see that you can tell your Amazon Echo to open an app by saying:
“Alexa, open Uber.”
The command doesn’t rely on a figure of speech like:
“Alexa, fire up the Uber app.”
Idiomatically, people understand that both phrases mean the same thing — you want to have the smart speaker open a specific app for you. Practically, however, voice interfaces are prone to misunderstanding expressions like this. It’s simply impossible (at least at this point) to program into a voice UI all the numerous contexts and assumptions in the English language that would make it accurately understand what you really mean.
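As a toy illustration (this is not how Alexa or any real assistant actually parses speech), a command matcher that only recognizes explicit, literal phrasings makes the gap obvious:

```python
# A toy command matcher, not a real assistant's speech parsing.
# It only recognizes explicit, literal commands.

SUPPORTED_COMMANDS = {
    "open uber": "launch_uber_app",
    "open spotify": "launch_spotify_app",
}

def match_command(utterance: str):
    # Strip the wake word and normalize case before matching.
    normalized = utterance.lower().removeprefix("alexa,").strip()
    return SUPPORTED_COMMANDS.get(normalized)

print(match_command("Alexa, open Uber"))             # launch_uber_app
print(match_command("Alexa, fire up the Uber app"))  # None: the idiom isn't recognized
```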
Bottom line: Avoid idioms. Use simple, straight-to-the-point language.
Normalize the language voice assistants understand
Designers can encourage users to use commands that voice UIs can understand — and get around the problem of over-reliance on idioms — by normalizing the use of clear and full intentions in voice commands.
For instance, if you have a written user guide accompanying your smart speaker, present numerous examples where the user's voice commands are clear and straight to the point. Likewise, when your voice UI responds to a user's question or prompts for additional information or help, those responses should be complete, direct expressions rather than idioms or slang.
In this way, by setting the example of how to interact with your smart speaker, you're training users to issue commands that are in line with how the system understands language.
Build in personalization
One of the similarities between voice and graphical UIs is today's focus on personalization. No user is fond of inputting the same information about themselves over and over, whether by voice or by typing. They expect the device to simply remember.
For great UX, UIs have to provide a quick and painless experience with little or no friction. This holds true with voice interfaces.
For example, each time a user places an order on Amazon with Alexa, they don't have to say their shipping address and payment information aloud to their Echo device. That's because Alexa remembers this information from the user's account and uses it as the default for each voice purchase.
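A simplified sketch of that idea, assuming a saved user profile rather than Amazon's actual checkout flow, might look like this:

```python
# A simplified sketch, not Amazon's real implementation: fall back to
# saved account details instead of re-prompting the user by voice.

SAVED_PROFILE = {
    "shipping_address": "123 Main St, Springfield",
    "payment_method": "card ending in 4242",
}

def place_order(item: str, profile: dict) -> str:
    address = profile.get("shipping_address")
    payment = profile.get("payment_method")
    if not address or not payment:
        # Only ask when the system genuinely lacks the information.
        return "I need a shipping address and a payment method to order that."
    return f"Ordering {item}, shipping to {address}, paid with your {payment}."

print(place_order("batteries", SAVED_PROFILE))
```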
This may seem like a small detail, but it’s a hallmark of personalization, and UIs that do the small things right are often the ones that delight their users.
Bottom line: Ensure that apps with voice UIs don't prompt users to repeat details the system should already know, so that basic tasks like purchases stay frictionless.
Tell users what their options are
Voice UIs don't neatly provide users with a clear, visual pathway to their options the way graphical interfaces do. For instance, on your iMac, you can quickly click on Launchpad to pick your app of choice, open said app, and then receive additional visual options for what you can do once inside the app.
An interface that relies solely on voice is automatically disadvantaged in this respect — but your users don’t have to be.
To get around this limitation, design your voice UI to be upfront with users about what they can choose, right from the get-go. This avoids confusion and frustration.
For instance, have the smart speaker tell users what they can do within the app they're accessing. In the case of users asking for a weather forecast, design the voice UI to tell them that they can:
- Hear today's forecast,
- Get an in-depth, 48-hour forecast,
- Get a snapshot of the weather for the next two weeks, or
- Exit the app.
This eliminates the danger of users wasting their time and running into friction by asking the voice UI something that it wasn’t designed to do.
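As a rough sketch, a hypothetical weather skill could enumerate those choices in its welcome prompt; the skill name and option wording here are illustrative only.

```python
# A rough sketch of a welcome prompt for a hypothetical weather skill.

OPTIONS = [
    "hear today's forecast",
    "get an in-depth 48-hour forecast",
    "get a snapshot of the weather for the next two weeks",
    "exit the app",
]

def welcome_prompt() -> str:
    # State every available choice up front so users never have to guess.
    spoken = ", ".join(OPTIONS[:-1]) + f", or {OPTIONS[-1]}"
    return f"Welcome to the weather app. You can {spoken}."

print(welcome_prompt())
```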
Put the users first
Designing for voice interfaces comes down to designing for users above everything else. That's why you should think deeply about the nature of human conversation and build that reality into any voice UI you design.
Any quirk that doesn’t mimic or otherwise replicate human conversations — with their idiosyncrasies and limitations — will negatively impact the UX.
But any voice UI designed to interact with people as they do with other humans — even if it’s somewhat limited — will still provide excellent UX.