6 Things I Learned by Building My First Voice Prototype
With sales of smart speakers like the Amazon Echo and Google Home on the rise, and built-in voice assistants pre-installed on most smartphones, voice experiences are becoming increasingly prevalent. And the number of voice experiences is growing rapidly: Amazon reported that as of January 2019, there were over 80,000 Alexa skills (voice-based applications for Alexa). I’ve spent my fair share of time chatting with Alexa, Google Assistant, and Siri. With voice poised to become a crucial part of designing user experience, and Adobe XD offering voice-prototyping tools, it seemed more than worth the effort to get comfortable with this new medium.
When I first heard about Adobe’s internship opportunities, I was thrilled by the potential chance to be part of a company that makes the tools I use every day. In my grad work at NYU and my work as a freelance UX designer, Adobe XD has become a particularly big part of my life. When I was accepted as a design content intern for XD this summer, I was over the moon, especially when I was given the opportunity to explore XD’s voice capabilities by building my first voice prototype. Working on this prototype gave me insight into the huge potential (and a few of the challenges) of using voice.
My voice prototype, which you can see on Behance, is a voice-first home security skill for Amazon Alexa called HomeSafe. It is compatible with both the Echo and Echo Show, and lets the user control the lights and locks around their house using just their voice.
Here are six things I learned when making my first voice prototype:
1. Figure out why your idea makes sense as a voice-first experience
One of the things I struggled with most at the beginning was trying to isolate what made something a uniquely good voice-first experience — as in, why wouldn’t I just make a mobile app instead? Thinking about this led to some insights into where voice interfaces can really shine: when the user needs to complete a task hands-free, when they want an eyes-free experience (say, they’re already in bed), or when a minor task has a high barrier to entry, such as a required control buried deep in the settings of an app or device. This helped me narrow down the scope of my idea for my first prototype. The use case I imagined was a person in bed, about to go to sleep, suddenly wondering if the lights were off and the doors locked. Instead of getting up and going through the house to check, or fumbling in the dark for a smartphone, they could just ask Alexa to check for them.
2. Talk to your prototype as much as you can
Talking to your prototype is a great way to get a sense of what the final interaction will sound and feel like. Once I had written out my first user flow and wired up the screens, I immediately started testing it in Preview mode (with XD’s latest update, voice prototypes can now be tested directly on Alexa-enabled devices). This turned out to be a good — and efficient — decision. What I thought made sense on paper did not always translate smoothly into the voice interaction. Finalizing the user flow was much easier when I could hear that a response was taking too long, or feel that an utterance or phrase was unnatural or hard to remember. In my early flows, I would sometimes forget what I needed to say to advance the interaction or forget the options I’d been presented with, which was a good indicator to simplify the commands and shorten the device responses.
3. Embrace trial and error
When your prototype isn’t working the way you expect it to, it can actually be a good thing. It’s better to catch failures in the early stages and correct them yourself than to have the prototype fail during a user test, when a participant gives what should be a correct response and gets no reaction. One of XD’s newer features that I ended up using a lot was the “show notifications for voice” tool in Preview mode. The notification tells you what Adobe XD heard, and if what it heard doesn’t match any trigger, it lets you know that too.
This is a useful tool for a number of reasons: maybe the trigger word is “dollars” but XD is hearing “doll hairs.” Now you can add “doll hairs” as a trigger, and the prototype will work as intended. Or perhaps you were testing your prototype in a crowded space, so someone else’s conversation got picked up. Both things happened to me more than once. No matter what the issue is, the “show notifications” tool will give you insight into what’s going on and allow you to fine-tune your prototype.
4. User test early and often
This is just a general tenet of UX design, but it’s particularly important when working with voice. With a GUI, you can give a user visual hints or clues to help them situate themselves in an interaction — not so with voice. Even if you think you’ve designed your voice interaction so that users will only respond in a certain (correct) way, they’ll surprise you, so your design needs to be flexible enough to accommodate the diversity of responses. One of my favorite things about designing in XD is that it allows you to create multiple triggers for the same transition. For example, in the HomeSafe prototype, Alexa will speak to the user and say “The lights are on in the living room. Would you like to turn them off?” Originally, I had only included a trigger for “Yes,” but after user testing I found that people occasionally responded with “Turn them off” or “Turn off the lights.” I added these triggers into my prototype to ensure a smoother user experience.
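To make that flexibility concrete, here is a minimal sketch of the “multiple triggers, one transition” idea, written in Python outside of XD. The table and function names (`TRANSITIONS`, `match_utterance`) are hypothetical illustrations, not part of XD’s or Alexa’s actual tooling:

```python
# Sketch: several spoken phrases all map to the same transition,
# mirroring how XD lets you attach multiple triggers to one interaction.
# All names here are illustrative, not a real API.

TRANSITIONS = {
    "confirm_lights_off": [
        "yes",
        "turn them off",
        "turn off the lights",
    ],
}

def match_utterance(utterance):
    """Return the transition whose trigger list contains the utterance."""
    normalized = utterance.strip().lower().rstrip(".!?")
    for transition, triggers in TRANSITIONS.items():
        if normalized in triggers:
            return transition
    return None  # no match: the equivalent of XD's "no match" notification

print(match_utterance("Turn off the lights."))  # matches confirm_lights_off
print(match_utterance("Nope"))                  # no trigger matches
```

Each new phrase discovered in user testing becomes one more entry in the trigger list, rather than a new branch in the flow.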
5. Designing a multimodal experience will present different challenges than designing a voice-only experience
I built an Alexa skill prototype for both the Amazon Echo, which is voice only, and the Echo Show, which has an accompanying touchscreen. Working out the voice flow for the Echo was accomplished by talking to my prototype and user testing. A more daunting task was figuring out how to successfully utilize the screen to create a multimodal experience with the Echo Show. While the skill was designed voice-first, I wanted to ensure that the user had multiple ways of completing tasks. Striking a balance between what information and interactions were useful to display visually and which were redundant to what had been communicated by voice was challenging. In the end, I decided to make the actions of locking and unlocking doors and turning lights on and off input-agnostic, so that users could choose to interact with the touchscreen or just talk to the skill, and designed the Show screens to allow for tactile input.
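The input-agnostic approach can be sketched as a single action handler that both input paths call, so voice and touch always leave the skill in the same state. The function and event names below are illustrative assumptions, not part of any Alexa or XD API:

```python
# Sketch of input-agnostic actions: the voice path and the touch path
# both resolve to the same handler, so the skill behaves identically
# regardless of modality. All names here are illustrative.

state = {"living_room_lights": "on", "front_door": "locked"}

def set_lights(room, power):
    # The single shared action both modalities funnel into.
    state[room + "_lights"] = power
    return "The " + room.replace("_", " ") + " lights are now " + power + "."

def handle_voice(utterance):
    # Crude voice path; a real skill would use proper intent matching.
    if "turn off the lights" in utterance.lower():
        return set_lights("living_room", "off")
    return "Sorry, I didn't understand that."

def handle_touch(button_id):
    # The Echo Show's touchscreen reaches the very same handler.
    if button_id == "lights_off_button":
        return set_lights("living_room", "off")
    return ""

print(handle_voice("Alexa, turn off the lights"))
print(handle_touch("lights_off_button"))
```

Because both paths converge on `set_lights`, the screen never needs to duplicate the voice logic; it only offers a second way to trigger it.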
6. Design your way out of frustration
As someone who interacts pretty regularly with smart speakers and voice-automated systems, I am accustomed to feeling frustrated with them. I’ve lost count of the number of times that I have answered a prompt only to be told by Siri or Alexa, “I’m sorry, I didn’t understand that,” or that I’ve been asked to choose from a list of options that didn’t include the one I wanted. Designing a voice prototype gave me a new perspective on the reasons why a VUI might be unable to respond correctly (did it not hear me properly? Are there enough response options encoded in the design? Did the interaction set me up to respond correctly?) and made me wonder how I could create VUIs that encourage a successful interaction between user and device. My frustration shifted into curiosity, and I’m actually looking forward to the challenge.
If we believe that voice will be the medium of the future, we have to work to ensure that our VUIs have enough affordances to create a smooth experience for users.
Moving forward, I can’t wait to explore the potential of voice design. Some topics I’m particularly excited about are building accessible applications for people with visual and mobility impairments, fostering ways to practice second language skills, and creating games to help improve cognitive functioning in elders. The flexibility of voice as a medium to do everything from telling jokes to controlling nearly every aspect of your home flings the door wide open for designers to exercise their creative capabilities. I’m ready to start building more.