I’m a senior experience designer at Adobe who works with voice, and I believe most experience/UX designers—and those who don’t mind being mistaken for one—will be designing for voice within the next few years. As we have seen with Siri, Alexa, and Google Assistant, voice as an interface is finding its way into more parts of our everyday lives. We are increasingly aware of and comfortable with this manner of interacting, and that is opening doors for voice technologies in more of our consumer products. However, using voice interfaces can be a frustrating experience. Having tools to help us navigate the abstractions—and more importantly, making these tools accessible to a bigger crowd of people—will help us pave the way for more, and more useful, voice interfaces. Designers are critical to making this happen, by bringing their skills to the table and shaping these interfaces—interfaces whose presence and impact in our lives will only continue to grow.

Drown me in your iterative process

When I was a design student, I got to spend a summer at Bauhaus in Dessau, Germany. A highlight for me from that time was roaming between the woodshop and the classrooms, literally in the legacy dust of the Bauhaus thinkers: a legacy well known for uniting the artist’s constructive and iterative process with functional design. As a design student rolling up my sleeves, ready to learn, I was soaking up all of it.

Bauhaus black and red color exercise, 1932, from Harvard Art Museum.

I remember sitting in the back of the classroom thinking, “Got it. Form follows function,” as a rule and a statement. Hearing about functionalism and craft coming together in this almost artisanal approach to creating, and, more importantly for me, learning what every Bauhaus student before me had learned, led me to internalize that the key to good design lies in the iterative process. As with Johannes Itten’s color studies and Mies van der Rohe’s steel tube studies, I learned that the design process is as much about getting familiar with the nature and limitations of the form and materials as it is about creating. I learned that understanding the constraints while creating leads to more functional and useful design, and that good design is the outcome of many iterations.

Bauhaus Building, Dessau, 1925-26: Detail of studio wing balcony from Harvard Art Museum.

I have seen this hold true over and over again in my daily work as a product designer. The only difference now is that the forms of my designs are mostly 1s and 0s, and function is a little less linear.

In 2015, I embarked on my first voice-design project, knowing little to nothing about the nature of voice—its form, its limitations, or even what voice as an interface was supposed to be. Alexa had just entered the market that year, all dressed in black and standing tall around US homes, charming folks with dad-jokes and weather predictions.

As a veteran of screen-based design, I knew that voice paved the way for a number of use cases that screen-based design couldn’t accommodate, such as people interacting with digital products without the aid of touch or sight, whether because of injury, disability, or circumstance. After all, we all want drivers to keep their eyes on the road, not a screen.

Microsoft inclusive design graphic illustrating the types of sensory impairments designers should consider.

Image from the Microsoft Inclusive Design Manual.

But it was very hard for me to find that sweet iterative process, not to mention that I had no examples of what good voice design sounded like.

What is a voice interface?

Voice as an interface allows us to reduce the abstraction between human and computer, something other interfaces have only been able to do in tiny increments in the past. And people are eager for the advances: Amazon has sold about 100 million Alexa devices, which have access to around 70,000 “skills,” and Google has sold around 52 million Google Home devices, which offer almost 5,000 “actions.” Both Amazon and Google have opened their platforms to third-party developers to create custom skills and actions, apparently with great success.

Now, if we are being completely honest, most of the 70,000 skills currently out there aren’t that useful—probably because creatives haven’t been part of building them. Most of these skills have suffered from the lack of an iterative process. As we learned with the first wave of mobile design, prototyping an experience before it’s handed off to developers allows us to identify problems with usability and make the product better before it’s too far gone. Given the current pervasiveness of voice interfaces—and their continual, rapid expansion—the need for UX designers in the voice arena is only growing. 

When you are designing a voice experience, you are designing what the user needs to say and how the system will respond. The typical way to build a custom skill for Alexa today is:

  • Set up a developer account with AWS
  • Set up the infrastructure with a back end
  • Assign the utterances in the Alexa developer portal
  • Assign intent
  • Test the skill
  • Submit for approval
  • Deploy

If we break this down, we are essentially building a small conversation consisting of intents and utterances.

An intent represents the substance of the user’s interest:

“I want to travel from New York to Texas.”

An utterance is a way to express a request for information:

“How do I get from New York to Texas?”
“How do I go to Texas from New York?”
“What’s the best way for me to get to Texas from New York?”

The usability of your voice experience is defined by how well your users can express their intent, and how well the utterances represent a language your users will use. The good news is, designers have a familiar and fail-proof process for figuring it out: iterate!
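To make the relationship between intents and utterances concrete, here is a minimal Python sketch of one way to check test phrases against a draft set of utterances while iterating. It rests on several assumptions: the intent name `GetTravelRoute`, the dictionary layout, and the helper functions are all hypothetical, and the exact-match comparison is a deliberate simplification, not how Alexa or Google Assistant actually interpret speech.

```python
# Hypothetical interaction model: the intent name and structure are
# illustrative only, not the Alexa or Google schema.
INTERACTION_MODEL = {
    "GetTravelRoute": [
        "How do I get from New York to Texas?",
        "How do I go to Texas from New York?",
        "What's the best way for me to get to Texas from New York?",
    ],
}


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def match_intent(phrase, model=INTERACTION_MODEL):
    """Return the intent whose sample utterance matches exactly, else None."""
    target = normalize(phrase)
    for intent, utterances in model.items():
        if any(normalize(u) == target for u in utterances):
            return intent
    return None


def unmatched(phrases, model=INTERACTION_MODEL):
    """List the test phrases that no intent covers yet."""
    return [p for p in phrases if match_intent(p, model) is None]


if __name__ == "__main__":
    # Phrases a tester might actually say during a prototype session.
    heard_in_testing = [
        "how do i get from new york to texas",
        "I want to travel from New York to Texas.",
    ]
    print(unmatched(heard_in_testing))
```

Each phrase that comes back unmatched is a candidate utterance for the next iteration: either add it to an existing intent or decide that it signals an intent you have not designed for yet.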

There is no magic word or secret sauce. As in other design arenas, iterations are the best way to improve voice design, the best way to home in on your intents and utterances, and the best way to increase the usability of your voice experience before you hand it off to developers. Of course, there are a few more things to keep in mind when designing with voice.

Pixel-pusher-turned-voice-designer seeks same

In order for us to see the real impact voice can have on our digital lives, we need to invite more creatives into the process. We need more designers, makers, and writers to familiarize themselves with the nature of voice—to form opinions, create experiences, and, more importantly, break experiences.

Albert Szabo, Exercise in color and shape relations, c. 1945 from Harvard Art Museum.

Voice design gives designers opportunities to be on the frontlines of some of the most pressing issues facing the world today: the early wave of voice interface has shined a light on privacy concerns, stereotypical gendered representation in voice assistants, and different ways of interacting with our daily technology. These issues are not going away anytime soon—the voice wave is moving fast, but we’re still in the early days and have a lot to figure out. Prototyping voice experiences allows us to invite more people, different perspectives, and diverse skill sets into our “what-if” worlds, and to illuminate and talk about these complexities before products are deployed.  

I’m not sure what Walter Gropius, one of the founders of Bauhaus, would have thought about natural language being used as a way to navigate our digital whereabouts, but I think following his lead by putting bau (from the German bauen, “to build”) at the center is what we have been missing in voice. As we continue to build interfaces to reduce the abstraction between human and computer, the building is not likely to get simpler. Design skills are essential in the process of building. Even more important is inviting a bigger pool of people into the building process. Having accessible tools that allow us to design and prototype these abstractions is crucial to voice fulfilling its immense potential.