Illustration by Avalon Hu

Designing for just one specific device or input type is no longer enough. These days, designers have a myriad of technologies at their disposal and can create UX/UI experiences across all kinds of devices: smart speakers, wearables, VR and AR headsets, and more. Some have touchscreens, some are screenless, and many share capabilities, requiring us to think more deeply about how we want people to interact with the products or services we’re building.

Designer Cheryl Platz has worked on a wide variety of emerging technologies and groundbreaking products like Amazon’s Alexa, Microsoft’s Cortana and Windows Automotive, and early titles for the Nintendo DS. She believes that we need to take into account multiple inputs and outputs from the start to create seamless cross-device experiences and do so by putting the customer at the center. Now, she has distilled her thoughts into her book, “Design Beyond Devices”.

We sat down with Cheryl to find out what inspired her to write the book, how to design multimodal experiences, why customer context is key, and more.

Cheryl Platz’s book, Design Beyond Devices, published by Rosenfeld Media.

What are multimodal experiences, and how do you design for them?

Multimodal design seeks to coordinate the delivery of multiple input and output stimuli to create a coherent customer experience. 

Humans are multimodal. We work with multiple modes of communication all the time. But we haven’t yet been orchestrating them thoughtfully to get the most out of them, which is a shame because our devices have become much more capable in the last few years. We’ve gone from physical communication – keyboard and mouse – to voice, touch, and even gestures. We already have these technologies in our toolbox.

Multimodal design adds a layer of design rigor on top of modality-specific designs, such as graphical or voice UI design, but it’s early days. We are going to have to develop a base level of understanding so that we can start building platforms and best practices for multimodal experiences. Eventually, some people will specialize in the really hard multimodal questions, but right now we’re at a very exciting beginning stage.

What inspired you to write the book?

My background in multimodal experiences goes all the way back to video games. I worked on a launch title for the Nintendo DS in 2004, which supported speech and touch, and on Windows Automotive in 2012. But a lot of the ideas that drove the core of “Design Beyond Devices” came from my time at Amazon. I worked on a now-retired multimodal product called the Echo Look, which you could interact with via touch using a traditional smartphone or via voice. I ended up moving to the Alexa team, and my first project was to design the original Alexa notification system.

When we were working on the Echo Show, the designers were having some really complicated conversations. Amazon had just blown everyone’s minds with the voice-only Echo, and now they were adding a screen to it! We asked existential questions like “Should it talk as much as the Echo?” At the same time, we were adding Alexa support to the Fire TV, and it turned out the answers to those questions weren’t always the same.

Everyone was so busy getting these devices to work that they didn’t have the time to think about this as a system and rationalize the interaction models. I kept expecting someone in the industry to take a position on it, but no one ever did. Eventually, I started writing down my thoughts, which was really cathartic. This book is an attempt to help bring clarity to the multimodal design process. It’s intimidating to start on a blank page when you’re working on a device that can do everything.

Selected frames from an early Echo Look storyboard showing important environmental details pulled from contextual inquiry.

Can you talk us through the four key themes that are woven through the whole book?

Absolutely. The four themes are customer context and ethics, multimodal frameworks, ideation and execution, and emerging technologies. 

Customer context and ethics involves looking at the broader picture and the world in which your experience is going to live. A lot of the multimodal technologies we work with tend to be powered by AI, which is often biased. There’s much potential for accidental harm, so it’s very important to consider ethics – with great power comes great responsibility. As people switch between different devices, it’s also important to understand why they do it. Is it noisy? Did a kid just run into the room? Are their hands covered in butter? What’s happening in the user’s space that motivated the change?

These days we need to go deeper with our customer research. Context matters. It defines our platform choices and whether or not to ship the product in the first place. 

Multimodal frameworks then explore how multimodality might be applied to your project and how to translate customer behavior patterns into models that are abstract enough to represent in code. For example, a computer shouldn’t interrupt you when you’re in the middle of typing a sentence. It has so many signals that you’re busy – it’s connected to a keyboard and knows you just pressed a key, for instance – yet it still switches focus and pulls you to another window. That’s because your device doesn’t understand your behavior and doesn’t know it should wait until there’s a pause.
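As a purely illustrative sketch of that idea (not an example from the book), the behavioral model could be as simple as gating focus-stealing notifications on recent keyboard activity. The class name and the pause threshold below are hypothetical.

```typescript
// Hypothetical sketch: defer focus-stealing notifications while the user is typing.
// NotificationGate and TYPING_PAUSE_MS are illustrative names, not from the book.

const TYPING_PAUSE_MS = 2_000; // treat the user as "busy" until 2s after the last keypress

class NotificationGate {
  private lastKeypress = 0;
  private queue: Array<() => void> = [];

  // Call this from a global key event listener.
  recordKeypress(): void {
    this.lastKeypress = Date.now();
  }

  private userIsTyping(): boolean {
    return Date.now() - this.lastKeypress < TYPING_PAUSE_MS;
  }

  // Show immediately only if the user is idle; otherwise hold until a pause.
  notify(show: () => void): void {
    if (!this.userIsTyping()) {
      show();
      return;
    }
    this.queue.push(show);
    setTimeout(() => this.flush(), TYPING_PAUSE_MS);
  }

  private flush(): void {
    if (this.userIsTyping()) {
      // Still busy: check again after another pause window.
      setTimeout(() => this.flush(), TYPING_PAUSE_MS);
      return;
    }
    this.queue.splice(0).forEach((show) => show());
  }
}
```

Wired to a real keypress listener, a gate like this waits for a natural pause instead of pulling the user to another window mid-sentence.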

Ideation and execution is about taking these big ideas and turning them into reality in a way that development partners can understand. This includes multimodal user flows and exploring how to expand existing design systems to account for multimodality.

Finally, I cover emerging technologies like natural language interfaces, VR, AR, and XR – to ensure everyone, regardless of their experience, starts from the same baseline.

A pattern language for multimodal flow diagrams, adapted from Cheryl Platz’s work on Windows Automotive and the Alexa Voice UI design team.

How do you ensure that you’re designing a seamless and consistent experience?

There’s so much going on in the backend to create fluid experiences. One way to future-proof ourselves is to think more humanistically and less about a specific device and its capabilities. If you design for general modalities instead of focusing on one specific device scenario, you’ll be able to apply your solution to emerging devices more easily.

We need to think about how a human wants to interact with a device. What kind of phrases make sense for voice, for example? Start from a more holistic perspective and assume that the one constant is change. Luckily, we can reasonably assume that humans are unlikely to develop a sixth sense in the near future. We can use their existing senses as a basis to explore our devices’ communication modalities. 

How do you capture customer context?

In my free time, I’m a professional improv performer and teacher. People assume I’m going to be compelling very quickly on stage. To actually achieve that and tell better stories, in my theater we have a shorthand called CROW, which stands for character, relationship, objective, and where. This framework can also be applied to finding out more details about our customers and designing a better experience for them. 

CROW helps us challenge our assumptions and get a well-rounded perspective on the customer, and then share what we’ve found out during customer research with our stakeholders. They might think they already understand everything about our customers, so tools like CROW, along with my worksheets on capturing customer context, help us engage our stakeholders and get buy-in.

How do you decide which input modalities are right for your product or service?

In Chapter 7, I talk about the Spectrum of Multimodality, which helps you figure out which approach to multimodality will make the most sense for your customer based on two key aspects: their proximity to their device(s) and the amount and density of information being communicated. Voice, for example, is better suited to lower communication density, while visuals are better suited to higher communication density.

If you can define these two dimensions, then you can start making some informed decisions and narrow down the choice of potential inputs to support. You might discover, for example, that your customers only really need a little bit of information but they’re not always near a device. Or that they need a lot of information and they have a device with them all the time. Once you’ve figured out which model is appropriate for your scenario, you bring the context back in. Are the customers going to be interested in using voice, for instance?
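As a rough, hypothetical illustration of reasoning over those two dimensions – my own reading of “voice suits lower density, visuals suit higher density”, not a framework taken from the book – a decision helper might look something like this:

```typescript
// Hypothetical sketch: narrowing candidate modalities from the two dimensions
// discussed above (proximity and communication density). The specific mapping
// is illustrative, not Platz's own definitions.

type Proximity = 'in-hand' | 'across-the-room' | 'out-of-range';
type Density = 'low' | 'high';

function candidateModalities(proximity: Proximity, density: Density): string[] {
  const candidates: string[] = [];

  // Voice works at a distance and suits sparse information.
  if (proximity !== 'out-of-range' && density === 'low') {
    candidates.push('voice');
  }

  // Rich visuals need the user close enough to read and touch a screen.
  if (proximity === 'in-hand') {
    candidates.push(density === 'high' ? 'detailed visuals + touch' : 'glanceable visuals');
  }

  // A distant screen can still show dense content, but input shifts to voice or a remote.
  if (proximity === 'across-the-room' && density === 'high') {
    candidates.push('large-screen visuals + voice or remote input');
  }

  return candidates;
}

// Example: a customer who only needs a little information but isn't always near a device.
console.log(candidateModalities('across-the-room', 'low')); // ['voice']
```

Once a helper like this has narrowed the field, the customer context decides the rest, such as whether voice is actually welcome in that environment.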

The multimodal interaction model spectrum ranges from Adaptive, to Anchored, to Direct, to Intangible. Quadrants are characterized by a user’s proximity to and physical engagement with the modality, as well as the richness of information being communicated.

How do you prototype multimodal experiences?

Test desirability first – long before you have a fully functional experience. It helps you figure out whether or not it’s even worth building these expensive multimodal scenarios. I’m a big proponent of using the minimum amount of effort and fidelity needed to answer that question. You can do the fine-tuning later. Especially with voice recognition, it’s not that hard to move from the Wizard of Oz method to a functional prototype, now that we have tools like Alexa Skills, Google Actions, and Adobe XD. There are lots of ways for you to prototype multimodal experiences at a level of fidelity that matches the complexity of your product or service.
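To give a sense of scale – this is my sketch, not an example from the book – a functional voice prototype built with the Alexa Skills Kit SDK for Node.js can fit in a couple of dozen lines. The wording is a placeholder, and a real skill still needs an interaction model (invocation name, intents, sample utterances) configured in the Alexa developer console.

```typescript
// Minimal sketch of a functional voice prototype using the Alexa Skills Kit SDK
// for Node.js (ask-sdk-core). The spoken wording is a placeholder.
import * as Alexa from 'ask-sdk-core';

const LaunchRequestHandler: Alexa.RequestHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
  },
  handle(handlerInput) {
    // The spoken prompt replaces the "wizard" reading from a script in a Wizard of Oz test.
    return handlerInput.responseBuilder
      .speak('Welcome to the prototype. What would you like to try?')
      .reprompt('What would you like to try?')
      .getResponse();
  },
};

// Exported as an AWS Lambda handler; the Alexa service invokes it once per user turn.
export const handler = Alexa.SkillBuilders.custom()
  .addRequestHandlers(LaunchRequestHandler)
  .lambda();
```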

How do you see multimodal experiences evolving?

The future is multimodal. The more we can move to supporting as many modes as possible, the better, so that we don’t leave people behind. If you choose not to support voice on a smartwatch, you’re going to exclude people who can’t rely on their sight for any reason. It could be that there’s too much glare and they can’t see what’s on the screen, or it could be that they’re blind and need a spoken interface.

“Holy grail” experiences allow customers to choose the interaction model based on their needs, but that’s still a bit expensive and complicated. I like to challenge folks to be flexible and support more input modalities than they do today, as a way to expand their market and be more inclusive. It makes the device more desirable both for people with permanent disabilities and for people with temporary disabilities.

Cheryl Platz argues that multimodality maximizes inclusiveness, on stage at the 2018 Amuse UX conference (Image credit: Amuse UX).

Apart from buying your book, how can designers get started with multimodal design?

A lot of designers are going to be trained in one modality, maybe two. If you’re working solely in graphical user interfaces, the first step is to start getting curious. Watch people interact with multimodal experiences and pay attention to the transitions. Pay attention next time you have to interact with a voice-activated phone system, especially if it offers to contact you via text or email.

A big part of being a multimodal designer is noticing the gaps and what happens when you move between devices, network connections, and modalities. If you move from a desktop to a mobile site, or from touch to voice, for example, how does that affect you? What decisions do you have to make? Designing smooth transitions between modalities is really important and having that awareness and sensitivity will go a long way. 

To see more of Cheryl Platz’s work and insights, pick up your own copy of “Design Beyond Devices”.