There have been some incredible advances in artificial intelligence and machine learning in the last few years, and AI is increasingly making its way into mainstream product design. However, from an over-reliance on female voices in virtual assistants, such as Amazon’s Alexa and Apple’s Siri, to predictive policing software that disproportionately targets poor neighborhoods, and facial recognition software that works reliably only for white people, there have also been some very concerning issues around bias in AI. The training data that learning algorithms consume, it turns out, is flawed because it’s full of human biases.
“It’s not the intelligence itself that’s biased, the AI is really just doing what it’s told,” explains content strategist David Dylan Thomas. “The problem is usually that it’s biased human beings who are providing the data the AI has to work with. The AI is just a prediction machine. It frankly makes better predictions than a human could based on the data it’s being given.”
Market and UX research consultant Lauren Isaacson agrees and says that we need to take greater care with what we feed to the robots: “AI is no smarter than the data sets it learns from. If a system is biased against certain people, the data resulting from the system will be no less biased.”
So whether we use machine learning algorithms that are based on training data or hard-code the language of digital assistants ourselves, designers bear a great responsibility in the creation of AI-powered products and services.
In this two-part article, we explore the challenge and hear from UX designers, user researchers, data scientists, content strategists, and creative directors to find out what we can do to reduce bias in AI.
Educate and check yourself
The first step to removing bias is to proactively look out for it and keep checking your own behavior, as a lot of bias is unconscious.
“We cannot build equitable products — and products that are free of bias — if we do not acknowledge, confront, and adjust the systemic biases that are baked into our everyday existence,” explains Alana Washington, a former strategy director on the Data Experience Design team at Capital One, who also co-founded a ‘Fairness in AI’ initiative at the company. “The ‘problems’ that we look to solve with technology are the problems of the world as we know it. A world that, at the moment, is stacked to benefit some, and prey upon others.”
To change this, Washington recommends expanding our understanding of systemic injustice, considering how marketing narratives have been sublimated into our collective belief system, and actively listening to as many diverse perspectives as possible.
“We must shift from an engineering disposition, building solutions to ‘obvious’ problems, to a design disposition — one that relentlessly considers if we’ve correctly articulated the problem we’re solving for. Joy Buolamwini puts it best.”
Why we code matters, who codes matters, and how we code matters.
Joy Buolamwini
Build a diverse team
As the AI field is overwhelmingly white and male, another way to reduce the risk of bias and create more inclusive experiences is to ensure that the team building the AI system is diverse (for example, with regard to gender, race, education, thinking process, disability status, skill set, and problem-framing approach). This should include engineering teams as well as project management, middle management, and design teams.
“Racial and gender diversity in your team isn’t just for show — the more perspectives on your team, the more likely you are to catch unintentional biases along the way,” advises Cheryl Platz, author of the upcoming book Design Beyond Devices and owner of design consultancy Ideaplatz. “And beyond biases, diversity on your team will also lend you a better eye towards potential harm. No one understands the crushing impact of racial bias as well as those who have lived it every day.”
Carol Smith, senior research scientist in Human-Machine Interaction at Carnegie Mellon University’s Software Engineering Institute, agrees that diverse teams are necessary because their different personal experiences will have informed different perceptions of trust, safety, privacy, freedom, and other important issues that need to be considered with AI systems.
“A person of color’s experience with racism is likely very different from my experience as a white woman, for example, and they are likely to envision negative scenarios with regard to racism in the AI system that I would miss,” she points out.
Redefine your process to reduce harm
Having diverse teams also helps when you start implementing harm reduction in the design process, explains machine learning designer, user researcher and artist Caroline Sinders.
“We should have diverse teams designing AI in consumer products,” she says, “so when we start to think about harm, or how a product can harm and go wrong, we aren’t designing from a white, male perspective.”
While team diversity is crucial, you’ll never be able to hire a group of people that completely represents the lived experiences out there in the world. Bias is inevitable, and Cheryl Platz therefore advises that you must also redefine your process to minimize the potential harm caused by your AI-powered system, and develop proactive plans that let you respond to issues and learn from input as fast as possible.
She calls this new mindset “opti-pessimism”: be optimistic about the potential success of your system, but fully explore the negative consequences of that success.
Carol Smith advises that the team needs to be given time and agency to identify the full range of potential harmful and malicious uses of the AI system.
“This can be time consuming,” she admits, “but is extremely important work to identify and reduce inherent bias and unintended consequences. By speculating about harmful and malicious use, racist and sexist scenarios are likely to be identified, and then preventative measures and mitigation plans can be made.”
Caroline Sinders agrees and suggests always asking ‘how can this harm?’ and creating use cases from the small to the extreme. “Use cases are not edge cases,” she warns. “If something can go wrong, it will.”
Sinders also recommends asking ourselves deeper questions: “Should we use facial recognition systems, and where does responsibility fit into innovation? Having a more diverse data set and a more diverse team doesn’t make the use of facial recognition any more ethical or better. It just makes this technology work better. But when it’s implemented into society, how does it harm people? Does that harm outweigh the good?”
In this particular case, Sinders points out, the harm does outweigh the good, which is why cities like Oakland, Somerville, and San Francisco are outlawing the use of facial recognition in public spaces and by bureaucratic or governmental entities, such as police departments.
Conduct user research and testing
One way to help data scientists and developers look beyond the available data sets and see the larger picture is to involve UX research in the development process, suggests Isaacson.
“UX researchers can use their skills to identify the societal, cultural, and business biases at play and facilitate potential solutions,” she explains. “AI bias isn’t about the data you have, it’s about the data you didn’t know you needed. This is the reason qualitative discovery work at the beginning is crucial.”
Isaacson says that if we are going to hand over life-affecting decisions to computer systems, we should be testing those systems for fairness, positive outcomes, and overall good judgment.
“These are very human traits and concerns not easily imparted to machines,” she warns. “A place to start is with how we define them. If we can agree on how they are defined, then we can find ways to test for them in computer programs.”
Define fairness
Defining fairness in machine learning is a difficult task for most organizations, as it’s a complex and multi-faceted concept that depends on context and culture.
“There are at least 21 mathematical definitions of fairness,” points out Trisha Mahoney, senior tech evangelist for machine learning and AI at IBM. “These are not just theoretical differences in how to measure fairness, but different definitions that produce entirely different outcomes. And many fairness researchers have shown that it’s impossible to satisfy all definitions of fairness at the same time.”
So developing unbiased algorithms is a data science initiative that involves many stakeholders across a company, and there are several factors to consider when defining fairness for your use case (for example, legal, ethical, and trust considerations).
“As there are many ways to define fairness, there are also many different ways to measure and remove unfair bias,” Mahoney explains. “Ultimately, there are many tradeoffs that must be made between model accuracy versus unfair model bias, and organizations must define acceptable thresholds for each.”
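To make that tension concrete, here is a minimal sketch with made-up toy numbers (the groups, labels, and predictions are purely hypothetical) showing how two widely used fairness definitions can disagree about the very same set of predictions: demographic parity compares how often each group receives the favourable outcome, while equal opportunity compares how often people who actually qualify are correctly identified.

```python
# Toy, hypothetical example: the same predictions can satisfy demographic
# parity while violating equal opportunity. All numbers are made up.

def selection_rate(y_pred):
    """Share of people who receive the positive (favourable) prediction."""
    return sum(y_pred) / len(y_pred)

def true_positive_rate(y_true, y_pred):
    """Share of actual positives that the model correctly predicts positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(preds_for_positives) / len(preds_for_positives)

# Group A: 10 people, 5 of whom actually qualify (y_true == 1)
y_true_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred_a = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

# Group B: 10 people, only 2 of whom actually qualify
y_true_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred_b = [1, 0, 1, 1, 1, 1, 0, 0, 0, 0]

# Demographic parity compares selection rates: 0.5 vs 0.5 -> satisfied
print(selection_rate(y_pred_a), selection_rate(y_pred_b))

# Equal opportunity compares true positive rates: 0.8 vs 0.5 -> violated
print(true_positive_rate(y_true_a, y_pred_a),
      true_positive_rate(y_true_b, y_pred_b))
```

Which of the two metrics matters more, and what threshold counts as acceptable, is exactly the kind of decision that has to be made per use case rather than by the algorithm.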
To help detect and remove unwanted bias in datasets and machine learning models throughout the AI application lifecycle, IBM researchers have developed the open source AI Fairness 360 toolkit, which includes various bias-mitigation algorithms as well as over 77 metrics to test for bias.
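As a rough illustration of what working with the toolkit might look like, the hedged sketch below computes dataset-level metrics for a tiny, made-up table and then applies one of the pre-processing mitigation algorithms; the column names and group definitions are hypothetical, and the exact signatures should be checked against the AI Fairness 360 documentation.

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# Hypothetical toy data: 'sex' is the protected attribute, 'label' the outcome
df = pd.DataFrame({
    'sex':   [0, 0, 0, 0, 1, 1, 1, 1],
    'score': [3, 5, 6, 8, 4, 6, 7, 9],
    'label': [0, 0, 1, 1, 0, 1, 1, 1],
})

dataset = BinaryLabelDataset(df=df,
                             label_names=['label'],
                             protected_attribute_names=['sex'],
                             favorable_label=1,
                             unfavorable_label=0)

privileged = [{'sex': 1}]
unprivileged = [{'sex': 0}]

# Dataset-level fairness metrics before any mitigation
metric = BinaryLabelDatasetMetric(dataset,
                                  unprivileged_groups=unprivileged,
                                  privileged_groups=privileged)
print('Statistical parity difference:', metric.statistical_parity_difference())
print('Disparate impact:', metric.disparate_impact())

# One mitigation option: reweigh examples so the training data is balanced
# across groups before a model is fit
rw = Reweighing(unprivileged_groups=unprivileged,
                privileged_groups=privileged)
dataset_transf = rw.fit_transform(dataset)
```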
Another useful resource is Google’s set of AI principles and responsible AI practices. The section on fairness includes a variety of approaches to iterate on, improve, and ensure fairness (for example, designing your model with concrete goals for fairness and inclusion), as well as a selection of recent publications, tools, techniques, and resources that explain how Google approaches fairness in AI and how you can incorporate fairness practices into your own machine learning projects.
How to tackle gender and racial bias in AI
So, educate yourself about bias (David Dylan Thomas’ Cognitive Bias podcast is a good starting point), and try to spot your own unconscious biases and confront them in your everyday life. Seek out diverse perspectives, build diverse and inclusive teams, and keep asking yourself if the product you’re building has the potential to harm people. Implement this mindset right in the design process, so you can reduce risks. Also conduct user research, test your systems, and define and measure fairness, learning which metric is most appropriate for a given use case.
In part 2, we will explore projects that tackle gender and racial bias in AI and discover techniques to reduce it.