How to tackle bias in AI
AI bias is the phenomenon where a mathematical algorithm expresses the prejudice of its creators or data. As AI is used increasingly across industries, different biases and their subtle consequences are being discovered at an increasing rate. It is important for us to understand what these biases are, how they form, and how we can avoid or minimize their potential for unethical decisions.
KRISTOFFER GORDON CLAUSEN | DATA SCIENTIST | NOVEMBER | 2020
The underlying sources of bias in AI
To understand the problem of bias in AI, we have to understand the two different definitions of bias.
The common definition of bias:
1. “Bias is disproportionate weight in favor of or against an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair.” [source 1]
In statistics, bias is something slightly more technical and with an all-important difference:
2. “Statistical bias results from an unfair sampling of a population, or from an estimation process that does not give accurate results on average.” [source 2]
Implicitly, this means that algorithms learn whatever patterns are present in the data they are given. In other words, AI bias mirrors the prejudice of its creators or its data, which means that cognitive biases are essentially the root of modern AI and data biases.
The term cognitive bias was introduced in 1972 by Tversky and Kahneman, but the tendencies it describes have always been part of human thinking. Figure 1 provides a comprehensive overview of the various cognitive biases. These can primarily be split into four high-level categories, which are considered the drivers of their respective biases: Too Much Information, Not Enough Meaning, Need to Act Fast, and What Should We Remember? It’s out of these categories that modern AI biases arise. In other words, biases in AI arise from human biases.
Figure 1: Overview of the cognitive biases, divided into four prominent groups [source]
AI algorithms may depend on one or several data sources containing human decisions or on data that reflects second-order effects of societal or historical inequities. In many cases, this data can be biased toward results or decisions that are unethical. It’s usually these underlying data sources, rather than the algorithm itself, that are the main source of the issue.
We do not have to look far for a real example of how statistical bias leads to the common notion of bias.
Just last year, Apple found itself in a gender discrimination scandal over the Apple Card, issued by Goldman Sachs. It all started when a software developer wrote on Twitter that he had received a credit line 20 times higher than his wife’s, even though they filed joint tax returns and she had the higher credit score. Many other married couples then came forward with very similar stories, followed by a media frenzy and an investigation by the New York State Department of Financial Services.
How could this happen? It is less likely that Apple is sexist and more likely that the data used by the Apple Card’s algorithm to determine creditworthiness contained hidden biases. Somewhere along the line, the algorithm appears to have learned to treat women as a bigger risk and therefore gave them lower credit lines.
Managing bias through the fairness pipeline
It’s important to assess the task at hand before managing bias. Discriminating between groups is actually desirable for certain tasks, such as medical diagnostics, where gender-specific treatments may be appropriate. However, gender-based discrimination is illegal in other tasks, such as bank loan applications. Bias must thus be considered relative to the task.
To overcome biases, we look at fairness in the decision-making process. Bias can be introduced at any stage of the fairness pipeline, that is, the compound decision-making process through which fairness propagates, as illustrated in the figure below.
Figure: Illustration of how biases propagate through the fairness pipeline [source]
Furthermore, it’s important to note that algorithms are simply a reflection of the data they are trained on, meaning that biases propagate through the fairness pipeline. Biased data collection, such as over- or underrepresenting certain groups, leads to biased datasets, which in turn produce biased algorithms.
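To make this chain concrete, here is a minimal sketch in Python of how skewed collection alone changes what a dataset says. The population, the group names A and B, and the sampling rule are all made up for illustration; no real data is involved.

```python
from statistics import mean

# Hypothetical population of (group, outcome) records; outcome 1.0 is favorable.
# Groups A and B are equally large; A has a 60% favorable rate, B has 80%.
population = (
    [("A", 1.0)] * 30 + [("A", 0.0)] * 20
    + [("B", 1.0)] * 40 + [("B", 0.0)] * 10
)


def favorable_rate(records):
    """Average outcome across all records."""
    return mean(outcome for _, outcome in records)


def group_share(records, group):
    """Fraction of records that belong to `group`."""
    return sum(1 for g, _ in records if g == group) / len(records)


# Biased collection (an assumption for this sketch): every A record is kept,
# but only every fifth B record makes it into the dataset.
a_records = [r for r in population if r[0] == "A"]
b_records = [r for r in population if r[0] == "B"]
biased_sample = a_records + b_records[::5]

print(f"share of B: population {group_share(population, 'B'):.2f}, "
      f"sample {group_share(biased_sample, 'B'):.2f}")
print(f"favorable rate: population {favorable_rate(population):.2f}, "
      f"sample {favorable_rate(biased_sample):.2f}")
```

Nothing about group B itself changes in the sample, yet the overall favorable rate drops simply because B is underrepresented. Anything trained on the sample inherits that distorted picture.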
Bias is therefore best managed (i.e. detected and mitigated) as early as possible in the fairness pipeline. However, bias has to be quantified before it can be managed. Several tools and metrics have therefore been introduced to quantify fairness and make the management of biases practical.
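As a rough sketch of what such quantification can look like, the Python snippet below computes two widely used group-fairness metrics, statistical parity difference and disparate impact, on a toy set of decisions. The function names, the group labels "m" and "f", and the loan decisions are assumptions made for this example, not the API of any particular fairness toolkit.

```python
def selection_rate(decisions, groups, group):
    """Share of favorable decisions (1) among members of `group`."""
    picked = [d for d, g in zip(decisions, groups) if g == group]
    if not picked:
        raise ValueError(f"no records for group {group!r}")
    return sum(picked) / len(picked)


def statistical_parity_difference(decisions, groups, unprivileged, privileged):
    """Unprivileged selection rate minus privileged selection rate.

    0 means parity; a negative value means the unprivileged group
    receives the favorable decision less often.
    """
    return (selection_rate(decisions, groups, unprivileged)
            - selection_rate(decisions, groups, privileged))


def disparate_impact(decisions, groups, unprivileged, privileged):
    """Unprivileged selection rate divided by privileged selection rate.

    1 means parity; values below 1 mean the unprivileged group is
    selected less often.
    """
    return (selection_rate(decisions, groups, unprivileged)
            / selection_rate(decisions, groups, privileged))


# Hypothetical loan decisions: 1 = approved, 0 = declined.
groups = ["m"] * 10 + ["f"] * 10
decisions = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7

spd = statistical_parity_difference(decisions, groups, "f", "m")
di = disparate_impact(decisions, groups, "f", "m")
print(f"statistical parity difference: {spd:.2f}")
print(f"disparate impact: {di:.2f}")
```

A disparate impact below 0.8 is often treated as a warning sign (the so-called four-fifths rule of thumb), but which threshold is appropriate depends on the task and the jurisdiction.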
Detection comes first. Measures such as disparate impact and statistical parity are widely used to statistically evaluate fairness within a population or sample, allowing AI developers to assess and detect the biases within their data. Mitigation then draws on a collection of methods, each with its own strengths and weaknesses, applicable at the pre-processing, in-processing, or post-processing stage of development. No single method suits every scenario, so a combination of methods is usually applied to reach the fairest achievable solution.
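As one illustration of a pre-processing approach, the sketch below implements reweighing, a technique the article does not name explicitly: each training record gets a weight of P(group) * P(label) / P(group, label), so that group and label look statistically independent once the weights are applied. The data and helper names are hypothetical, and this is only one of many possible mitigation methods.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Per-record weights that make group and label independent when applied.

    weight(g, y) = P(g) * P(y) / P(g, y), estimated from counts.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (1) within `group`."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total


# Hypothetical historical labels: 1 = approved, 0 = declined.
groups = ["m"] * 10 + ["f"] * 10
labels = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7

weights = reweighing_weights(groups, labels)
rate_m = weighted_positive_rate(groups, labels, weights, "m")
rate_f = weighted_positive_rate(groups, labels, weights, "f")
print(f"weighted approval rate m: {rate_m:.2f}, f: {rate_f:.2f}")
```

The weights would typically be passed to a learning algorithm that accepts per-sample weights. Reweighing only balances the training data; the resulting model still needs to be checked with fairness metrics after training.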
Preventing bias in AI
There is currently no technical silver bullet to prevent bias in AI. There are, however, many actions you and your organization can take to minimize the risks of bias.
First and foremost, it is essential to invest the time and resources needed to audit data properly, especially when protected attributes such as age, gender, or race are part of a dataset.
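One check that can be part of such an audit is looking for proxies: columns that reveal a protected attribute even when the attribute itself is left out of the model. The sketch below is a simple heuristic under stated assumptions, not a standard named test. It compares how well a categorical feature predicts the protected attribute (guessing the majority group for each feature value) against the baseline of always guessing the overall majority group. The feature values "x" and "y" and the data are invented for illustration.

```python
from collections import Counter, defaultdict


def proxy_strength(feature_values, protected_values):
    """Return (accuracy_from_feature, baseline_accuracy).

    accuracy_from_feature: accuracy of guessing the protected attribute
    from the feature, using the majority group for each feature value.
    baseline_accuracy: accuracy of always guessing the overall majority group.
    A large gap suggests the feature acts as a proxy.
    """
    by_feature = defaultdict(Counter)
    for f, p in zip(feature_values, protected_values):
        by_feature[f][p] += 1
    n = len(protected_values)
    guessed = sum(max(counts.values()) for counts in by_feature.values())
    baseline = max(Counter(protected_values).values())
    return guessed / n, baseline / n


# Hypothetical audit data: a protected attribute and one other column.
protected = ["f"] * 5 + ["m"] * 5
feature = ["x", "x", "x", "x", "y", "y", "y", "y", "y", "x"]

from_feature, baseline = proxy_strength(feature, protected)
print(f"accuracy from feature: {from_feature:.2f}, baseline: {baseline:.2f}")
```

If the feature predicts the protected attribute much better than the baseline, dropping the protected column alone will not keep that information out of the model.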
But auditing the data alone is not enough!
Bias can sneak in at multiple stages of the fairness pipeline and significantly affect the decisions made by AI. To implement unbiased models in practice, you need a process that is supported by a platform for development, deployment, and monitoring.
Only in an AI platform with AI Governance support can you ensure that processes are followed and that all data and event logging is validated and documented. With secure and comprehensive AI Governance support, you can confirm whether or not your models live up to internal and external guidelines and regulations under AI explanatory frameworks.
At 2021.AI, implementing ethical, transparent, and trustworthy AI models across industries goes hand in hand with our AI Governance support in the Grace Enterprise AI Platform. Grace serves as a powerful ally in discovering, documenting, and explaining potential biases in algorithms. With Grace, data science practitioners, end-users of AI models, and governance functions, both internal and external, have clear and robust support to manage and decipher the underlying development processes, results, and models in detail.
By now, you have probably come to terms with the fact that no organization is free from the risks and consequences of bias in AI. We know that biased data collection leads to biased datasets, which in turn produce biased algorithms that deliver biased results and solutions. We also know that bias can be introduced at any stage of the fairness pipeline, not in the data collection process alone. To best tackle bias in AI, it is in the interest of any organization to implement a robust AI Governance framework anchored in an AI platform. With the right support in place, there is less to worry about when it comes to bias in AI.
Kristoffer Gordon Clausen
Data Scientist, 2021.AI
As a Data Scientist at 2021.AI, Kristoffer has modeled multi-process manufacturing lines for pharmaceutical companies, predicted container dwell times for container port terminals, and deployed several models for telecom clients. Kristoffer is also a board member at Neural, a student-run society for AI in Denmark.