FAIRNESS IN MACHINE LEARNING
This article examines the legal framework and some specific nuances of fairness in machine learning. It is not an exhaustive analysis but rather an entry point to a complex problem with plenty of opportunities and challenges.
ANGEL PEREZ | JANUARY | 2019
The fairness of machine learning (ML) algorithms is a growing field of research that arises from the general requirement that decisions be free from bias and discrimination. This requirement also applies to ML-based decision tools, and current European legislation provides a framework within which algorithmic decision-making needs to be carefully considered .
For the sake of simplicity, let us use a specific hypothetical case: an algorithm that a bank uses to decide whether an individual receives a loan, based on the predicted risk of default¹. Several elements of the European legal framework come into play when an individual is assessed by such an algorithm [1,2], namely: the right not to be subject to an automated decision in the first place², the right to an explanation of the decision, and the right to non-discrimination³. This framework requires ML practitioners to build models and workflows that, by design, guard against possible discrimination (fairness) and are explainable to the user (interpretability), which in turn demands a high degree of clarity and reproducibility throughout the whole ML workflow (transparency). Examples of research efforts and products in this direction can be seen at Google [4,5] and IBM .
In the scenario described, evaluating the decision requires considering different definitions of fairness⁴, as well as having enough information about the ML algorithm to analyze it. The following questions help highlight some aspects of fairness and its importance without going into the specifics:
IS IT ENOUGH TO EXCLUDE THE INDIVIDUAL’S PROTECTED ATTRIBUTES⁵ FROM ML MODELS TO AVOID DISCRIMINATION?
It is straightforward to see that a decision based solely on one of those features is discriminatory. However, simply removing them from the analysis does not solve the problem. In our bank loan case, assume that these features are excluded but the applicant’s home postal code is used instead. In a neighborhood where almost all residents belong to a single group (e.g. a single ethnicity), the postal code acts as a proxy for group membership, so an algorithm trained on it can make decisions that are effectively informed by that membership. Additionally, members of other groups living in the same area (postal code) risk having their default risk misestimated if no other characteristic that differentiates them is included.
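Excluding an individual’s protected attributes is therefore not enough to guarantee the fairness of a decision. Moreover, those attributes may be key to analyzing the algorithm, since without them it is impossible to check whether outcomes differ across groups.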
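The proxy effect described above can be illustrated with a minimal simulation. This is a sketch under loudly stated assumptions: the population, the two postal codes "A" and "B", the 95% segregation level, the default rates and the 0.2 approval threshold are all hypothetical values chosen for illustration, and the "model" is simply the observed default rate per postal code. The protected attribute is never read by the model, yet approval rates still split along group lines.

```python
import random
from collections import defaultdict


def simulate_applicants(n, seed=0):
    """Hypothetical synthetic applicants in which postal code is a proxy for group."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = "g1" if rng.random() < 0.5 else "g2"
        # Assumption: 95% of g1 live in postal code "A" and 95% of g2 live in "B".
        home = "A" if group == "g1" else "B"
        if rng.random() < 0.05:
            home = "B" if home == "A" else "A"
        # Assumption: historical default rates depend on the neighborhood only.
        p_default = 0.10 if home == "A" else 0.30
        rows.append({"group": group, "postal_code": home,
                     "defaulted": rng.random() < p_default})
    return rows


def train_postal_code_model(rows):
    """Toy model: predicted risk = observed default rate of the applicant's postal code.
    The protected attribute ("group") is never used."""
    stats = defaultdict(lambda: [0, 0])  # postal_code -> [defaults, total]
    for r in rows:
        stats[r["postal_code"]][0] += r["defaulted"]
        stats[r["postal_code"]][1] += 1
    return {code: d / t for code, (d, t) in stats.items()}


def approval_rate_by_group(rows, model, threshold=0.2):
    """Share of each group whose predicted risk is below the (hypothetical) threshold."""
    stats = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for r in rows:
        stats[r["group"]][0] += model[r["postal_code"]] < threshold
        stats[r["group"]][1] += 1
    return {g: a / t for g, (a, t) in stats.items()}


rows = simulate_applicants(20_000)
model = train_postal_code_model(rows)
rates = approval_rate_by_group(rows, model)
print("predicted risk per postal code:", model)
print("approval rate per group:", rates)
print("demographic parity difference:", rates["g1"] - rates["g2"])
```

With these assumed parameters, approval rates come out close to 95% for g1 and 5% for g2, because the postal code carries almost all of the group information. Note that measuring this gap requires the group column, which is why the protected attributes remain useful for auditing even when the model does not use them.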
IS A MODEL EXPLANATION ENOUGH TO SPOT DISCRIMINATION?
Consider the case in which the bank⁶ omits some personal attributes that set apart a specific individual or minority with a low risk of default, e.g. a good credit history, and instead relies on coarse characteristics that are not traceable to the individual, such as postal code. In a neighborhood that historically has, on average, more unpaid or defaulted loans than other neighborhoods, that minority is likely to have its risk overestimated, because it is represented only by the postal code of a high-risk neighborhood. Moreover, the overestimation will not be detectable if the underlying data is unavailable and the variables that differentiate the individual or minority from the coarse group are absent from it. An explanation stating that the risk depends on the postal code would then be accurate and still fail to reveal the discrimination.
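The following minimal sketch illustrates why the explanation alone does not expose the problem. All numbers are hypothetical assumptions: a single high-risk postal code in which 20% of applicants have a good credit history (true default rate 0.05) and the rest do not (true default rate 0.35). The coarse model assigns everyone the neighborhood average; the overestimation for the minority can only be measured because the credit history variable was kept in the simulated data.

```python
import random


def simulate_neighborhood(n, seed=1):
    """Hypothetical high-risk postal code in which a minority has a good credit history."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        good_history = rng.random() < 0.20          # assumed minority share
        p_default = 0.05 if good_history else 0.35  # assumed true default rates
        rows.append((good_history, rng.random() < p_default))
    return rows


def default_rate(rows):
    """Observed share of defaults in a non-empty list of (good_history, defaulted)."""
    return sum(defaulted for _, defaulted in rows) / len(rows)


rows = simulate_neighborhood(50_000)

# Coarse model: a single prediction for everyone living in this postal code.
coarse_risk = default_rate(rows)

# Only possible because credit history is present in the data.
minority = [r for r in rows if r[0]]
minority_actual = default_rate(minority)

print(f"coarse prediction for everyone: {coarse_risk:.3f}")
print(f"observed default rate of good-history minority: {minority_actual:.3f}")
print(f"overestimation for the minority: {coarse_risk - minority_actual:.3f}")
```

Under these assumptions the coarse prediction is close to 0.2 × 0.05 + 0.8 × 0.35 = 0.29, while the minority's observed rate is close to 0.05. If the `good_history` column were dropped, the last two lines could not be computed at all.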
DO LARGE AMOUNTS OF DATA HELP GUARANTEE NON-DISCRIMINATION?
ML models are built on data that reflects the biases of past decisions, so the training data strongly influences the outcome. More data of the same kind reproduces those biases more faithfully rather than removing them. There is no silver bullet, and case-by-case analysis is required to avoid potential shortcomings.
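A minimal sketch of why volume alone does not help, assuming a hypothetical form of label bias: the true default rate of a group is 0.10, but in the historical records a repaid loan was recorded as a default with probability 0.15. Both values are illustrative assumptions. Estimating the default rate from larger and larger samples reduces sampling noise, but the estimate converges to the biased value rather than the true one.

```python
import random

TRUE_DEFAULT_RATE = 0.10  # assumed true risk of the group
LABEL_BIAS = 0.15         # assumed chance a repaid loan was recorded as a default


def biased_historical_labels(n, seed=2):
    """Hypothetical historical records whose labels carry a systematic bias."""
    rng = random.Random(seed)
    labels = []
    for _ in range(n):
        defaulted = rng.random() < TRUE_DEFAULT_RATE
        if not defaulted and rng.random() < LABEL_BIAS:
            defaulted = True  # bias introduced by past record-keeping or decisions
        labels.append(defaulted)
    return labels


estimates = {}
for n in (100, 10_000, 200_000):
    labels = biased_historical_labels(n)
    estimates[n] = sum(labels) / n
    print(f"n={n:>7}: estimated default rate {estimates[n]:.3f} "
          f"(true {TRUE_DEFAULT_RATE})")
```

Under these assumptions the estimate settles near 0.10 + 0.90 × 0.15 = 0.235 as `n` grows: more data makes the estimate more precise, but not more correct.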
Fairness is a complex issue that requires work on its definitions and on the impact of those definitions on the resulting algorithms. The absence of a one-size-fits-all solution calls for careful analysis of the training data, the algorithms and their impact. Fairness is closely connected to the interpretability of the algorithms and to the transparency of the process by which they were created. This is currently an active field of research, as practitioners question the implications of using ML algorithms to aid decision-making.
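To make the point about definitions concrete, here is a minimal sketch comparing two commonly used group fairness definitions on the same set of loan decisions: demographic parity (equal approval rates across groups) and equal opportunity (equal approval rates among applicants who actually repay). The toy data is hypothetical and assumes the repayment outcome is known for every applicant, which in practice it usually is not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    group: str
    repaid: bool    # outcome the applicant actually has (assumed observable here)
    approved: bool  # loan decision produced by the model


def _approval_share(decisions):
    """Share of approved decisions in a non-empty list."""
    return sum(d.approved for d in decisions) / len(decisions)


def demographic_parity_difference(decisions, a, b):
    """Difference in overall approval rates between groups a and b."""
    def rate(g):
        return _approval_share([d for d in decisions if d.group == g])
    return rate(a) - rate(b)


def equal_opportunity_difference(decisions, a, b):
    """Difference in approval rates among applicants who repay (true positive rates)."""
    def rate(g):
        return _approval_share([d for d in decisions if d.group == g and d.repaid])
    return rate(a) - rate(b)


# Hypothetical toy data: 8 of 10 g1 applicants repay, 4 of 10 g2 applicants repay,
# and the model approves exactly those who repay.
decisions = (
    [Decision("g1", True, True)] * 8 + [Decision("g1", False, False)] * 2
    + [Decision("g2", True, True)] * 4 + [Decision("g2", False, False)] * 6
)

dp = demographic_parity_difference(decisions, "g1", "g2")
eo = equal_opportunity_difference(decisions, "g1", "g2")
print(f"demographic parity difference: {dp:.2f}")  # 0.40
print(f"equal opportunity difference: {eo:.2f}")   # 0.00
```

The same decisions satisfy equal opportunity perfectly while violating demographic parity, so whether this hypothetical model is "fair" depends entirely on which definition is adopted.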
B. Goodman and S. Flaxman. European Union regulations on algorithmic decision-making and a “right to explanation”. https://arxiv.org/abs/1606.08813
S. Corbett-Davies and S. Goel. The measure and mismeasure of fairness: A critical review of fair machine learning. https://arxiv.org/abs/1808.00023
- This case is used in  to describe the individual’s rights under GDPR.
- General Data Protection Regulation (GDPR) 
- Article 21 of the Charter of Fundamental Rights of the European Union 
- Multiple definitions of fairness and their shortcomings can be found in 
- The protected attributes of an individual are given in , namely: sex, race, color, ethnic or social origin, genetic features, language, religion or belief, political or any other opinion, membership of a national minority, property, birth, disability, age or sexual orientation.
- The actual intention behind this hypothetical case is left out of the discussion; it could be inexperience or actual malice aimed at reducing the number of loans. It is irrelevant to the requirements imposed by the “interpretability” of the decision.