Part 3 – Ensuring there is no Discrimination in the Data and machine learning models
This is the third part of series of blog posts on ‘How the EU GDPR will affect the use of Machine Learning‘
The new EU GDPR has some new requirements that will affect what data can be used to ensure there is no discrimination. Additionally, the machine learning models needs to ensure that there is no discrimination with the predictions it will make. There is an underlying assumption that the organisation has the right to use the data about individuals and that this data has been legitimately obtained. The following outlines the areas relating to discrimination:
- Discrimination based on unfair treatment of an individual based on using certain variables that may be inherently discriminatory. For example, race, gender, etc., and any decisions based on machine learning methods or not, that are based on an individual being part of one or more of these variables. This is particularly challenging for data scientists and it can limit some of the data points that can be included in their data sets.
- All data mining models need to tested to ensure that there is no discrimination built into them. Although the data scientist has removed any obvious variables that may cause discrimination, the machine learning models may have been able to discover some bias or discrimination based on the patterns it has discovered in the data.
- In the text preceding the EU GDPR (paragraph 71), details the requirements for data controllers to “implement appropriate technical and organizational measures” that “prevent, inter alia, discriminatory effects” based on sensitive data. Paragraph 71 and Article 22 paragraph 4 addresses discrimination based on profiling (using machine learning and other methods) that uses sensitive data. Care is needed to remove any associated correlated data.
- If one group of people are under represented in a training data set then, depending on the type of prediction being used, may unknowingly discriminate this group when it comes to making a prediction. The training data sets will need to be carefully partitioned and separate machine learning models built on each partition to ensure that such discrimination does not occur.
In the next blog post I will look at addressing the issues relating to Article 22 on the right to an explanation on outcomes automated individual decision-making, including profiling using machine learning and other methods.
Click back to ‘How the EU GDPR will affect the use of Machine Learning – Part 1‘ for links to all the blog posts in this series.