Machine Learning

Part 3 – Ensuring there is no Discrimination in the Data and machine learning models

Posted on Updated on

This is the third part of series of blog posts on ‘How the EU GDPR will affect the use of Machine Learning

The new EU GDPR has some new requirements that will affect what data can be used to ensure there is no discrimination. Additionally, the machine learning models needs to ensure that there is no discrimination with the predictions it will make. There is an underlying assumption that the organisation has the right to use the data about individuals and that this data has been legitimately obtained. The following outlines the areas relating to discrimination:

  • Discrimination based on unfair treatment of an individual based on using certain variables that may be inherently discriminatory. For example, race, gender, etc., and any decisions based on machine learning methods or not, that are based on an individual being part of one or more of these variables. This is particularly challenging for data scientists and it can limit some of the data points that can be included in their data sets.
  • All data mining models need to tested to ensure that there is no discrimination built into them. Although the data scientist has removed any obvious variables that may cause discrimination, the machine learning models may have been able to discover some bias or discrimination based on the patterns it has discovered in the data.
  • In the text preceding the EU GDPR (paragraph 71), details the requirements for data controllers to “implement appropriate technical and organizational measures” that “prevent, inter alia, discriminatory effects” based on sensitive data. Paragraph 71 and Article 22 paragraph 4 addresses discrimination based on profiling (using machine learning and other methods) that uses sensitive data. Care is needed to remove any associated correlated data.
  • If one group of people are under represented in a training data set then, depending on the type of prediction being used, may unknowingly discriminate this group when it comes to making a prediction. The training data sets will need to be carefully partitioned and separate machine learning models built on each partition to ensure that such discrimination does not occur.

NewImage

In the next blog post I will look at addressing the issues relating to Article 22 on the right to an explanation on outcomes automated individual decision-making, including profiling using machine learning and other methods.

Click back to ‘How the EU GDPR will affect the use of Machine Learning – Part 1‘ for links to all the blog posts in this series.

Part 2 – Do I have permissions to use the data for data profiling?

Posted on Updated on

This is the second part of series of blog posts on ‘How the EU GDPR will affect the use of Machine Learning

I have the data, so I can use it? Right?

I can do what I want with that data? Right? (sure the customer won’t know!)

NO. The answer is No you cannot use the data unless you have been given the permission to use it for a particular task.

The GDPR applies to all companies worldwide that process personal data of European Union (EU) citizens. This means that any company that works with information relating to EU citizens will have to comply with the requirements of the GDPR, making it the first global data protection law.

NewImage

The GDPR tightens the rules for obtaining valid consent to using personal information. Having the ability to prove valid consent for using personal information is likely to be one of the biggest challenges presented by the GDPR. Organisations need to ensure they use simple language when asking for consent to collect personal data, they need to be clear about how they will use the information, and they need to understand that silence or inactivity no longer constitutes consent.

NewImage

You will need to investigate the small print of all the terms and conditions that your customers have signed. Then you need to examine what data you have, how and where it was collected or generated, and then determine if I have to use this data beyond what the original intention was. If there has been no mention of using the customer data (or any part of it) for analytics, profiling, or anything vaguely related to it then you cannot use the data. This could mean that you cannot use any data for your analytics and/or machine learning. This is a major problem. No data means no analytics and no targeting the customers with special offers, etc.

NewImage

Data cannot be magically produced out of nowhere and it isn’t the fault of the data science team if they have no data to use.

How can you over come this major stumbling block?

The first place is to review all the T&Cs. Identify what data can be used and what data cannot be used. One approach for data that cannot be used is to update the T&Cs and get the customers to agree to them. Yes they need to explicitly agree (or not) to them. Giving them a time limit to respond is not allowed. It needs to be explicit.

NewImage

Yes this will be hard work. Yes this will take time. Yes it will affect what machine learning and analytics you can perform for some time. But the sooner you can identify these area, get the T&Cs updated, get the approval of the customers, the sooner the better and ideally all of this should be done way in advance on 25th May, 2018.

NewImage

In the next blog post I will look at addressing Discrimination in the data and in the machine learning models.

Click back to ‘How the EU GDPR will affect the use of Machine Learning – Part 1‘ for links to all the blog posts in this series.

How the EU GDPR will affect the use of Machine Learning – Part 1

Posted on

On 5 December 2015, the European Parliament, the Council and the Commission reached agreement on the new data protection rules, establishing a modern and harmonised data protection framework across the EU. Then on 14th April 2016 the Regulations and Directives were adopted by the European Parliament.

NewImage

The EU GDPR comes into effect on the 25th May, 2018.

Are you ready ?

The EU GDPR will affect every country around the World. As long as you capture and use/analyse data captured with the EU or by citizens in the EU then you have to comply with the EU GDPR.

Over the past few months we have seen a increase in the amount of blog posts, articles, presentations, conferences, seminars, etc being produced on how the EU GDPR will affect you. Basically if your company has not been working on implementing processes, procedures and ensuring they comply with the regulations then you a bit behind and a lot of work is ahead of you.

Like I said there was been a lot published and being talked about regarding the EU GDPR. Most of this is about the core aspects of the regulations on protecting and securing your data. But very little if anything is being discussed regarding the use of machine learning and customer profiling.

Do you use machine learning to profile, analyse and predict customers? Then the EU GDPRs affect you.

Article 22 of the EU GDPRs outlines some basic capabilities regarding machine learning, and in additionally Articles 13, 14, 19 and 21.

Over the coming weeks I will have the following blog posts. Each of these address a separate issue, within the EU GDPR, relating to the use of machine learning.

  • Part 2 – Do I have permissions to use the data for data profiling?
  • Part 3 – Ensuring there is no Discrimination in the Data and machine learning models.
  • Part 4 – (Article 22: Profiling) Why me? and how Oracle 12c saves the day

NewImage

Machine Learning notebooks (and Oracle)

Posted on Updated on

Over the past 12 months there has been an increase in the number of Machine Learning notebooks becoming available.

What is a Machine Learning notebook?

As the name implies it can be used to perform machine learning using one or more languages and allows you to organise your code, scripts and other details in one application.

The ML notebooks provide an interactive environment (sometimes browser based) that allows you to write, run, view results, share/collaborate code and results, visualise data, etc.

Some of these ML notebooks come with one language and others come with two or more languages, and have the ability to add other ML related languages. The most common languages are Spark, Phython and R.

Based on these languages ML notebooks are typically used in the big data world and on Hadoop.

NewImage

Examples of Machine Learning notebooks include: (Starting with the more common ones)

  • Apache Zeppelin
  • Jupyter Notebook (formally known as IPython Notebook)
  • Azure ML R Notebook
  • Beaker Notebook
  • SageMath

At Oracle Open World (2016), Oracle announced that they are currently working creating their own ML notebook and it is based on Apache Zeppelin. They seemed to indicate that a beta version might be available in 2017. Here are some photos from that presentation, but with all things that Oracle talk about you have to remember and take into account their Safe Habor.

2016 09 22 12 43 41

2016 09 22 12 45 53

2016 09 21 12 16 09

I’m looking forward to getting my hands on this new product when it is available.