Machine Learning

Part 2 – Do I have permissions to use the data for data profiling?

This is the second part of a series of blog posts on ‘How the EU GDPR will affect the use of Machine Learning’.

I have the data, so I can use it? Right?

I can do what I want with that data? Right? (sure the customer won’t know!)

NO. The answer is no: you cannot use the data unless you have been given permission to use it for a particular task.

The GDPR applies to all companies worldwide that process personal data of individuals in the European Union (EU). This means that any company that works with information relating to people in the EU will have to comply with the requirements of the GDPR, making it the first global data protection law.

The GDPR tightens the rules for obtaining valid consent to using personal information. Having the ability to prove valid consent for using personal information is likely to be one of the biggest challenges presented by the GDPR. Organisations need to ensure they use simple language when asking for consent to collect personal data, they need to be clear about how they will use the information, and they need to understand that silence or inactivity no longer constitutes consent.

You will need to investigate the small print of all the terms and conditions that your customers have signed. Then you need to examine what data you have, how and where it was collected or generated, and then determine whether you can use this data beyond its original purpose. If there has been no mention of using the customer data (or any part of it) for analytics, profiling, or anything vaguely related to it, then you cannot use the data. This could mean that you cannot use any data for your analytics and/or machine learning. This is a major problem. No data means no analytics and no targeting of customers with special offers, etc.

Data cannot be magically produced out of nowhere and it isn’t the fault of the data science team if they have no data to use.

How can you overcome this major stumbling block?

The first place to start is to review all the T&Cs. Identify what data can be used and what data cannot be used. One approach for data that cannot be used is to update the T&Cs and get the customers to agree to them. Yes, they need to explicitly agree (or not) to them. Giving them a time limit to respond, and treating no response as agreement, is not allowed. It needs to be explicit.
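To make this a little more concrete, here is a minimal Python sketch of how a data science team might filter customers down to those who have explicitly agreed to a given use of their data. Everything in it is an assumption for illustration: the ConsentRecord class, its field names, the purpose label "profiling" and the two helper functions are hypothetical and do not come from the GDPR or from any particular product. It also assumes there is at most one consent record per customer and purpose.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical record of one customer's answer for one purpose."""
    customer_id: str
    purpose: str                      # e.g. "profiling" (illustrative label)
    granted: Optional[bool]           # None means the customer never responded
    recorded_at: Optional[datetime]   # when the answer was captured
    terms_version: Optional[str]      # which version of the T&Cs was shown


def has_explicit_consent(record: ConsentRecord, purpose: str) -> bool:
    """True only for an explicit, evidenced 'yes' for this exact purpose.

    Silence (granted is None) and refusals (granted is False) both fail,
    as does a 'yes' with no evidence of when or under which T&Cs it was given.
    """
    return (
        record.purpose == purpose
        and record.granted is True
        and record.recorded_at is not None
        and record.terms_version is not None
    )


def usable_customer_ids(records: Iterable[ConsentRecord], purpose: str) -> Set[str]:
    """Return the customers whose data may be used for the given purpose."""
    return {r.customer_id for r in records if has_explicit_consent(r, purpose)}
```

The design choice worth noting is that the default is exclusion: a customer only makes it into the training data when there is a recorded, explicit yes for that specific purpose. A customer who agreed to something else, or who never replied, is left out.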

Yes, this will be hard work. Yes, this will take time. Yes, it will affect what machine learning and analytics you can perform for some time. But the sooner you can identify these areas, get the T&Cs updated and get the approval of the customers, the better. Ideally all of this should be done well in advance of 25th May, 2018.

In the next blog post I will look at addressing Discrimination in the data and in the machine learning models.

Click back to ‘How the EU GDPR will affect the use of Machine Learning – Part 1’ for links to all the blog posts in this series.

How the EU GDPR will affect the use of Machine Learning – Part 1

On 15 December 2015, the European Parliament, the Council and the Commission reached agreement on the new data protection rules, establishing a modern and harmonised data protection framework across the EU. Then on 14th April 2016 the Regulation and Directive were adopted by the European Parliament.

The EU GDPR comes into effect on the 25th May, 2018.

Are you ready?

The EU GDPR will affect every country around the world. If you capture, use or analyse data that is captured within the EU or relates to people in the EU, then you have to comply with the EU GDPR.

Over the past few months we have seen an increase in the number of blog posts, articles, presentations, conferences, seminars, etc. being produced on how the EU GDPR will affect you. Basically, if your company has not been working on implementing processes and procedures, and ensuring they comply with the regulations, then you are a bit behind and a lot of work is ahead of you.

Like I said, there has been a lot published and talked about regarding the EU GDPR. Most of this is about the core aspects of the regulations on protecting and securing your data. But very little, if anything, is being discussed regarding the use of machine learning and customer profiling.

Do you use machine learning to profile, analyse and predict customers? Then the EU GDPR affects you.

Article 22 of the EU GDPR outlines some basic rules regarding profiling and machine learning, and Articles 13, 14, 19 and 21 are also relevant.

Over the coming weeks I will publish the following blog posts. Each of these addresses a separate issue within the EU GDPR relating to the use of machine learning.

  • Part 2 – Do I have permissions to use the data for data profiling?
  • Part 3 – Ensuring there is no Discrimination in the Data and machine learning models.
  • Part 4 – (Article 22: Profiling) Why me? and how Oracle 12c saves the day

Machine Learning notebooks (and Oracle)

Over the past 12 months there has been an increase in the number of Machine Learning notebooks becoming available.

What is a Machine Learning notebook?

As the name implies it can be used to perform machine learning using one or more languages and allows you to organise your code, scripts and other details in one application.

The ML notebooks provide an interactive environment (sometimes browser based) that allows you to write and run code, view results, share and collaborate on code and results, visualise data, etc.

Some of these ML notebooks come with one language and others come with two or more languages, and have the ability to add other ML related languages. The most common languages are Spark, Python and R.

Based on these languages ML notebooks are typically used in the big data world and on Hadoop.

Examples of Machine Learning notebooks include: (Starting with the more common ones)

  • Apache Zeppelin
  • Jupyter Notebook (formerly known as IPython Notebook)
  • Azure ML R Notebook
  • Beaker Notebook
  • SageMath

At Oracle Open World (2016), Oracle announced that they are currently working on creating their own ML notebook and that it is based on Apache Zeppelin. They seemed to indicate that a beta version might be available in 2017. Here are some photos from that presentation, but as with all things that Oracle talk about you have to remember and take into account their Safe Harbor.

[Three photos from the Oracle Open World 2016 presentation, taken 21 and 22 September 2016]

I’m looking forward to getting my hands on this new product when it is available.