Predictive Analytics

Pre-built Machine Learning Models

Posted on November 9, 2020 Updated on October 27, 2020

Machine learning has seen widespread adoption over the past few years. More recently we have seen examples of how the models created by machine learning algorithms can be shared. There have been various approaches to sharing these models using different model interchange languages. Some of these have become more or less popular over time: a few years ago PMML was very popular, and more recently ONNX seems to be popular. Who knows what it will be next year, or in a couple of years' time.

With the increased use of machine learning models and the ability to share them, we are now seeing other uses for them. Typically, sharing a model involved a company transferring a model developed by the data scientists in their lab environment to DevOps teams, who then deployed the model into the production environment. This has given rise to a new area of expertise called MLOps or AIOps.

The languages and tools used by data scientists in the lab environment are different from the languages used to deploy applications in production. Model interchange languages can be used to take the model parameters, algorithm type, data transformations, etc. and map these into the interchange format. The production environment then reads this interchange object and applies it using the production language. In such situations the model will use the algorithms already coded in the production language. For example, the lab environment could be using Python, while the production environment could be using C, Java, Go, etc. Python is an interpreted language and in a lot of cases is not suitable for real-time use in a production environment, due to speed and scalability issues. In this case the underlying algorithm of the production language will be used, not the algorithm used in the lab.

In theory the algorithms should be the same. For example, a decision tree algorithm using the Gini Index in one language should function in the same way in another language. We all know there can be a small to a very large difference between what happens in theory and how it works in practice. Different languages and different developers will do things slightly differently. This means there will be differences between the accuracy of the model developed in the lab and the accuracy of the (same) model used in production. As long as everyone is aware of this, everything will be OK. But it will be an important task for the data science team to have some measurements of these differences.
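To make that measurement concrete, here is a minimal Python sketch. It assumes you can score the same hold-out data set in both environments and collect the predicted class labels from each; the function name, the data and the choice of accuracy plus agreement rate as the measures are illustrative assumptions, not a standard MLOps tool.

```python
from typing import Sequence


def compare_models(labels: Sequence[int],
                   lab_preds: Sequence[int],
                   prod_preds: Sequence[int]) -> dict:
    """Compare predictions from the lab model and its production port."""
    if not (len(labels) == len(lab_preds) == len(prod_preds)):
        raise ValueError("all inputs must have the same length")
    n = len(labels)
    if n == 0:
        raise ValueError("inputs must not be empty")
    lab_acc = sum(y == p for y, p in zip(labels, lab_preds)) / n
    prod_acc = sum(y == p for y, p in zip(labels, prod_preds)) / n
    # Fraction of records where both environments give the same answer.
    agreement = sum(a == b for a, b in zip(lab_preds, prod_preds)) / n
    return {
        "lab_accuracy": lab_acc,
        "prod_accuracy": prod_acc,
        "accuracy_gap": lab_acc - prod_acc,
        "agreement": agreement,
    }


# Hypothetical hold-out set scored in both environments.
labels     = [1, 0, 1, 1, 0, 1, 0, 0]
lab_preds  = [1, 0, 1, 0, 0, 1, 0, 0]
prod_preds = [1, 0, 1, 0, 0, 1, 1, 0]
report = compare_models(labels, lab_preds, prod_preds)
print(report)
```

The agreement rate is worth tracking alongside accuracy: two implementations can have similar accuracy while still disagreeing on individual records.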

Moving on a little, we are now seeing other developments in how machine learning models are built and shared, and open model interchange languages like ONNX make this possible.

We are now seeing people making their machine learning models available to the wider community, instead of keeping them within their own team or organization.

Why would someone do this? Why would they share their machine learning model? It’s a bit like a very popular kids' programme on the BBC called Blue Peter. They would regularly show craft projects for kids to make at home. They would never show all the steps needed to finish a project; instead they would end up showing us “one I made earlier”. It always looked perfect, and nothing like what they tried to make in the studio, and nothing like my attempt.

But pre-made machine learning models are now a thing. There are lots of examples; the ONNX website, for instance, has several pre-trained models ready for you to use. These cover image classification, object detection, machine translation and comprehension, language modeling, speech and audio processing, etc. More are being added over time.

Most of these pre-trained models are based on well-defined data sets and problems. They allow others to see what has been done and to start building upon that work without needing to go through the training and validation phase.

Could we have something like this in the commercial world? Could pre-trained machine learning models be standardized and shared across different organizations? Many organizations within a domain use the same or similar applications for capturing, storing, processing and analyzing their data. In this case, could sharing machine learning models help everyone be more competitive, or gain better insights and discoveries from their data? Again, the difference between theory and practice applies.

Some might remember that in the early days of Data Warehousing we had industry (dimensional) models, and vendors and consulting companies would offer their custom-developed industry models and ways of populating them. In theory these were supposed to help companies speed up their time to data insights and save money. We have seen similar attempts at doing similar things over the decades. But in reality most projects ended up being far more expensive and took far too long to deploy, due to lots of technical difficulties and lots of differences in the business understanding, interpretation and deployment of the underlying applications. The pre-built DW model was generic and didn’t really fit the business needs.

We are seeing more and more pre-trained machine learning models appearing on the market, and many vendors are offering pre-trained solutions. But can these really work? Some of these pre-trained models are based on a certain data preparation, use one particular machine learning algorithm and are assessed using only one particular evaluation metric. As with the custom DW models of twenty years ago, pre-trained ML models are of limited use.
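A small, made-up example shows why a single evaluation metric can hide problems. In this Python sketch (hypothetical data and function name) a "model" that always predicts the majority class on an imbalanced data set scores 90% accuracy, yet finds none of the positive cases.

```python
def evaluate(labels, preds):
    """Return accuracy, precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical data: 1 positive case in 10, and a "model" that always predicts 0.
labels = [0] * 9 + [1]
always_negative = [0] * 10
metrics = evaluate(labels, always_negative)
print(metrics)  # accuracy is 0.9, yet recall is 0.0
```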

Everyone is different, data is different, behavior is different, and the list goes on. Consider the principle of the “No Free Lunch” theorem: although we might be using the same or similar applications for capturing, storing, processing and analysing our data, the underlying behavior of the data (and of the transactions, customers, etc. that influence it) will be different, the marketing campaigns will be different, the business semantics may be different, the general operating models will be different, etc. Based on “No Free Lunch”, we need to explore the data using a variety of different algorithms to determine what works for our data at this point in time. The behavior of the data (and the business influences on it) keeps changing and evolving on a daily, weekly, monthly, etc. basis. A great, if more extreme and rapid, example of this happened during the COVID pandemic. Most of the machine learning models developed over the preceding period no longer worked, the models developed during the pandemic have a very short life span, and it will take some time before “normal” returns and newer models can be built to represent the “new normal”.
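One simple way of spotting that a model has stopped working is to keep comparing its recent accuracy against the accuracy measured when it was built. The following Python sketch assumes you record, for each scored case, whether the prediction later turned out to be correct; the window size, tolerance and function name are illustrative choices rather than recommended values.

```python
from collections import deque


def drift_alerts(outcomes, baseline_accuracy, window=5, tolerance=0.1):
    """Return the positions at which rolling accuracy falls more than
    `tolerance` below `baseline_accuracy`.

    outcomes: iterable of booleans, True when the prediction was correct.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` outcomes
    alerts = []
    for i, correct in enumerate(outcomes):
        recent.append(bool(correct))
        if len(recent) == window:
            accuracy = sum(recent) / window
            if accuracy < baseline_accuracy - tolerance:
                alerts.append(i)
    return alerts


# Hypothetical stream: the model is right for a while, then behaviour shifts.
stream = [True] * 8 + [False, True, False, False, True, False]
alerts = drift_alerts(stream, baseline_accuracy=0.9, window=5, tolerance=0.15)
print(alerts)  # [10, 11, 12, 13]
```

In this example the alerts start at position 10, where the rolling accuracy over the last 5 outcomes drops to 0.6, below the 0.75 threshold.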

This entry was posted in data mining, Data Science, Machine Learning, Predictive Analytics and tagged data mining, Machine Learning, MLOps, Model Interchange, ONNX.

Presentations from OUGN17

Posted on March 13, 2017

Here are the presentations I gave at OUG Norway last week. These are also available on SlideShare.

SQL: The one language to rule all your data from Brendan Tierney

Predictive analytics: Mining gold and creating valuable product from Brendan Tierney

This entry was posted in Predictive Analytics, SQL.

Cluster Distance using SQL with Oracle Data Mining – Part 4

Posted on July 5, 2016 Updated on June 16, 2016

This is the fourth and last blog post in a series that looks at how you can examine the details of predicted clusters using Oracle Data Mining. In the previous blog posts I looked at how to use CLUSTER_ID, CLUSTER_PROBABILITY and CLUSTER_SET.

In this blog post we will look at CLUSTER_DISTANCE. We can use this function to determine how close a record is to the centroid of the cluster. Perhaps we can use this to determine which customers we might want to focus on most: the customers who are closest to the centroid are the ones we want to focus on first. So we can use it as a way to prioritise our workflows, particularly when it is used in combination with the value for CLUSTER_PROBABILITY.
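CLUSTER_DISTANCE does this calculation inside the database. To show the idea behind it, here is a minimal Python sketch that assumes a Euclidean distance over already-prepared numeric attributes; the customers and centroid are made up, and it does not reproduce how Oracle Data Mining applies its own data preparation and distance function.

```python
import math


def distance_to_centroid(record, centroid):
    """Euclidean distance between a record and a cluster centroid."""
    if len(record) != len(centroid):
        raise ValueError("record and centroid must have the same dimensions")
    return math.sqrt(sum((r - c) ** 2 for r, c in zip(record, centroid)))


# Hypothetical, already-normalised attributes for one centroid and three customers.
centroid = (0.5, 0.5)
customers = {"A": (0.5, 0.6), "B": (0.9, 0.1), "C": (0.2, 0.5)}

# Closest to the centroid first, like ORDER BY cluster_distance ASC.
ranked = sorted(customers, key=lambda cid: distance_to_centroid(customers[cid], centroid))
print(ranked)  # ['A', 'C', 'B']
```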

Here is an example of using CLUSTER_DISTANCE to list all the records that belong to Cluster 14 and the results are ordered based on closeness to the centroid of this cluster.

SELECT customer_id, 
       cluster_probability(clus_km_1_37 USING *) as cluster_Prob,
       cluster_distance(clus_km_1_37 USING *) as cluster_Distance
FROM   insur_cust_ltv_sample
WHERE   cluster_id(clus_km_1_37 USING *) = 14
order by cluster_Distance asc;

Here is a subset of the results from this query.

When you examine the results you may notice that the record listed first, the closest record to the centre of cluster 14, has a very low probability. Remember that we are working in an N-dimensional space here. Although this first record is closest to the centre of cluster 14, it has a really low probability, and if we examine this record in more detail we will find that it is at an overlapping point between a number of clusters.

This is why we need to use the CLUSTER_DISTANCE and CLUSTER_PROBABILITY functions together in our workflows and applications to determine how we need to process records like these.
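As a sketch of how the two values might be combined in a workflow, the following Python example (hypothetical records, threshold and function name) keeps the records whose probability is at or above a chosen threshold, orders them by distance to the centroid, and sets the low-probability records aside for review, since they may sit where several clusters overlap.

```python
from typing import List, NamedTuple, Tuple


class ScoredRecord(NamedTuple):
    customer_id: str
    probability: float  # e.g. the value returned by CLUSTER_PROBABILITY
    distance: float     # e.g. the value returned by CLUSTER_DISTANCE


def prioritise(records: List[ScoredRecord],
               min_probability: float) -> Tuple[List[ScoredRecord], List[ScoredRecord]]:
    """Return (work_queue, review_list).

    Records with probability >= min_probability go into the work queue,
    ordered by distance so the ones closest to the centroid come first.
    The remaining records are returned in their original order for review.
    """
    work_queue = sorted((r for r in records if r.probability >= min_probability),
                        key=lambda r: r.distance)
    review = [r for r in records if r.probability < min_probability]
    return work_queue, review


# Hypothetical scores for three customers in the same cluster.
records = [
    ScoredRecord("CUST_A", probability=0.12, distance=0.40),
    ScoredRecord("CUST_B", probability=0.85, distance=0.55),
    ScoredRecord("CUST_C", probability=0.70, distance=0.48),
]
queue, review = prioritise(records, min_probability=0.5)
print([r.customer_id for r in queue])   # ['CUST_C', 'CUST_B']
print([r.customer_id for r in review])  # ['CUST_A']
```

Note that CUST_A is the closest record to the centroid but is still set aside, which mirrors the low-probability first record in the query results above.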

This entry was posted in data mining, Oracle, Oracle Advanced Analytics, Predictive Analytics and tagged Clustering, data mining, Oracle Advanced Analytics.

Cluster Sets using SQL with Oracle Data Mining – Part 3

Posted on June 23, 2016 Updated on June 16, 2016

This is the third blog post in my series on examining the clusters that were predicted by an Oracle Data Mining model. Check out the previous blog posts.

In the previous posts we were able to list the predicted cluster for each record in our data set. This is the cluster that each record belongs to the most. I also mentioned that a record could belong to many clusters.

So how can you list all the clusters that a record belongs to?

You can use the CLUSTER_SET SQL function. This lists the cluster ID and a probability measure for each cluster. The function returns an array consisting of the set of all clusters that the record belongs to.

The following example illustrates how to use the CLUSTER_SET function for a particular cluster model.

SELECT t.customer_id, s.cluster_id, s.probability
FROM   (select customer_id, cluster_set(clus_km_1_37 USING *) as Cluster_Set
        from   insur_cust_ltv_sample 
        WHERE  customer_id in ('CU13386', 'CU100')) T,
      TABLE(T.cluster_set) S
order by t.customer_id, s.probability desc;

The output from this query will be an ordered data set based on the customer id and then the clusters listed in descending order of probability. The cluster with the highest probability is what would be returned by the CLUSTER_ID function. The output from the above query is shown below.

If you would like to see the details of each of the clusters and to examine the differences between these clusters then you will need to use the CLUSTER_DETAILS function (see previous blog post).

You can specify topN and cutoff to limit the number of clusters returned by the function. By default, both topN and cutoff are null and all clusters are returned.

– topN is the N most probable clusters. If multiple clusters share the Nth probability, then the function chooses one of them.

– cutoff is a probability threshold. Only clusters with probability greater than or equal to cutoff are returned. To filter by cutoff only, specify NULL for topN.

You may want to use these individually or combined together if you have a large number of customers. To return up to the N most probable clusters that are greater than or equal to cutoff, specify both topN and cutoff.
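To make the interaction between topN and cutoff concrete, here is a pure Python sketch that mimics the filtering rules described above on a made-up list of (cluster ID, probability) pairs for a single record. It illustrates the rules only and is not Oracle's implementation; for ties at the Nth position it simply keeps whichever cluster comes first after a stable sort.

```python
def filter_cluster_set(cluster_set, top_n=None, cutoff=None):
    """Mimic the topN / cutoff filtering for one record.

    cluster_set: list of (cluster_id, probability) pairs.
    Returns the surviving pairs in descending order of probability.
    """
    # Stable sort, so ties keep their original relative order.
    ordered = sorted(cluster_set, key=lambda pair: pair[1], reverse=True)
    if cutoff is not None:
        ordered = [pair for pair in ordered if pair[1] >= cutoff]
    if top_n is not None:
        ordered = ordered[:top_n]
    return ordered


# Hypothetical cluster probabilities for one customer (they sum to 1.0).
clusters = [(3, 0.02), (14, 0.38), (7, 0.25), (9, 0.06), (11, 0.18), (5, 0.11)]

print(filter_cluster_set(clusters, top_n=4))
# [(14, 0.38), (7, 0.25), (11, 0.18), (5, 0.11)]
print(filter_cluster_set(clusters, cutoff=0.05))
# [(14, 0.38), (7, 0.25), (11, 0.18), (5, 0.11), (9, 0.06)]
print(filter_cluster_set(clusters, top_n=2, cutoff=0.05))
# [(14, 0.38), (7, 0.25)]
```

Because the list is sorted by descending probability, both filters keep a leading slice of it, so applying cutoff before topN gives the same result as the reverse order.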

The following example illustrates using the topN value to return the top 4 clusters.

SELECT t.customer_id, s.cluster_id, s.probability
FROM   (select customer_id, cluster_set(clus_km_1_37, 4, null USING *) as Cluster_Set
        from   insur_cust_ltv_sample 
        WHERE  customer_id in ('CU13386', 'CU100')) T,
      TABLE(T.cluster_set) S
order by t.customer_id, s.probability desc;

and the output from this query shows only 4 clusters displayed for each record.

Alternatively, you can select the clusters based on a cutoff value for the probability. In the following example this is set to 0.05.

SELECT t.customer_id, s.cluster_id, s.probability
FROM   (select customer_id, cluster_set(clus_km_1_37, NULL, 0.05 USING *) as Cluster_Set
        from   insur_cust_ltv_sample 
        WHERE  customer_id in ('CU13386', 'CU100')) T,
      TABLE(T.cluster_set) S
order by t.customer_id, s.probability desc;

and the output this time looks a bit different.

Finally, you can combine these two parameters. The following example returns up to the 2 most probable clusters that have a probability of at least 0.05.

SELECT t.customer_id, s.cluster_id, s.probability
FROM   (select customer_id, cluster_set(clus_km_1_37, 2, 0.05 USING *) as Cluster_Set
        from   insur_cust_ltv_sample
        WHERE  customer_id in ('CU13386', 'CU100')) T,
      TABLE(T.cluster_set) S
order by t.customer_id, s.probability desc;

This entry was posted in data mining, Oracle, Oracle Advanced Analytics, Predictive Analytics and tagged Clustering, data mining, Oracle Advanced Analytics, Predictive Analytics.

Examining predicted Clusters and Cluster details using SQL

Posted on June 7, 2016

In a previous blog post I showed how you can examine the details behind a prediction made using a classification model. This seemed to spark a lot of interest. But before I come back to looking at classification prediction details and other information, this blog post is the first in a 4-part series on examining the details of clusters, as identified by a cluster model created using Oracle Data Mining.

The 4 blog posts will consist of:

1 – (this blog post) will look at how to determine the predicted cluster and cluster probability for your record.
2 – will show you how to examine the details used to predict the cluster.
3 – A record could belong to many clusters. In this blog post we will look at how you can determine what clusters a record can belong to.
4 – Cluster distance is a measure of how far the record is from the cluster centroid. As a data point or record can belong to many clusters, it can be useful to know the distances as you can build logic to perform different actions based on the cluster distances and cluster probabilities.

Right. Let’s have a look at the first set of these cluster functions: CLUSTER_ID and CLUSTER_PROBABILITY.

CLUSTER_ID : Returns the number of the cluster that the record most closely belongs to. This is measured by the cluster distance to the centroid of the cluster. A data point or record can belong to, or be part of, many clusters, so CLUSTER_ID is the number of the cluster that the data point or record most closely belongs to.

CLUSTER_PROBABILITY : A probability measure of the likelihood that the data point or record belongs to a cluster. The cluster with the highest probability score is the cluster that is returned by the CLUSTER_ID function.
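The relationship between the two functions can be sketched in a few lines of Python. This assumes you already have a probability for each cluster for one record (the values below are made up) and that, when no cluster is named, CLUSTER_PROBABILITY reports the probability of that same most probable cluster.

```python
def predicted_cluster(probabilities):
    """Return (cluster_id, probability) for the most probable cluster.

    probabilities: dict mapping cluster_id -> probability for one record.
    """
    if not probabilities:
        raise ValueError("no cluster probabilities supplied")
    cluster_id = max(probabilities, key=probabilities.get)
    return cluster_id, probabilities[cluster_id]


# Hypothetical probabilities for a single customer.
record_probs = {3: 0.10, 14: 0.55, 7: 0.35}
print(predicted_cluster(record_probs))  # (14, 0.55)
```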

Now let us have a quick look at the SQL for these two functions. This first query returns the number of the cluster that each record most strongly belongs to.

SELECT customer_id, 
       cluster_id(clus_km_1_37 USING *) as Cluster_Id
FROM   insur_cust_ltv_sample
WHERE  customer_id in ('CU13386', 'CU6607', 'CU100');

Now let us add in the cluster probability function.

SELECT customer_id, 
       cluster_id(clus_km_1_37 USING *) as Cluster_Id,
       cluster_probability(clus_km_1_37 USING *) as cluster_Prob       
FROM   insur_cust_ltv_sample
WHERE  customer_id in ('CU13386', 'CU6607', 'CU100');

These functions give us some insight into what the clustering model is doing. In the remaining blog posts in this series I will look at how you can delve deeper into the predictions that the clustering algorithm is making.

This entry was posted in Oracle, Oracle Advanced Analytics, Predictive Analytics and tagged Clustering, Oracle, Oracle Advanced Analytics.

Installing RStudio Server on an (Oracle) Linux server

Posted on March 17, 2016

In a previous blog post I showed how you can install and get started with using RStudio on a server by using RStudio Server. My previous post showed how you could do that on the Oracle BigDataLite VM. On this VM everything was nicely scripted and set up for you. But when it comes to installing it on a different server, well things can be a bit different.

The purpose of this blog post is to go through the install steps you need to follow on your own server or Oracle Database server. The following is based on a server that is set up with Oracle Linux. (I’m actually using the Oracle DB Developer VM.)

1. Download the latest version of RStudio Server.

Use the following command to download RStudio Server. But first do a quick check on the RStudio website to get the current version number.

wget https://download2.rstudio.org/rstudio-server-rhel-0.99.892-x86_64.rpm

The following shows you what you will see when you run this command.

--2016-03-16 06:22:30--  https://download2.rstudio.org/rstudio-server-rhel-0.99.892-x86_64.rpm
Resolving download2.rstudio.org (download2.rstudio.org)... 54.192.28.107, 54.192.28.54, 54.192.28.12, ...
Connecting to download2.rstudio.org (download2.rstudio.org)|54.192.28.107|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 38814908 (37M) [application/x-redhat-package-manager]
Saving to: ‘rstudio-server-rhel-0.99.892-x86_64.rpm’

100%[============================================================>] 38,814,908  6.54MB/s   in 6.0s  

2016-03-16 06:22:37 (6.17 MB/s) - ‘rstudio-server-rhel-0.99.892-x86_64.rpm’ saved [38814908/38814908]

2. Install RStudio Server

sudo yum install --nogpgcheck rstudio-server-rhel-0.99.892-x86_64.rpm

When prompted whether it is OK to install, enter y.

Loaded plugins: langpacks
Examining rstudio-server-rhel-0.99.892-x86_64.rpm: rstudio-server-0.99.892-1.x86_64
Marking rstudio-server-rhel-0.99.892-x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package rstudio-server.x86_64 0:0.99.892-1 will be installed
--> Finished Dependency Resolution
ol7_UEKR3/x86_64                                                                    | 1.2 kB  00:00:00    
ol7_addons/x86_64                                                                   | 1.2 kB  00:00:00    
ol7_latest/x86_64                                                                   | 1.4 kB  00:00:00    
ol7_optional_latest/x86_64                                                          | 1.2 kB  00:00:00    

Dependencies Resolved

===========================================================================================================
 Package               Arch          Version             Repository                                   Size
===========================================================================================================
Installing:
 rstudio-server        x86_64        0.99.892-1          /rstudio-server-rhel-0.99.892-x86_64        280 M

Transaction Summary
===========================================================================================================
Install  1 Package

Total size: 280 M
Installed size: 280 M

Is this ok [y/d/N]: y

Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : rstudio-server-0.99.892-1.x86_64                                                        1/1
groupadd: group 'rstudio-server' already exists
rsession: no process found
ln -s '/etc/systemd/system/rstudio-server.service' '/etc/systemd/system/multi-user.target.wants/rstudio-server.service'
rstudio-server.service - RStudio Server
   Loaded: loaded (/etc/systemd/system/rstudio-server.service; enabled)
   Active: active (running) since Wed 2016-03-16 10:46:00 PDT; 1s ago
  Process: 3191 ExecStart=/usr/lib/rstudio-server/bin/rserver (code=exited, status=0/SUCCESS)
 Main PID: 3192 (rserver)
   CGroup: /system.slice/rstudio-server.service
           ├─3192 /usr/lib/rstudio-server/bin/rserver
           └─3205 /usr/lib64/R/bin/exec/R --slave --vanilla -e cat(R.Version()$major,R.Version()$minor,~+~sep=".")

Mar 16 10:46:00 localhost.localdomain systemd[1]: Started RStudio Server.
  Verifying  : rstudio-server-0.99.892-1.x86_64                                                        1/1

Installed:
  rstudio-server.x86_64 0:0.99.892-1                                                                      

Complete!

3. Open RStudio using a web browser.

Open your favourite web browser and enter the host name or IP address of your server, followed by the port number for RStudio Server (8787 by default). In my example I’m using the Oracle DB Developer VM to demonstrate the install, so I can use localhost.

4. Use and Enjoy

Once you have logged into RStudio Server you will see a screen something like the following!

Job Done and Enjoy!

5. An Extra Step if using the Oracle DB Developer VM

If you want to use RStudio on the Oracle DB Developer VM from your local OS, then you will need to open port 8787 on the VM. To do this, power down the VM if you have it open. Then open the Network section of the VM settings (I’m using VirtualBox) and click on Port Forwarding. Add a rule that forwards port 8787 on the host to port 8787 on the guest.

Click on OK to save your Port Forwarding setting and then click on the OK button again to close the Network settings for the VM.

Now start up the VM. When it has loaded and you have the desktop displayed in the VM window, you should now be able to connect to RStudio in the VM, from your local machine.

To do this open your web browser on your local machine and type in

http://localhost:8787

You should now get the RStudio login screen that is shown in point 3 above. Go ahead, log in and enjoy.
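If the login screen does not appear, it can help to check from your local machine whether anything is listening on the forwarded port. Here is a minimal Python sketch (the helper name is hypothetical); a successful connection only shows that the port is open, not that it is RStudio Server answering.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and unknown hosts
        return False


if __name__ == "__main__":
    # 8787 is the port forwarded to the VM in the steps above.
    print(is_port_open("localhost", 8787))
```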

6. A little warning

Make sure to log out of RStudio when you are finished using it. If you don’t, your R environment may not have been saved and you will get a message when you next log in. We don’t want that to happen, so just log out of RStudio. You can do that using the option in the top right-hand corner of the RStudio Server application.

I will have one more blog post on how you can configure RStudio Server to work with an Oracle Database server that has Oracle R Enterprise installed.

This entry was posted in Oracle R Enterprise, Predictive Analytics, R, VirtualBox and tagged Big Data, R, RStudio, VirtualBox.

Installing RStudio Server on Oracle BigDataLite VM

Posted on March 14, 2016 Updated on March 12, 2016

A very popular tool for data scientists is RStudio. This tool allows you to work interactively with your R code, view the R console and the graphs and charts you create, manage the various objects and data frames you create, as well as having easy access to the R help documentation. Basically, it is a core everyday tool.

The typical approach is to have RStudio installed on your desktop or laptop. What this really means is that the data is pulled to your desktop or laptop and all analytics is performed there. In most cases this is fine, but as your data volumes grow, so do the limitations of using R on your local machine.

An alternative is to install a version called RStudio Server on an analytics server or on the database server. You can now use the computing capabilities of this server to overcome some of the limitations of using R or RStudio locally. Now you will use your web browser to access RStudio Server on your database server.

In this blog post I will walk you through how to install and get connected to RStudio Server on the Oracle BigDataLite VM.

After starting up the Oracle BigDataLite VM and logging in as the oracle user (password=welcome1), you will see the Start Here icon on the desktop. Double-click on this.

This will open a webpage on the VM that contains details of all the various tools that are installed on the VM or are ready for you to install and configure. This information contains all the http addresses and ports you need to access each of these tools via a web browser or some other way, along with the usernames and passwords you need to use them.

One of the tools listed is RStudio Server. This product is not installed on the VM, but Oracle has provided a script that you can run to perform the install in an automated way. This script is located in:

[oracle@bigdatalite ~]$ cd  /home/oracle/scripts/

Use the following command to run the RStudio Server install script.

[oracle@bigdatalite scripts]$ ./install_rstudio.sh

The following is the output from running this script and it will be displayed in your terminal window. You can use this to monitor the progress of the installation.

Retrieving RStudio
--2016-03-12 02:06:15--  https://download2.rstudio.org/rstudio-server-rhel-0.99.489-x86_64.rpm
Resolving download2.rstudio.org... 54.192.28.12, 54.192.28.54, 54.192.28.98, ...
Connecting to download2.rstudio.org|54.192.28.12|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 34993428 (33M) [application/x-redhat-package-manager]
Saving to: `rstudio-server-rhel-0.99.489-x86_64.rpm'

100%[======================================>] 34,993,428  5.24M/s   in 10s     

2016-03-12 02:06:26 (3.35 MB/s) - `rstudio-server-rhel-0.99.489-x86_64.rpm' saved [34993428/34993428]

Installing RStudio
Loaded plugins: refresh-packagekit, security, ulninfo
Setting up Install Process
Examining rstudio-server-rhel-0.99.489-x86_64.rpm: rstudio-server-0.99.489-1.x86_64
Marking rstudio-server-rhel-0.99.489-x86_64.rpm to be installed
public_ol6_UEKR3_latest                                  | 1.2 kB     00:00     
public_ol6_UEKR3_latest/primary                          |  22 MB     00:03     
public_ol6_UEKR3_latest                                                 568/568
public_ol6_latest                                        | 1.4 kB     00:00     
public_ol6_latest/primary                                |  55 MB     00:12     
public_ol6_latest                                                   33328/33328
Resolving Dependencies
--> Running transaction check
---> Package rstudio-server.x86_64 0:0.99.489-1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package        Arch   Version       Repository                            Size
================================================================================
Installing:
 rstudio-server x86_64 0.99.489-1    /rstudio-server-rhel-0.99.489-x86_64 251 M

Transaction Summary
================================================================================
Install       1 Package(s)

Total size: 251 M
Installed size: 251 M
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : rstudio-server-0.99.489-1.x86_64                             1/1 
useradd: user 'rstudio-server' already exists
groupadd: group 'rstudio-server' already exists
rsession: no process killed
rstudio-server start/running, process 5037
  Verifying  : rstudio-server-0.99.489-1.x86_64                             1/1 

Installed:
  rstudio-server.x86_64 0:0.99.489-1                                            

Complete!
Restarting RStudio
rstudio-server stop/waiting
rsession: no process killed
rstudio-server start/running, process 5066

When the installation is finished you are now ready to connect to the RStudio Server. So open your web browser and enter the following into the address bar.

http://localhost:8787/

The initial screen you are presented with is a login screen. Enter your Linux username and password. In the case of the BigDataLite VM this will be oracle/welcome1.

Then you will be presented with the RStudio Server application in your web browser, as shown below. As you can see, it is very similar to using RStudio on your desktop. Happy Days! You are now set up and able to run RStudio on the database server.

Make sure to log out of RStudio Server before closing down the window.

If you don’t log out of RStudio Server then the next time you open RStudio Server your session will automatically open. Perhaps this is not the best for security, so try to remember to log out each time.

By using RStudio Server on the Oracle Database server I can now get some of the benefits of the computing capabilities of this server, although the typical limitations of using R still apply. I can now access RStudio on the database server and process the data there, all from my local PC or laptop.

Everything is nicely set up and ready for you to install on the BigDataLite VM (thank you Oracle). But what about when we want to install RStudio Server on a different server? What are the steps necessary to install, configure and log in? Yes, they should be similar, but I will give a complete list of steps in my next blog post.

This entry was posted in Oracle R Enterprise, Predictive Analytics, R, VirtualBox and tagged BigDataLite, Oracle Data Science, R, RStudio, VirtualBox.

Slides from my OOW15 Presentations

Posted on November 5, 2015 Updated on November 5, 2015

At Oracle Open World (OOW15) I gave 2 presentations on the Sunday during the Oracle User Group Forum. The slides are now available for download from the Oracle Open World website.

Go get them now!

More Than Another 12 on Oracle Database 12c [UGF3190]

During this session I was one of 16 presenters talking about various features in the Oracle Database. All of the presenters were from the EOUC region.

Real Business Value from Big Data and Advanced Analytics [UGF4519]

I co-presented with Antony Heljula from Peak Indicators. During this presentation we talked about some of the Advanced Analytics projects we have worked on over the past 18-24 months. We also announced a new Analytics-as-a-Service offering.

The slides are also available for most of the other Oracle Open World Presentations and these can be accessed here. Just go search for the topic you are interested in.

Check out my previous blog post that summarises just a small part of what I got up to at OOW15.

This entry was posted in Oracle Advanced Analytics, ORE, Predictive Analytics.

Automatic Analytics is So mainstream. Not something new.

Posted on March 12, 2015

Everyone is doing advanced analytics. Right? Hmm

Everyone is talking about advanced analytics? Yes that is true.

Everyone is an expert in advanced analytics? This is so not true. Watch out for these Great Pretenders. You know what I mean! You know who I mean! Maybe you know some of them already? If not, watch out for these Great Pretenders!!!

Some people are going around talking about data mining, predictive analytics, advanced analytics, machine learning etc as if this is some new topic. Well it isn’t. It isn’t anything new and most of the techniques have been about for 10, 20, 30+ years.

Some people are saying you should only use language X or tool Y, because everything else is basically rubbish.

What we do have is a wider understanding of how to use these techniques on our various data sources.

What we have are a lot more tools that allow us to perform these tasks more easily, at greater speed, with more functionality and without needing to fully understand the hard-core maths going on behind the scenes.

What we have are a lot more languages for performing these tasks and for supporting the vast amount of work that goes into understanding and preparing the data.

Something for all of us to watch out for, when we read about these topics, is what kind of problem area they are addressing. The following table illustrates the three main types, or categories, of Analytics: Descriptive Analytics, Predictive Analytics and Prescriptive Analytics. I think most people would agree that the Descriptive and Predictive Analytics categories are very mature at this stage. Prescriptive Analytics is perhaps still evolving, and a lot more work needs to be done before it becomes widespread.

[Image: table of the three categories of Analytics: Descriptive, Predictive and Prescriptive]

Some people talk as if Predictive Analytics is some new and exciting topic. But it isn't all that new. It has been around for the past 30+ years. If you look back over the Gartner Hype Cycle that comes out every September, Predictive Analytics is no longer shown on the graph. The last time it appeared on the Gartner Hype Cycle was back in 2013, when it was positioned on the far right of the graph, in the section called the Plateau of Productivity.

So Predictive Analytics is very mature and mainstream. Part of the reason it is mainstream is that Predictive Analytics has allowed a new category of Analytics to evolve, and this is Automatic Analytics.

Automatic Analytics is where Advanced and Predictive Analytics have been built into the day-to-day applications we use to run our business. We do not need hard-core data scientists to perform the various analytics on our data. Instead, these tasks, once they have been defined, can be added to our applications to process, evaluate and make decisions, all automatically. This is where we need data scientists who can communicate with the business and work with them to solve real-world business problems. This is a different type of data scientist from the “hard-core” data scientist who delves into the various statistical, machine learning and data management methods.
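To make the idea of a task that is "defined once, then added to our applications" a little more concrete, here is a minimal Python sketch under stated assumptions. It uses a loan decision as the example: the coefficients of a logistic-regression-style model, fixed in advance by the data science team, are embedded in an application function that scores each new application and makes the decision with no analyst involved. The feature names, coefficients and threshold are all hypothetical, invented for illustration; they are not taken from any Oracle product or real model.

```python
import math
from dataclasses import dataclass

# Hypothetical model coefficients. In practice these would come from a model the
# data science team built and validated offline; these values are made up.
WEIGHTS = {"income_k": 0.04, "debt_ratio": -3.0, "years_employed": 0.15}
INTERCEPT = -1.5
APPROVAL_THRESHOLD = 0.5  # hypothetical cut-off agreed with the business


@dataclass
class LoanApplication:
    income_k: float        # annual income, in thousands
    debt_ratio: float      # existing debt divided by income
    years_employed: float


def score(app: LoanApplication) -> float:
    """Logistic score: estimated probability that the loan is repaid."""
    z = INTERCEPT + sum(w * getattr(app, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def decide(app: LoanApplication) -> str:
    """The 'automatic' step: the application acts on the score by itself."""
    return "approve" if score(app) >= APPROVAL_THRESHOLD else "decline"


if __name__ == "__main__":
    print(decide(LoanApplication(income_k=60, debt_ratio=0.2, years_employed=5)))
    print(decide(LoanApplication(income_k=20, debt_ratio=0.8, years_employed=0)))
```

The hard-core data science work (choosing features, fitting and validating the coefficients) happens once, upstream; what ends up inside the business application is only the small, repeatable scoring and decision logic.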

The following table extends the table given above to include Automatic Analytics, and is my own take on how and where Automatic Analytics fits.

[Image: table extending the categories of Analytics to include Automatic Analytics]

Every time we get an insurance or health insurance quote, get a “random” call from our Telco offering a free upgrade, receive our loyalty card statements, get a loan from the bank, or look at or buy a book on Amazon, we are seeing examples of how predictive analytics has been automated into our everyday business applications. The list could go on and on.

But this is nothing new. When I first got into data mining/predictive analytics over 16 years ago, it was already common practice for certain types of companies. What has changed since then, particularly in the past few years, is that a lot more people are seeing the value of using it.

Before I finish off this post, let's have a quick look at what Oracle has been doing in this area. They have their Advanced Analytics Option and Real-Time Decisions tools to let data scientists do their magic. But over the past X years (nobody can give me an exact number) they have been very, very active in building lots and lots of predictive analytics into their various business applications, particularly into Fusion Apps and BI Apps.

A recent quote from Oracle highlights their aim:

“… products designed to close the gap between data scientists and businesses.”

Now, with Oracle making a big push to the cloud, they are busy adding more and more Automatic (Predictive) Analytics to their Cloud Applications. What we need from Oracle is a clearer identification of where they have done this. Also, with the migration of their Apps to the cloud, their Advanced Analytics Option is a core part of their Cloud platform. As they upgrade or add new features to their Cloud Apps, you will be able to benefit from these Automatic (Predictive) Analytics as they become available.

This entry was posted in Automatic Analytics, data mining, Data Science, Data Scientist, Predictive Analytics.

Ora-lytics

By Brendan Tierney
