Month: January 2019

Machine Learning on Oracle Autonomous Data Warehouse

Last week I wrote a blog post about how long it took to create machine learning models on Oracle Database Cloud service. There were some impressive results and some surprising results too.

I decided to try out the exact same tests, using the exact same data on the Oracle Autonomous Data Warehouse Cloud service (ADW).

When creating the ADW service I took the basic configuration and didn’t change anything. The inbuilt machine learning for the Autonomous service will magically work out my needs and make the necessary adjustments, right? It can handle any data volume and any data processing requirements, right?

Here are the results.

ml_adwc

* You will notice that there is no time given for creating an SVM model for the 10M record data set. After waiting for 4 hours I got bored and gave up waiting. (I actually did this three times to make sure it wasn’t a once-off.)

[I also had a 50M record data set. I just didn’t waste time trying that.]

[The Neural Networks algorithm hasn’t been ported onto ADW at this point in time.]

If you look back at the results from using the DBaaS you will see it was significantly quicker than the ADW. (For some of these tests it would be quicker using Python on my laptop.)

Before you believe the hype, go test it yourself and make sure it measures up.

I re-ran my test cases over a number of days to see if the machine learning aspect of the Autonomous service kicked in to learn from the processing and make any performance improvements. Sadly the results were basically the same or slightly slower. Disappointing.

When someone tells you that you should be using this, ask them whether they have actually used and tested it themselves. And more importantly, don’t believe them. Go test it yourself.

 

How long does it take to build a Machine Learning model using Oracle Cloud

Every day someone talks about the processing power needed for Machine Learning, and the vast computing needed for these tasks. It has become evident that most of these people have never created a machine learning model. Never. But they like to make up stuff and try to make themselves look like an expert, or as I and others like to call them, a “fake expert”.

When you question these “fake experts” about this topic, they huff and puff about lots of things and never answer the question, or they try to claim it is so difficult that you simply don’t understand.

Having worked in the area of machine learning for a very, very long time, I’ve never really had performance issues with creating models. Yes, most of the time I’ve been able to use my laptop. Yes, my laptop, to build models, large models. In a couple of cases my laptop couldn’t cope and I moved onto a server.

But over the past few years we keep hearing about using cloud services for machine learning. The message is that if you are doing machine learning you need the computing capabilities that are available with cloud services.

So, the results below show how long it takes to build machine learning models, using different algorithms, with different sizes of data sets.

For this test, I used a basic cloud service. Well, maybe it isn’t basic, but others will consider it very basic, with very little compute involved.

I used an Oracle Cloud DBaaS for this experiment. I selected an Oracle 18c Extreme edition cloud service, which comes with the in-database machine learning option. It has 1 OCPU, 7.5GB memory and 170GB storage. This is the basic configuration.

Next I created data sets with different sizes. These were all based on one particular data set, which ensures that as the data set size increases, the kind of data and the processing required remain consistent, instead of using completely different data sets.

The data sets consisted of the following numbers of records: 72K, 210K, 660K, 2M, 10M and 50M.
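A minimal sketch of one way to grow a single base data set to a chosen size while keeping the same kind of data is shown below. It simply repeats the base rows in order until the target count is reached. This is an illustration only and not necessarily how the data sets above were produced; the function name `grow_rows` and the sample rows are made up for the example.

```python
from itertools import cycle, islice

def grow_rows(base_rows, target_size):
    """Yield target_size rows by repeating (or truncating) base_rows in order."""
    if not base_rows:
        raise ValueError("base_rows must not be empty")
    # cycle() loops over the base rows forever; islice() stops at target_size.
    return islice(cycle(base_rows), target_size)

# Made-up rows standing in for the real base data set.
base = [("a", 1), ("b", 0), ("c", 1)]

bigger = list(grow_rows(base, 7))
print(len(bigger))   # number of rows produced
print(bigger[:4])    # the first rows, showing the repeat
```

Repeating rows keeps the shape and distribution of the data the same, which suits a timing test, but it adds no new information for the model to learn from.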

I then created machine learning models using Decision Tree, Naive Bayes, Support Vector Machine (SVM), Generalized Linear Models (GLM) and Neural Networks. Yes, it was a typical classification problem.

The table below shows the length of time in seconds to build the models. All data preparation etc. was done prior to this.

Note: Automatic Data Preparation was turned on for these algorithms. This performed additional algorithm-specific data preparation for each model. That means the times given in the following tables include some data preparation time as well as the time to build the models.

ml_on_dbaas_1

Converting the above table into minutes gives the following.

ml_on_dbaas_2

It is clear that the Neural Network model takes a lot longer to build than all the other algorithms. In this test the Neural Network model had only one hidden layer.

When we chart the build timings, leaving out Neural Networks, we get the following.

ml_on_dbaas_3

We can see that the Naive Bayes, Decision Tree, GLM and SVM algorithms have very similar model build timings, but as the data volumes increase the Decision Tree algorithm becomes less efficient.

Overall it doesn’t take a long time to build models. In a way it is a very trivial task!

I mentioned at the start of this post that I had created a data set of 50M records. Unfortunately I wasn’t able to get models built for this data set using this cloud instance. It used so much TEMP tablespace that the file volumes on my cloud instance ran out of space!

I suppose if I wanted to go bigger with my data, I needed a bigger boat!

I haven’t included any timings for model scoring using these models. Why? The scored data is returned immediately, even for the largest data sets.
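For anyone wanting to repeat this kind of measurement, the build timings (in seconds, then converted to minutes) can be captured with a small wrapper like the sketch below. It is only an illustration: `dummy_build` is a hypothetical stand-in for whatever call actually builds the model, and the row counts are arbitrary small values, not the data set sizes used in this post.

```python
import time

def time_build(build_fn, *args, **kwargs):
    """Run build_fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = build_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def seconds_to_minutes(seconds):
    """Convert a duration in seconds to minutes."""
    return seconds / 60.0

def dummy_build(n_rows):
    """Hypothetical stand-in for a real model build; just does some arithmetic."""
    return sum(i * i for i in range(n_rows))

# Arbitrary sizes, chosen only so the example runs quickly.
for n_rows in (1_000, 10_000, 100_000):
    _, secs = time_build(dummy_build, n_rows)
    print(f"{n_rows:>8} rows: {secs:.4f} s ({seconds_to_minutes(secs):.6f} min)")
```

In a real test, `dummy_build` would be replaced by the function that submits the model build, so that the elapsed time covers the same work each time (including any automatic data preparation).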

 

Changing the markers for Google Maps and centering map

In some recent work, I’ve been integrating with Google Maps and some of the other Google APIs a lot. This post is just a reminder for myself on how to change the format, colour, and other properties of the map pointers.

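import gmaps

# map_locations_c0 .. map_locations_c3 hold the (latitude, longitude) pairs
# for each cluster and are assumed to have been created earlier.

# One symbol layer per cluster, each with its own marker colour.
cluster_0_gmap = gmaps.symbol_layer(
    map_locations_c0, fill_color='red',
    stroke_color='red', scale=5 )

cluster_1_gmap = gmaps.symbol_layer(
    map_locations_c1, fill_color='green',
    stroke_color='green', scale=5 )

cluster_2_gmap = gmaps.symbol_layer(
    map_locations_c2, fill_color='purple',
    stroke_color='purple', scale=5 )

cluster_3_gmap = gmaps.symbol_layer(
    map_locations_c3, fill_color='blue',
    stroke_color='blue', scale=5 )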

And now for the initial map settings, centred on Athlone town in the middle of Ireland.

# Size, border and margins of the map widget.
figure_layout = {
    'width': '950px',
    'height': '730px',
    'border': '1px solid black',
    'padding': '1px',
    'margin': '0 auto 0 auto'
}

# Centre the map on Athlone and set the initial zoom level.
ireland_coord = (53.42, -7.94)
fig = gmaps.figure(center=ireland_coord, zoom_level=7.5, layout=figure_layout)

fig.add_layer(cluster_0_gmap)
fig.add_layer(cluster_1_gmap)
fig.add_layer(cluster_2_gmap)
fig.add_layer(cluster_3_gmap)
fig
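The centre above is hard-coded. As an alternative, the centre could be worked out from the points being plotted. The sketch below averages the latitudes and longitudes of a list of (latitude, longitude) pairs. The helper name `centre_of` and the sample coordinates are made up for illustration, and a simple average like this is only reasonable for points spread over a small area such as Ireland.

```python
def centre_of(locations):
    """Return the mean (latitude, longitude) of a list of coordinate pairs."""
    if not locations:
        raise ValueError("locations must not be empty")
    lats = [lat for lat, _ in locations]
    lngs = [lng for _, lng in locations]
    return (sum(lats) / len(lats), sum(lngs) / len(lngs))

# Made-up sample points, for illustration only.
sample_locations = [(53.0, -6.0), (52.0, -8.0), (54.0, -9.0)]

map_centre = centre_of(sample_locations)
print(map_centre)
```

The resulting tuple could then be passed as the `center` argument of `gmaps.figure` in place of `ireland_coord`.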

 

Understanding, Building and Using Neural Network Models using Oracle 18c

I recently had an article published on the Oracle Developer Community website about Understanding, Building and Using Neural Network Machine Learning Models with Oracle 18c. I’ve also had a 2 Minute Tech Tip (2MTT) video about this topic and article.

Oracle 18c Database brings prominent new machine learning algorithms, including Neural Networks and Random Forests. While many articles are available on machine learning, most of them concentrate on how to build a model. Very few talk about how to use these new algorithms in your applications to score or label new data. This article will explain how Neural Networks work, how to build a Neural Network in Oracle Database, and how to use the model to score or label new data.

What are Neural Networks?

Over the past couple of years, Neural Networks have attracted a lot of attention thanks to their ability to efficiently find patterns in data: traditional transactional data, as well as images, sound, streaming data, etc. But for some implementations, Neural Networks can require a lot of additional computing resources due to the complexity of the many hidden layers within the network.

Figure 1 gives a very simple representation of a Neural Network with one hidden layer. All the inputs are connected to a neuron in the hidden layer (red circles). A neuron takes a set of numeric values as input and maps them to a single output value. (A neuron is a simple multi-input linear regression function, where the output is passed through an activation function.)

Two common activation functions are logistic and tanh functions. There are many others, including logistic sigmoid function, arctan function, bipolar sigmoid function, etc.

Continue reading the rest of the article here.
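The description of a neuron given above (a weighted sum of the inputs passed through an activation function) can be sketched in a few lines of Python. This is an illustrative sketch only and is not how Oracle implements the algorithm; the function names, weights, bias and input values are all made up for the example.

```python
import math

def logistic(x):
    """Logistic (sigmoid) activation: squashes x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias, activation=logistic):
    """Weighted sum of the inputs plus a bias, passed through an activation function."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    linear = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(linear)

# Made-up example values, for illustration only.
example_inputs = [0.5, -1.0, 2.0]
example_weights = [0.4, 0.3, -0.2]
example_bias = 0.1

print(neuron(example_inputs, example_weights, example_bias))             # logistic activation
print(neuron(example_inputs, example_weights, example_bias, math.tanh))  # tanh activation
```

Swapping the `activation` argument is all it takes to move between the logistic and tanh functions; the linear part of the neuron stays the same.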