Last week I wrote a blog post about how long it took to create machine learning models on Oracle Database Cloud service. There were some impressive results and some surprising results too.
I decided to try out the exact same tests, using the exact same data on the Oracle Autonomous Data Warehouse Cloud service (ADW).
When creating the ADW service I took the basic configuration and didn’t change anything. The inbuilt machine learning for the Autonomous service will magically work out my needs and make the necessary adjustments, right? It can handle any data volume and any data processing requirements, right?
Here are the results.
* You will notice that there is no time given for creating an SVM model for the 10M record data set. After waiting for 4 hours I got bored and gave up waiting. (I actually did this three times to make sure it wasn’t a one-off.)
[I also had a 50M record data set. I just didn’t waste time trying that.]
[Neural Networks algorithm hasn’t been ported onto ADW at this point in time]
If you look back at the results from using the DBaaS you will see it was significantly quicker than the ADW. (For some, it would be quicker using Python on my laptop.)
Before you believe the hype, go test it yourself and make sure it measures up.
I re-ran my test cases over a number of days to see if the machine learning aspect of the Autonomous service kicked in to learn from the processing and make any performance improvements. Sadly the results were basically the same or slightly slower. Disappointing.
When someone tells you that you should be using this, ask them whether they have actually used and tested it themselves. And more importantly, don’t believe them. Go test it yourself.
Every day someone talks about the processing power needed for machine learning, and the vast computing needed for these tasks. It has become evident that most of these people have never created a machine learning model. Never. But they like to make stuff up and try to make themselves look like experts, or as I and others like to call them, “fake experts”.
When you question these “fake experts” about this topic, they huff and puff about lots of things and never answer the question or try to claim it is so difficult, you simply don’t understand.
Having worked in the area of machine learning for a very, very long time, I’ve never really had performance issues with creating models. Yes, most of the time I’ve been able to use my laptop. Yes, my laptop, to build large models. In a couple of cases my laptop couldn’t cope and I moved onto a server.
But over the past few years we keep hearing about using cloud services for machine learning. If you are doing machine learning, we are told, you need the computing capabilities that are available with cloud services.
So, the results below show how long it takes to build machine learning models, using different algorithms, with different sizes of data sets.
For this test, I used a basic cloud service. Well, maybe it isn’t basic, but others will consider it very basic, with very little compute involved.
I used an Oracle Cloud DBaaS for this experiment. I selected an Oracle 18c Extreme edition cloud service, which comes with the in-database machine learning option. It has 1 OCPU, 7.5GB of memory and 170GB of storage. This is the basic configuration.
Next I created data sets of different sizes. These were all based on one particular data set, which ensures that as the data set size increases, the kind of data and the processing required remain consistent, rather than using completely different data sets.
The data sets contained the following numbers of records: 72K, 210K, 660K, 2M, 10M and 50M.
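I don't describe above how the larger data sets were derived from the base one, so treat the following as a minimal sketch of one possible approach, not the method actually used. It assumes the simplest option: repeating the rows of a base data set until the target size is reached. The function name `scaled_rows` and the sample rows are hypothetical; only the target sizes come from this post.

```python
from itertools import cycle, islice

# Target record counts quoted in the post.
TARGET_SIZES = {
    "72K": 72_000,
    "210K": 210_000,
    "660K": 660_000,
    "2M": 2_000_000,
    "10M": 10_000_000,
    "50M": 50_000_000,
}


def scaled_rows(base_rows, target_size):
    """Return an iterator that repeats base_rows, in order, until exactly
    target_size rows have been produced.

    Assumption: plain repetition is acceptable, so the mix of values stays
    the same at every size. Rows are produced lazily, so even the largest
    size is never held in memory all at once.
    """
    if not base_rows:
        raise ValueError("base_rows must not be empty")
    return islice(cycle(base_rows), target_size)


# Hypothetical three-row base data set: (feature, class label).
sample_rows = [("a", 1), ("b", 0), ("c", 1)]
small_sample = list(scaled_rows(sample_rows, 7))
```

Because every larger set is built from the same rows, any difference in build time comes from volume rather than from a change in the kind of data.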
I then created machine learning models using Decision Tree, Naive Bayes, Support Vector Machine, Generalized Linear Models (GLM) and Neural Networks. Yes, it was a typical classification problem.
The following table shows the length of time in seconds to build the models. All data preparation was done prior to this.
Note: Automatic Data Preparation was turned on for these algorithms. This performed additional algorithm-specific data preparation for each model. That means the times given in the following tables include some data preparation time as well as the time for building the models.
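For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a timing harness. It is an illustration under stated assumptions, not the code used for these tests: `time_build`, `seconds_to_minutes` and `dummy_build` are hypothetical names, and `dummy_build` merely stands in for whatever call actually creates your model. As with the figures in this post, the measured time covers everything that happens inside the call, including any automatic data preparation.

```python
import time


def time_build(build_fn, *args, **kwargs):
    """Call build_fn once and return (result, elapsed_seconds).

    The elapsed time is wall-clock time for the whole call, so it includes
    any data preparation that build_fn performs before fitting the model.
    """
    start = time.perf_counter()
    result = build_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def seconds_to_minutes(seconds):
    """Convert a duration in seconds to minutes, rounded to 2 decimals."""
    return round(seconds / 60, 2)


def dummy_build(rows):
    """Hypothetical stand-in for a model build: returns the mean of rows."""
    return sum(rows) / len(rows)


model, elapsed = time_build(dummy_build, [1, 2, 3])
print(f"build took {elapsed:.6f} seconds ({seconds_to_minutes(elapsed)} minutes)")
```

Swap `dummy_build` for your own model-building call and run it once per algorithm and data set size to fill in a table like the one above.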
Converting the above table into minutes.
The following outlines the steps to create an Autonomous Data Warehouse Cloud Service.
Log into your Oracle Cloud account and then follow these steps.
1. Select Autonomous Data Warehouse Cloud service from the side menu
2. Select Create Autonomous Data Warehouse button
3. Enter the Compartment details (Display Name, Database Name, CPU Core Count & Storage)
4. Enter a Password for Administrator, and then click ‘Create Autonomous Data Warehouse’
5. Wait until the ADWC is provisioned
Going from this
And you should receive an email that looks like this
6. Click on the name of the ADWC you created
7. Click on the Service Console button
8. Then click on Administration and then Download a Connection Wallet
Specify the password
You can now use this to connect to the ADWC using SQL Developer
One of the new features of the Autonomous Data Warehouse Cloud (ADWC) service is Oracle Machine Learning. This is a Zeppelin-based notebook for your machine learning on ADWC. Check out my previous blog post about this.
In order to be able to use this new product and the in-database machine learning in ADWC, you will need your database user to have certain privileges. The first step in this is to create a typical user for accessing the ADWC and grant it the necessary OML privileges.
To do this open the ADWC console and then open the Service Console.
This will then open a new admin page which contains a link for ‘Manage Oracle ML User’. Click on this.
You can then enter the Username, Password and other details for the user, and then click Create.
This will then create a new user that is specific to Oracle Machine Learning. This new user will be granted the DWROLE, which contains the basic schema privileges and the privileges required to run the in-database machine learning algorithms. For those who are familiar with the Oracle Data Mining/Oracle Advanced Analytics option in the Enterprise Edition of the Oracle database, you will see that these privileges are very similar.
You can examine the privileges granted to this DWROLE in the database as an administrator. When you do, you will see the following:
CREATE ANALYTIC VIEW
CREATE ATTRIBUTE DIMENSION
ALTER SESSION
CREATE HIERARCHY
CREATE JOB
CREATE MINING MODEL
CREATE PROCEDURE
CREATE SEQUENCE
CREATE SESSION
CREATE SYNONYM
CREATE TABLE
CREATE TRIGGER
CREATE TYPE
CREATE VIEW
READ,WRITE ON directory DATA_PUMP_DIR
EXECUTE privilege on the PL/SQL package DBMS_CLOUD