There are a number of different ways to load data into your Autonomous Data Warehouse (ADW) environment. I’ll cover the alternatives in other posts.
In this blog post I’ll go through the steps needed to load data using Object Storage. This might appear to involve a large-ish number of steps, but once you have gone through it and have some of the parts set up and configured from your first time, the second and subsequent times will be easier.
After logging into your Oracle Cloud dashboard, select Object Storage from the side menu.
Then click on the Create Bucket button.
Enter a name for the Object Storage bucket, take the defaults for the rest, and click on the Create Bucket button at the bottom. In my example, I’ve called the bucket ‘ADW_Bucket’.
Click on the name of the bucket in the list.
Then click the Upload Objects button.
In the Upload Objects window, browse for the file(s) you want to upload.
Then click on the Upload Objects button in the Upload Objects window. After a few moments you will see a message saying the file(s) have been uploaded. Click Close to dismiss the window.
Click into the Object details and take a note/copy of the URL Path. You will need this later.
To load data from the Oracle Cloud Infrastructure (OCI) Object Storage you will need an OCI user with the appropriate privileges to read data from (or upload data to) the Object Store. The communication between the database and the object store relies on the Swift protocol and the OCI user Auth Token. Go back to the menu in the upper left and select Users.
Then click on the user name to view the details. This is probably your OCI username.
On the left hand side of the page click Auth Tokens, and then click on the Generate Token button. Give the token a name, e.g. ADW_TOKEN, and then generate the token.
Save the generated token to use later. It is only displayed once, so copy it before closing the window.
Open SQL Developer and set up a connection to your OML user/schema. When connected, the next step is to authenticate with Object Storage using your OCI username and the Auth Token generated above.
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'ADW_TOKEN',
    username        => '<your cloud username>',
    password        => '<generated auth token>'
  );
END;
/
If successful you should get the following message. If not, then you probably entered something incorrectly. Go back and review the previous steps.
PL/SQL procedure successfully completed.
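Before moving on, it can be worth confirming that the credential was stored and that the database can actually reach the bucket with it. The following is a minimal sketch rather than part of the original steps. It assumes the credential name ADW_TOKEN used above and that your ADW version provides the DBMS_CLOUD.LIST_OBJECTS function. The bucket URI is a placeholder you need to fill in yourself, based on the URL Path you noted earlier (the object URL without the file name).

```sql
-- Confirm the credential exists in the current schema.
select credential_name, username, enabled
from   user_credentials
where  credential_name = 'ADW_TOKEN';

-- List the files in the bucket using that credential.
-- The URI below is a placeholder: use your own bucket URL, ending in '/'.
select object_name, bytes
from   dbms_cloud.list_objects('ADW_TOKEN', '<url of your Object Store bucket>/');
```

If the second query returns the file(s) you uploaded, both the Auth Token and the URL are good. If it raises an authorization error, the username or token in the credential is the most likely culprit.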
Next, create a table to store the data you want to import. For my table the create table is the following. [It is one of the sample data sets for OML, and I’ve made the create table statement compact to save space in this post]
create table credit_scoring_100k (
  customer_id number(38,0), age number(4,0), income number(38,0),
  marital_status varchar2(26 byte), number_of_liables number(3,0), wealth varchar2(4000 byte),
  education_level varchar2(26 byte), tenure number(4,0), loan_type varchar2(26 byte),
  loan_amount number(38,0), loan_length number(5,0), gender varchar2(26 byte),
  region varchar2(26 byte), current_address_duration number(5,0), residental_status varchar2(26 byte),
  number_of_prior_loans number(3,0), number_of_current_accounts number(3,0), number_of_saving_accounts number(3,0),
  occupation varchar2(26 byte), has_checking_account varchar2(26 byte), credit_history varchar2(26 byte),
  present_employment_since varchar2(26 byte), fixed_income_rate number(4,1), debtor_guarantors varchar2(26 byte),
  has_own_phone_no varchar2(26 byte), has_same_phone_no_since number(4,0), is_foreign_worker varchar2(26 byte),
  number_of_open_accounts number(3,0), number_of_closed_accounts number(3,0), number_of_inactive_accounts number(3,0),
  number_of_inquiries number(3,0), highest_credit_card_limit number(7,0), credit_card_utilization_rate number(4,1),
  delinquency_status varchar2(26 byte), new_bankruptcy varchar2(26 byte), number_of_collections number(3,0),
  max_cc_spent_amount number(7,0), max_cc_spent_amount_prev number(7,0), has_collateral varchar2(26 byte),
  family_size number(3,0), city_size varchar2(26 byte), fathers_job varchar2(26 byte),
  mothers_job varchar2(26 byte), most_spending_type varchar2(26 byte), second_most_spending_type varchar2(26 byte),
  third_most_spending_type varchar2(26 byte), school_friends_percentage number(3,1), job_friends_percentage number(3,1),
  number_of_protestor_likes number(4,0), no_of_protestor_comments number(3,0), no_of_linkedin_contacts number(5,0),
  average_job_changing_period number(4,0), no_of_debtors_on_fb number(3,0), no_of_recruiters_on_linkedin number(4,0),
  no_of_total_endorsements number(4,0), no_of_followers_on_twitter number(5,0), mode_job_of_contacts varchar2(26 byte),
  average_no_of_retweets number(4,0), facebook_influence_score number(3,1), percentage_phd_on_linkedin number(4,0),
  percentage_masters number(4,0), percentage_ug number(4,0), percentage_high_school number(4,0),
  percentage_other number(4,0), is_posted_sth_within_a_month varchar2(26 byte), most_popular_post_category varchar2(26 byte),
  interest_rate number(4,1), earnings number(4,1), unemployment_index number(5,1),
  production_index number(6,1), housing_index number(7,2), consumer_confidence_index number(4,2),
  inflation_rate number(5,2), customer_value_segment varchar2(26 byte), customer_dmg_segment varchar2(26 byte),
  customer_lifetime_value number(8,0), churn_rate_of_cc1 number(4,1), churn_rate_of_cc2 number(4,1),
  churn_rate_of_ccn number(5,2), churn_rate_of_account_no1 number(4,1), churn_rate__of_account_no2 number(4,1),
  churn_rate_of_account_non number(4,2), health_score number(3,0), customer_depth number(3,0),
  lifecycle_stage number(38,0), credit_score_bin varchar2(100 byte)
);
After creating the table, you are ready to import the data from Object Storage. To do this you will need to use the DBMS_CLOUD PL/SQL package.
begin
  dbms_cloud.copy_data(
    table_name      => 'credit_scoring_100k',
    credential_name => 'ADW_TOKEN',
    file_uri_list   => '<url of file in your Object Store bucket, see comment earlier in post>',
    format          => json_object('ignoremissingcolumns' value 'true',
                                   'removequotes' value 'true',
                                   'dateformat' value 'YYYY-MM-DD HH24:MI:SS',
                                   'blankasnull' value 'true',
                                   'delimiter' value ',',
                                   'skipheaders' value '1')
  );
end;
/
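Once COPY_DATA returns, a quick sanity check tells you whether every record arrived. As a rough sketch (assuming the table name credit_scoring_100k from above), you can count the rows and then look at the USER_LOAD_OPERATIONS view, where DBMS_CLOUD records each load along with the names of the log and bad-file tables it created.

```sql
-- How many rows made it into the target table?
select count(*) from credit_scoring_100k;

-- Load operations for this schema, newest first.
-- LOGFILE_TABLE and BADFILE_TABLE name the tables that hold the load log
-- and any rejected rows for each operation.
select id, table_name, status, rows_loaded, logfile_table, badfile_table
from   user_load_operations
where  type = 'COPY'
order  by start_time desc;
```

If ROWS_LOADED is lower than expected, querying the table named in BADFILE_TABLE for that load is usually the quickest way to see which records were rejected.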
You can now query the data and use it with Oracle Machine Learning, etc.
[I said at the top of the post there are other methods available. More on this in other posts]
Oracle Autonomous Data Warehouse (ADW) has been out a while now and has had several behind-the-scenes improvements and new/additional features added.
If you have used the Oracle Machine Learning (OML) component of ADW you will have seen the various sample OML Notebooks that come pre-loaded. These are easy to open and use, and let you try out the various OML features.
The above image shows the top part of the login screen for OML. To see the available sample notebooks click on the Examples icon. When you do, you will get the following sample OML Notebooks.
But what if you have a notebook you have used elsewhere? These can be exported in JSON format and loaded as a new notebook in OML.
To load a new notebook into OML, select the icon (three horizontal lines) in the top left hand corner of the screen. Then select Notebooks from the menu.
Then select the Import button located at the top of the Notebooks screen. This will open a File window, where you can select the JSON file from your file system.
A couple of seconds later the notebook will be available and listed alongside any other notebooks you may have created.
You have now imported a new notebook into OML and can now use it to process your data and perform machine learning using the in-database features.
Last week I wrote a blog post about how long it took to create machine learning models on Oracle Database Cloud service. There were some impressive results and some surprising results too.
I decided to try out the exact same tests, using the exact same data on the Oracle Autonomous Data Warehouse Cloud service (ADW).
When creating the ADW service I took the basic configuration and didn’t change anything. The inbuilt machine learning for the Autonomous service will magically work out my needs and make the necessary adjustments, right? It can handle any data volume and any data processing requirements, right?
Here are the results.
* You will notice that there is no time given for creating an SVM model for the 10M record data set. After waiting for 4 hours I got bored and gave up waiting. (I actually did this three times to make sure it wasn’t a one-off.)
[I also had a 50M record data set. I just didn’t waste time trying that.]
[Neural Networks algorithm hasn’t been ported onto ADW at this point in time]
If you look back at the results from using the DBaaS you will see it was significantly quicker than the ADW. (For some of the models it would be quicker using Python on my laptop.)
Before you believe the hype, go test it yourself and make sure it measures up.
I re-ran my test cases over a number of days to see if the machine learning aspect of the Autonomous kicked in to learn from the processing and make any performance improvements. Sadly the results were basically the same or slightly slower. Disappointing.
When someone tells you that you should be using this, ask them have they actually used and tested it themselves. And more importantly, don’t believe them. Go test it yourself.
Every day someone talks about the processing power needed for Machine Learning, and the vast computing needed for these tasks. It has become evident that most of these people have never created a machine learning model. Never. But they like to make up stuff and try to make themselves look like an expert, or as I and others like to call them, a “fake expert”.
When you question these “fake experts” about this topic, they huff and puff about lots of things and never answer the question, or try to claim it is so difficult that you simply don’t understand.
Having worked in the area of machine learning for a very very long time, I’ve never really had performance issues with creating models. Yes, most of the time I’ve been able to use my laptop. Yes, my laptop, to build large models. In a couple of cases my laptop couldn’t cope and I moved onto a server.
But over the past few years we keep hearing about using cloud services for machine learning: if you are doing machine learning, you need the computing capabilities that are available with cloud services.
So, the results below show the time taken to build machine learning models, using different algorithms, with different sizes of data sets.
For this test, I used a basic cloud service. Well maybe it isn’t basic, but others will consider it very basic, with very little compute involved.
I used an Oracle Cloud DBaaS for this experiment. I selected an Oracle 18c Extreme edition cloud service, which comes with the in-database machine learning option. It has 1 OCPU, 7.5GB memory and 170GB storage. This is the basic configuration.
Next I created data sets with different sizes. These were based on one particular data set, as this ensures that, as the data set size increases, the kind of data and the processing required remain consistent, instead of using completely different data sets.
The data sets consisted of the following numbers of records: 72K, 210K, 660K, 2M, 10M and 50M.
I then created machine learning models using Decision Tree, Naive Bayes, Support Vector Machine, Generalized Linear Models (GLM) and Neural Networks. Yes, it was a typical classification problem.
The following table shows the length of time, in seconds, to build the models. All data preparation etc. was done prior to this.
Note: Automatic Data Preparation was turned on for these algorithms. This performed additional algorithm-specific data preparation for each model. That means the times given in the following tables include some data preparation time as well as the time for building the models.
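To make the timing concrete, here is a rough sketch of how one such build can be timed with the in-database DBMS_DATA_MINING package. It is not the exact script behind the numbers. The table, case id and target column (CREDIT_SCORING_100K, CUSTOMER_ID, CREDIT_SCORE_BIN) and the model and settings table names are assumptions chosen purely for illustration. The settings table selects the algorithm and switches Automatic Data Preparation on, so the elapsed time covers both the preparation and the build.

```sql
-- Settings table: one row per model setting (table name is illustrative).
create table demo_dt_settings (
  setting_name  varchar2(30),
  setting_value varchar2(4000)
);

set serveroutput on

declare
  l_start number;
begin
  -- Choose the algorithm and turn Automatic Data Preparation on.
  insert into demo_dt_settings
    values (dbms_data_mining.algo_name, dbms_data_mining.algo_decision_tree);
  insert into demo_dt_settings
    values (dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);
  commit;

  l_start := dbms_utility.get_time;   -- elapsed-time counter in hundredths of a second

  -- Assumed table, case id and target column; replace with your own.
  dbms_data_mining.create_model(
    model_name          => 'DEMO_DT_MODEL',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'CREDIT_SCORING_100K',
    case_id_column_name => 'CUSTOMER_ID',
    target_column_name  => 'CREDIT_SCORE_BIN',
    settings_table_name => 'DEMO_DT_SETTINGS'
  );

  dbms_output.put_line('Build time (seconds): '
                       || (dbms_utility.get_time - l_start) / 100);
end;
/
```

To time another algorithm, swap the ALGO_NAME value for one of the other package constants, such as algo_naive_bayes, algo_support_vector_machines, algo_generalized_linear_model or algo_neural_network, and use a fresh model name.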
Converting the above table into minutes.
The following outlines the steps to create an Autonomous Data Warehouse Cloud Service.
Log into your Oracle Cloud account and then follow these steps.
1. Select Autonomous Data Warehouse Cloud service from the side menu
2. Select Create Autonomous Data Warehouse button
3. Enter the Compartment details (Display Name, Database Name, CPU Core Count & Storage)
4. Enter a Password for Administrator, and then click ‘Create Autonomous Data Warehouse’
5. Wait until the ADWC is provisioned
Going from this
And you should receive an email that looks like this
6. Click on the name of the ADWC you created
7. Click on the Service Console button
8. Then click on Administration and then Download a Connection Wallet
Specify the password
You can now use this to connect to the ADWC using SQL Developer
One of the new features of the Autonomous Data Warehouse Cloud (ADWC) service is Oracle Machine Learning. This is a Zeppelin based notebook for your machine learning on ADWC. Check out my previous blog post about this.
In order to be able to use this new product and the in-database machine learning in ADWC, you will need your database user to have certain privileges. The first step in this is to create a typical user for accessing the ADWC and grant it the necessary OML privileges.
To do this open the ADWC console and then open the Service Console.
This will then open a new admin page which contains a link for ‘Manage Oracle ML User’. Click on this.
You can then enter the Username, Password and other details for the user, and then click Create.
This will then create a new user that is specific to Oracle Machine Learning. This new user will be granted the DWROLE, which contains the basic schema privileges and the privileges required to run the in-database machine learning algorithms. For those that are familiar with the Oracle Data Mining/Oracle Advanced Analytics option in the Enterprise Edition of the Oracle database, you will see that these privileges are very similar.
You can examine the privileges granted to this DWROLE in the database as an administrator. When you do you will see the following:
CREATE ANALYTIC VIEW
CREATE ATTRIBUTE DIMENSION
ALTER SESSION
CREATE HIERARCHY
CREATE JOB
CREATE MINING MODEL
CREATE PROCEDURE
CREATE SEQUENCE
CREATE SESSION
CREATE SYNONYM
CREATE TABLE
CREATE TRIGGER
CREATE TYPE
CREATE VIEW
READ,WRITE ON directory DATA_PUMP_DIR
EXECUTE privilege on the PL/SQL package DBMS_CLOUD
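If you want to check these privileges on your own instance, the standard data dictionary views can be queried. This is a minimal sketch, assuming you are connected as the ADMIN user (or another account that can read the DBA views). The exact list may differ between service versions.

```sql
-- System privileges granted to the role.
select privilege
from   dba_sys_privs
where  grantee = 'DWROLE'
order  by privilege;

-- Object privileges granted to the role
-- (for example on the DATA_PUMP_DIR directory and the DBMS_CLOUD package).
select owner, table_name, privilege
from   dba_tab_privs
where  grantee = 'DWROLE'
order  by owner, table_name, privilege;
```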