This is the first of a two part blog posting on using an Oracle Data Mining model to apply it to or score new data. This first part looks at the how you can score data using the DBMS_DATA_MINING.APPLY procedure in a batch type process.
The second part will be posted in a couple of days and will look how you can apply or score the new data, using our ODM model, in a real-time mode, scoring a single record at a time.
Instead of applying the model to data as it is captured, you may need to apply a model to a large number of records at the same time. To perform this bulk processing we can use the APPLY procedure that is part of the DBMS_DATA_MINING package. The format of the procedure is
model_name IN VARCHAR2,
data_table_name IN VARCHAR2,
case_id_column_name IN VARCHAR2,
result_table_name IN VARCHAR2,
data_schema_name IN VARCHAR2 DEFAULT NULL);
|Model_Name||The name of your data mining model|
|Data_Table_Name||The source data for the model. This can be a tree or view.|
|Case_Id_Column_Name||The attribute that give uniqueness for each record. This could be the Primary Key or if the PK contains more than one column then a new attribute is needed|
|Result_Table_Name||The name of the table where the results will be stored|
|Data_Schema_Name||The schema name for the source data|
The main condition for applying the model is that the source table (DATA_TABLE_NAME) needs to have the same structure as the table that was used when creating the model.
Also the data needs to be prepossessed in the same way as the training data to ensure that the data in each attribute/feature has the same formatting.
When you use the APPLY procedure it does not update the original data/table, but creates a new table (RESULT_TABLE_NAME) with a structure that is dependent on what the underlying DM algorithm is. The following gives the Result Table description for the main DM algorithms:
For a Classification algorithms
prediction NUMBER / VARCHAR2 — depending a target data type
Example / Case Study
My last few blog posts on ODM have covered most of the APIs for building and transferring models. We will be using the same data set in these posts. The following code uses the same data and models to illustrate how we can use the DBMS_DATA_MINING.APPLY procedure to perform a bulk scoring of data.
In my previous post we used the EXPORT and IMPORT procedures to move a model from one database (Test) to another database (Production). The following examples uses the model in Production to score new data. I have setup a sample of data (NEW_DATA_TO_SCORE) from the SH schema using the same set of attributes as was used to create the model (MINING_DATA_BUILD_V). This data set contains 1500 records.
SQL> desc NEW_DATA_TO_SCORE
Name Null? Type
———————————— ——– ————
CUST_ID NOT NULL NUMBER
CUST_GENDER NOT NULL CHAR(1)
COUNTRY_NAME NOT NULL VARCHAR2(40)
SQL> select count(*) from new_data_to_score;
The next step is to run the the DBMS_DATA_MINING.APPLY procedure. The parameters that we need to feed into this procedure are
|Model_Name||CLAS_DECISION_TREE — we imported this model from our test database|
|Case_Id_Column_Name||CUST_ID — this is the PK|
|Result_Table_Name||NEW_DATA_SCORED — new table that will be created that contains the Prediction and Probability.|
The NEW_DATA_SCORED table will contain 2 records for each record in the source data (NEW_DATA_TO_SCORE). For each record in NEW_DATA_TO_SCORE we will have one record for the each of the Target Values (O or 1) and the probability for each target value. So for our NEW_DATA_TO_SCORE, which contains 1,500 records, we will get 3,000 records in the NEW_DATA_SCORED table.
To apply the model to the new data we run:
model_name => ‘CLAS_DECISION_TREE’,
data_table_name => ‘NEW_DATA_TO_SCORE’,
case_id_column_name => ‘CUST_ID’,
result_table_name => ‘NEW_DATA_SCORED’);
This takes 1 second to run on my laptop, so this apply/scoring of new data is really quick.
The new table NEW_DATA_SCORED has the following description
SQL> desc NEW_DATA_SCORED
Name Null? Type
——————————- ——– ——-
CUST_ID NOT NULL NUMBER
SQL> select count(*) from NEW_DATA_SCORED;
We can now look at the prediction and the probabilities
SQL> select * from NEW_DATA_SCORED where rownum <=12;
CUST_ID PREDICTION PROBABILITY
———- ———- ———–
103001 0 1
103001 1 0
103002 0 .956521739
103002 1 .043478261
103003 0 .673387097
103003 1 .326612903
103004 0 .673387097
103004 1 .326612903
103005 1 .767241379
103005 0 .232758621
103006 0 1
103006 1 0
12 rows selected.
Oracle Open World has a huge number of sessions commencing on Sunday and run until Thursday. To help attendees and non-attendees work out what sessions are available you can work your way through the schedule builder.
This can be a bit difficult to find the sessions that you might be interested in. So this year they have produced a set of Focus On documents that contain all the session related to particular areas.
The following are the available Focus On areas and documents:
Let me know if I have missed any Focus On documents and I will update the list.
Oh and don’t forget the Oracle Data Miner sessions.
If you are not able to attend OOW, you can check out the OOW Live channel on YouTube to watch the keynote and main session
If you are at Oracle Open World (OOW11) and you have an interest in Oracle Data Miner, check out the following presentation sessions:
In addition to these sessions there are also the following Hands-On Labs, where you can get your hand dirty with the tool.
Do let me know if I have missed a session so that I can update the list.
I’m not attending OOW11 so let me know what the sessions are like.
And tell Charlie that I sent you
For those lucky people who are heading off to Oracle Open World, there is something I would like you to consider doing for me.
At OOW you will be collecting lots of free stuff. I’m hearing rumours that OTN will have lots of t-shirts, etc and for the Oracle ACE’s some new vests. Or perhaps you are an exhibitor who will be trying to give away some of your merchandised stuff.
Well instead of giving these items as presents, or carefully filled away somewhere in your office or at home, why not donate it to school children in Tanzania.
I’ve recently started a charity called Tech Gear for the Third World. This allows companies and individuals to donate their merchandised items to a good cause. It also helps companies to donate their surplus or old branded goods.
More details and a shipping address can be could here.