This is the second of a two part blog posting on using an Oracle Data Mining model to apply it to or score new data. The first part looked at how you can score data the DBMS_DATA_MINING.APPLY procedure for scoring data batch type process.
This second part looks at how you can apply or score the new data, using our ODM model, in a real-time mode, scoring a single record at a time.
The PREDICTION SQL function can be used in many different ways. The following examples illustrate the main ways of using it. Again we will be using the same data set with data in our (NEW_DATA_TO_SCORE) table.
The syntax of the function is
PREDICTION ( model_name, USING attribute_list);
Example 1 – Real-time Prediction Calculation
In this example we will select a record and calculate its predicted value. The function will return the predicted value with the highest probability
SELECT cust_id, prediction(clas_decision_tree using *)
WHERE cust_id = 103001;
So a predicted class value is 0 (zero) and this has a higher probability than a class value of 1.
We can compare and check this results with the result that was produced using the DBMS_DATA_MINING.APPLY function (see previous blog post).
SQL> select * from new_data_scored
2 where cust_id = 103001;
CUST_ID PREDICTION PROBABILITY
———- ———- ———–
103001 0 1
103001 1 0
Here we can see that the class value of 0 has a probability of 1 (100%) and the class value of 1 has a probability of 0 (0%).
Example 2 – Selecting top 10 Customers with Class value of 1
For this we are selecting from our NEW_DATA_TO_SCORE table. We want to find the records that have a class value of 1 and has the highest probability. We only want to return the first 10 of these
WHERE PREDICTION(clas_decision_tree using *) = 1
AND rownum <=10;
Example 3 – Selecting records based on Prediction value and Probability
For this example we want to find our from what Countries do the customer come from where the Prediction is 0 (wont take up offer) and the Probability of this occurring being 1 (100%). This example introduces the PREDICTION_PROBABILITY function. This function allows use to use the probability strength of the prediction.
select country_name, count(*)
where prediction(clas_decision_tree using *) = 0
and prediction_probability (clas_decision_tree using *) = 1
group by country_name
order by count(*) asc;
Saudi Arabia 1
New Zealand 1
United States of America 293
The examples that I have give above are only the basic examples of using the PREDICTION function. There are a number of other uses that include using the PREDICTION_COST, PREDICTION_SET, PREDICTION_DETAILS. Examples of these will be covered in a later blog post
The Oracle 11.2 database contains 3 PL/SQL packages that allow you to perform all (well almost all) of your data mining functions.
So instead of using the Oracle Data Miner tool you can write some PL/SQL code that will you to do the same things.
Before you can start using these PL/SQL packages you need to ensure that the schema that you are going to use has been setup with the following:
- Create a schema or use and existing one
- Grant the schema all the data mining privileges: see my earlier posting on how to setup an Oracle schema for data mining – Click here and YouTube video
- Grant all necessary privileges to the data that you will be using for data mining
The first PL/SQL package that you will use is the DBMS_DATA_MINING_TRANSFORM. This PL/SQL package allows you to transform the data to make it suitable for data mining. There are a number of functions in this package that allows you to transform the data, but depending on the data you may need to write your own code to perform the transformations. When you apply your data model to the test or the apply data sets, ODM will automatically take the transformation functions defined using this package and apply them to the new data sets.
The second PL/SQL package is DBMS_DATA_MINING. This is the main data mining PL/SQL package. It contains functions to allow you to:
- To create a Model
- Describe the Model
- Exploring and importing of Models
- Computing costs and text metrics for classification Models
- Applying the Model to new data
- Administration of Models, like dropping, renaming, etc
The next (and last) PL/SQL package is DBMS_PREDICTIVE_ANALYTICS.The routines included in this package allows you to prepare data, build a model, score a model and return results of model scoring. The routines include EXPLAIN which ranks attributes in order of influence in explaining a target column. PREDICT which predicts the value of a target attribute based on values in the input. PROFILE which generates rules that describe the cases from the input data.
Over the coming weeks I will have separate blog posts on each of these PL/SQL packages. These will cover the functions that are part of each packages and will include some examples of using the package and functions.
The PL/SQL API interface for Oracle Data Miner has had a number of new features. These are listed below along with the new API features added with the 11.1 release.
- Support for Native Transactional Data with Association Rules: you can build association rule models without first transforming the transactional data.
- SVM class weights specified with CLAS_WEIGHTS_TABLE_NAME: including the GLM class weights
- FORCE argument to DROP_MODEL: you can now force a drop model operation even if a serious system error has interrupted the model build process
- GET_MODEL_DETAILS_SVM has a new REVERSE_COEF parameter: you can obtain the transformed attribute coefficients used internally by an SVM model by setting the new REVERSE_COEF parameter to 1
11.1g API New Features
- Mining Model schema objects: previous releases, DM models were implemented as a collection of tables and metadata within the DMSYS schema. in 11.1 models are implemented as data dictionary objects in the SYS schema. A new set of DD views present DM models and their properties
- Automatic and Embedded Data Preparation: previously data preparation was the responsibility of the user. Now it can be automated
- Scoping of Nested Data: supports nested data types for both categorical and numerical data. Most algorithms require multi-record case data to the presented as columns of nested rows, each containing an attribute name/value pair. ODM processes each nested row as a separate attribute.
- Standardised Handling of Sparse Data & Missing Values: standardised across all algorithms.
- Generalised Linear Models: has a new algorithm and supports classification (logistic regression) and regression (linear regression)
- New SQL Data Mining Function: PREDICTION_BOUNDS has been introduced for Generalised Linear Models. This returns the confidence bounds on predicted values (regression models) or predicted probabilities (classification)
- Enhanced Support for Cost-Sensitive Decision Making: can be added or removed using DATA_MINING.ADD_COST_MATRIX and DBMS_DATA_MINING_REMOVE_COST_MATRIX.
The new/updated SQL Developer 3.1 Early Adopter has just been released.
For the Data Miner, there are no major changes and it appears that there has been some bug fixes and some minor enhancements to so parts.
The main ODM features, apart from bug fixes, in this release include:
- Globalization support, including translated error messages and GUI for all languages supported by SQL Developer
- Improved accessibility features including the addition of a Structure navigator that lists all the nodes and links displayed in a workflow
Bug / Feature
After unzipping the download I opened SQL Developer. With each new release you will have to upgrade the existing ODM repository. The easiest way of doing this is to open the ODM connections pane and double click on one of your ODM schemas. SQL Developer will then run the necessary scripts to upgrade the repository.
I discovered a bug/feature with SQL Developer 3.1 EA1 upgrade script. The repository upgrade does not complete and an error is report.
I logged this error on the ODM forum on OTN. Mark Kelly who is the Development Manager for ODM and monitors the ODM forum, and his team, were quickly onto investigating the error. Mark has posted an update on the ODM form and give a script that needs to be run before you upgrade your existing repository.
You can download the pre-upgrade script from here.
If you don’t have an existing repository then you don’t have to run the script.
Check out the message on the ODM forum.
How to Upgrade SQL Developer & ODM
You will have to download the new SQL Developer 3.1 EA install files.
- Unzip this into your SQL Developer directory
- Create a shortcut for sqldeveloper.exe on your desktop and relabel it SQL Developer 3.1 EA
- Double-click this short cut
- You should be presented with the above window. Select the Yes button to migrate you previous install settings
- SQL Developer should now open and contains all your previous connections
If you have an existing ODM repository, you need to run the pre-upgrade script (see above) at this point
- You will now have to upgrade the ODM repository in the database. The simplest way of doing this is to allow SQL Developer to run the necessary scripts.
- From the View Menu, select Oracle Data Miner –> Connections
- In the ODM Connections pane double click one of your ODM schemas. Enter the username and password and click OK
- You will then be prompted to migrate/update the ODM repository to the new version. Click Yes.
- Enter the SYS username and Password
- Click Start button, to start the migrate/upgrade scripts
- On my laptop this migrate/upgrade step took less than 1 minute
- The upgrade is now finished and you can start using ODM.
ODM – SQL Developer 3.1 EA – Release Notes
The ODM release notes can be found at
Oracle Data Miner functionality is now well established and proven over the years. In particular with the release of the ODM 11gR2 version of the tool. But how will Oracle Data Miner develop into the future.
There are 4 main paths or Frontiers for future developments for Oracle Data Miner:
Oracle Data Miner Tool
The new ODM 11gR2 tool is a major development over the previous version of the tool. With the introduction of workflows and some added functionality for some of the features. the tool is now comparable with the likes of SAS Enterprise Miner and SPSS.
But the new tool is not complete and still needs a bit of fine tuning of most of the features. In particular with the usability and interactions. Some of the colour schemes needs to be looked at or to allow users to select their own colours.
Apart from the usability improvements for the tool another major development that is needed, is the ability to translate the workflow and the underlying database objects into usable code. This code can then be incorporated into our applications and other tools. The tool does allow you to produce shell code of the nodes, but there is still a lot of effort needed to make this usable. Under the previous version of the tool there was features available in JDeveloper and SQL Developer to produced packaged code that was easy to include in our applications.
“A lot done – More to do”
Over the past couple of months there has been a few postings on how Oracle Data Miner (11gR2) has been, or will be, incorporated in various Oracle Applications. For example Oracle Fusion Human Capital Management and Oracle Real Time Decision (RTD). Watch out of other applications that will be including Oracle Data Miner.
“A bit done – Lots more to do”
Oracle Business Intelligence
One of the most common places where ODM can be used is with OBIEE. OBIEE is the core engine for the delivery of the BI needs for an organisation. OBIEE coordinates the gathering of data from various sources, the defining of the business measures and then the delivery of this information in various forms to the users. Oracle Data Miner can be included in this process and can add significant value to the BI needs and report.
“A lot done – Need to publicise more”
Most data mining projects are independent of various Applications and BI requirements. They are projects that are hoping to achieve a competitive insight into their organisational data. Over time as the success of some pilot projects become know they need for more data mining projects will increase. This will lead to organisations have a core data mining team to support these project. With this, the team will need tools to support them in the delivery of their project and with the delivery. This is were OBIEE and Oracle Fusion Apps will come increasingly important.
“A lot done – more to do”
Today I gave my VirtaThon presentation on the new Oracle Data Miner 11gR2 tool.
It was an interesting experience as VirtaThon was a virtual conference. The organisation and administration of the conference was excellent.
I had over 25 participants for my presentation, including Carolyn Hamm who has written a book on using Oracle Data Miner 10g. She seemed to enjoy my presentation as she was asking for more at the end, but we had run out of time.
The presentation was an unusual but interesting experience. All the participants were muted, so I could not hear anyone or be asked questions as the presentation progressed. I was not able to judge the body language or facial expressions, for me to work out how the presentation was going.
I was sitting in my living room when giving the presentation and spent almost an hour talking to myself. At time the concentration levels dipped and I have to refocus and used some visualisation to help me concentrate.
The presentation was divided into 2 parts. The first part was a presentation consisting of some background to ODM, how to get setup and running with ODM, and finally a discussion of some of the new features. This first part took approx. 30 minutes which surprised me as during my rehearsals it was talking 16 minutes. The second part of the presentation was a demo of using ODM to create a workflow, generating a classification model and then applying this model to some new data. During my rehearsals this was taking approx. 40 minutes.
I only had 50-55 minutes for my VirtaThon presentation so after my presentation I had 20-25 minutes for the demo. So I had to get through the demo quickly and I had to cut out a discussion of how the data exploration functionality in ODM can be used to get an insight into the data before you start using the data mining features. I will put together a blog post and video of this in a couple of weeks time that will explain it in more detail.
I managed to finish at 49 minutes, which left 6 minutes for questions. There was only a couple of questions, plenty of Thank You’s along with Good Presentation, which is always good to hear.
Thank you to everyone who attended the presentation and to the organisers of VirtaThon.
Before you can start using the Oracle Data Miner features that are now available in SQL Developer 3, there are a few steps you need to perform. This post will walk you through these steps and I have put together a video which goes into more detail. The video is available on my YouTube channel.
I will be posting more How To type videos over the coming weeks and months. Each video will focus in one one particular feature within the new Oracle Data Mining tool.
So following steps are necessary before you can start using the ODM tool
Set up of Oracle Data Miner tabs
To get the ODM tabs to display in SQL Developer, you need to go to the View menu and select the following from the Data Miner submenu
- Data Miner Connections
- Workflow Jobs
- Property Inspector
Create an ODM Schema
There are two main ways to create a Schema. The first and simplest way is to use SQL Developer. To do this you need to create a connection to SYS. Right click on the Other Users option and select Create User.
The second option is to use SQL*Plus to create the user. Using both methods you need to grant Connect & Resource privileges to the user.
Create the Repository
Before you can start using Oracle Data Mining, you need to create an Oracle Data Miner Repository in the database. Again there are two ways to do this. The simplest is to use the inbuilt functionality in SQL Developer. In the Oracle Data Miner Connections tab, double click on the ODM schema you have just created. SQL Developer will check the database to see if the ODM Repository exists in the database. If it will create the repository for you. But you will need to provide the SYS password.
The other way to create the repository is to run the installodmr.sql script that in available in the ‘datamining’ directory.
example: @installodmr.sql USER TEMP
Create another ODM Schema
It is typical that you would need to have more than one schema for your data mining work. After creating the default Oracle schema, the next step is to grant the schema the privileges to use the Data Mining Repository. This script is called
example: @usergrants.sql DMUSER
Hint: The schema name needs to be in upper case.
IMPORTANT: The last grant statement in the script may give an error. If this occurs then it is due to an invalid hidden character on the line. If you do a cut and paste of the grant statement and execute this statement, everything should run fine.
If you want to demo data to be created for this new ODM schema then you need to run
example: @instdemodata.sql DMUSER
All of these scripts can be found in SQL developer directories