Brendan Tierney
Applying an ODM Model to new data in Oracle – Part 1
This is the first of a two-part blog post on applying an Oracle Data Mining model to score new data. This first part looks at how you can score data using the DBMS_DATA_MINING.APPLY procedure in a batch-type process.
The second part will be posted in a couple of days and will look at how you can apply the ODM model to new data in real time, scoring a single record at a time.
DBMS_DATA_MINING.APPLY
Instead of applying the model to data as it is captured, you may need to apply a model to a large number of records at the same time. To perform this bulk processing we can use the APPLY procedure that is part of the DBMS_DATA_MINING package. The format of the procedure is
DBMS_DATA_MINING.APPLY (
model_name IN VARCHAR2,
data_table_name IN VARCHAR2,
case_id_column_name IN VARCHAR2,
result_table_name IN VARCHAR2,
data_schema_name IN VARCHAR2 DEFAULT NULL);
| Parameter Name | Description |
| Model_Name | The name of your data mining model |
| Data_Table_Name | The source data for the model. This can be a table or view. |
| Case_Id_Column_Name | The attribute that gives uniqueness for each record. This could be the Primary Key, or if the PK contains more than one column then a new attribute is needed. |
| Result_Table_Name | The name of the table where the results will be stored |
| Data_Schema_Name | The schema name for the source data |
The main condition for applying the model is that the source table (DATA_TABLE_NAME) needs to have the same structure as the table that was used when creating the model.
Also, the data needs to be preprocessed in the same way as the training data to ensure that the data in each attribute/feature has the same formatting.
When you use the APPLY procedure it does not update the original data/table, but creates a new table (RESULT_TABLE_NAME) with a structure that is dependent on what the underlying DM algorithm is. The following gives the Result Table description for the main DM algorithms:
For Classification
case_id VARCHAR2/NUMBER
prediction NUMBER / VARCHAR2 (depending on the target data type)
probability NUMBER
For Regression
case_id VARCHAR2/NUMBER
prediction NUMBER
For Clustering
case_id VARCHAR2/NUMBER
cluster_id NUMBER
probability NUMBER
Example / Case Study
My last few blog posts on ODM have covered most of the APIs for building and transferring models. We will be using the same data set as in those posts. The following code uses the same data and models to illustrate how we can use the DBMS_DATA_MINING.APPLY procedure to perform a bulk scoring of data.
In my previous post we used the EXPORT and IMPORT procedures to move a model from one database (Test) to another database (Production). The following examples use the model in Production to score new data. I have set up a sample of data (NEW_DATA_TO_SCORE) from the SH schema using the same set of attributes as were used to create the model (MINING_DATA_BUILD_V). This data set contains 1500 records.
SQL> desc NEW_DATA_TO_SCORE
Name Null? Type
------------------------ -------- ------------
CUST_ID NOT NULL NUMBER
CUST_GENDER NOT NULL CHAR(1)
AGE NUMBER
CUST_MARITAL_STATUS VARCHAR2(20)
COUNTRY_NAME NOT NULL VARCHAR2(40)
CUST_INCOME_LEVEL VARCHAR2(30)
EDUCATION VARCHAR2(21)
OCCUPATION VARCHAR2(21)
HOUSEHOLD_SIZE VARCHAR2(21)
YRS_RESIDENCE NUMBER
AFFINITY_CARD NUMBER(10)
BULK_PACK_DISKETTES NUMBER(10)
FLAT_PANEL_MONITOR NUMBER(10)
HOME_THEATER_PACKAGE NUMBER(10)
BOOKKEEPING_APPLICATION NUMBER(10)
PRINTER_SUPPLIES NUMBER(10)
Y_BOX_GAMES NUMBER(10)
OS_DOC_SET_KANJI NUMBER(10)
SQL> select count(*) from new_data_to_score;
COUNT(*)
----------
1500
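Before scoring, it can be worth confirming that NEW_DATA_TO_SCORE really does supply every attribute the model was built on. The query below is a minimal sketch of one way to do that. It assumes the model is already in the current schema under the name CLAS_DECISION_TREE, and it only compares names using the USER_MINING_MODEL_ATTRIBUTES and USER_TAB_COLUMNS dictionary views; it does not check data types or preprocessing.

```sql
-- Model attributes (excluding the target) that have no matching column
-- in the table to be scored. An empty result means no attribute is missing.
SELECT attribute_name
FROM   user_mining_model_attributes
WHERE  model_name = 'CLAS_DECISION_TREE'
AND    target = 'NO'
MINUS
SELECT column_name
FROM   user_tab_columns
WHERE  table_name = 'NEW_DATA_TO_SCORE';
```

Any attribute name returned by this query is one the model expects but the table does not provide.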
The next step is to run the DBMS_DATA_MINING.APPLY procedure. The parameters that we need to feed into this procedure are
| Parameter Name | Description |
| Model_Name | CLAS_DECISION_TREE — we imported this model from our test database |
| Data_Table_Name | NEW_DATA_TO_SCORE |
| Case_Id_Column_Name | CUST_ID — this is the PK |
| Result_Table_Name | NEW_DATA_SCORED — new table that will be created that contains the Prediction and Probability. |
The NEW_DATA_SCORED table will contain 2 records for each record in the source data (NEW_DATA_TO_SCORE). For each record in NEW_DATA_TO_SCORE we will have one record for each of the Target Values (0 or 1), along with the probability for that target value. So for our NEW_DATA_TO_SCORE, which contains 1,500 records, we will get 3,000 records in the NEW_DATA_SCORED table.
To apply the model to the new data we run:
BEGIN
  dbms_data_mining.apply(
    model_name          => 'CLAS_DECISION_TREE',
    data_table_name     => 'NEW_DATA_TO_SCORE',
    case_id_column_name => 'CUST_ID',
    result_table_name   => 'NEW_DATA_SCORED');
END;
/
This takes 1 second to run on my laptop, so this apply/scoring of new data is really quick.
The new table NEW_DATA_SCORED has the following description
SQL> desc NEW_DATA_SCORED
Name Null? Type
------------------------ -------- ------
CUST_ID NOT NULL NUMBER
PREDICTION NUMBER
PROBABILITY NUMBER
SQL> select count(*) from NEW_DATA_SCORED;
COUNT(*)
----------
3000
We can now look at the prediction and the probabilities
SQL> select * from NEW_DATA_SCORED where rownum <=12;
CUST_ID PREDICTION PROBABILITY
---------- ---------- -----------
103001 0 1
103001 1 0
103002 0 .956521739
103002 1 .043478261
103003 0 .673387097
103003 1 .326612903
103004 0 .673387097
103004 1 .326612903
103005 1 .767241379
103005 0 .232758621
103006 0 1
103006 1 0
12 rows selected.
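Since the result table holds one row per target value, a common follow-up is to keep only the most likely prediction for each customer. The following is a sketch of one way to do this with an analytic function, using only the three columns shown in the description of NEW_DATA_SCORED above.

```sql
-- Keep only the most probable prediction for each customer.
-- If two target values tie on probability, ROW_NUMBER picks one arbitrarily.
SELECT cust_id,
       prediction,
       probability
FROM  (SELECT cust_id,
              prediction,
              probability,
              ROW_NUMBER() OVER (PARTITION BY cust_id
                                 ORDER BY probability DESC) AS rn
       FROM   new_data_scored)
WHERE  rn = 1
ORDER  BY cust_id;
```

This returns one row per CUST_ID, which is usually the shape you want when joining the scores back to the customer data.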
My UKOUG Conference 2011 Schedule
The UKOUG conference will be in a couple of weeks. I have my flights and hotel booked, and I’ve just finished selecting my agenda of presentations. I really enjoy this conference as it serves many purposes, including finding out what new directions Oracle is taking, learning about new product features, some upskilling/training, confirming that the approaches that I have been using on projects are valid, getting lots of hints and tips, etc.
One thing that I always try to do, and that I strongly encourage everyone (in particular first timers) to do, is to go to one session every day that is on a topic or product that you know (nearly) nothing about. You might discover that you know more than you think, or you may learn something new that can be fed into some project on your return or over the next 12 months.
My agenda for the conference currently looks very busy, and in between these sessions there is the exhibition hall, meetings with old and new friends, meetings with product/business unit managers, asking people to write articles for Oracle Scene, checking out possible presenters to come to Ireland for our conference in March 2012, etc. Then there is my presentation on the Wednesday afternoon.
Sunday
I’ll miss most of the Oak Table event on the Sunday but I hope to make it in time for
16:40-17:30 : Performance & High Availability Panel Session
Monday
9:20-9:50 : Keynote by Mark Sunday, Oracle (H1)
10:00-10:45 : The Future of BI & Oracle roadmap, Mike Durran, Oracle (H5)
11:05-12:05 : Implementing Interactive Maps with OBIEE 11g, Antony Heljula, Peak Indicators (H10A)
12:15-13:15 : OBI 11g Analysis & Reporting New Features, Mark Rittman (8A)
14:30-15:15 : Master Data Management – What is it & how to make it work – Robert Barnett, Hub Solutions Designs (H10A)
16:20-17:35 : Dummies Guide to Oracle ADF, Grant Ronald, Oracle, (Media Suite)
16:35-18:30 : The DB Time Performance Method, Graham Wood, Oracle (H8A)
17:45-18:30 : Performance & Stability with Oracle 11g SQL Plan Management, Doug Burns (H1)
17:45-18:30 : Experiences in Virtualization, Michael Doherty (H10A)
19:45-20:45 : Exhibition Welcome Drinks
20:45-Late : Focus Pubs
Tuesday
9:00-11:00 : Next Generation BI Architectures Masterclass, Andrew Bond, Oracle (H10B)
10:10-10:55 : Who’s afraid of Analytic Functions, Alex Nuijten, Maxima (H5)
11:15-12:15 : Analysing Your Data with Analytic Functions, Carl Dudley, (H9)
11:25-13:25 : Using a Physical Standby to Minimize Downtime for DB Release or Server Change, Michael Abbey, Pythian (Media Suite)
14:40-15:25 : How not to make the headlines, Mark Clewett, Hitachi (H10A)
14:40-15:25 : APEX Back to Basics, Paul Broughton, APEX Evangelists (H9)
15:35-16:20 : Can People be identified in the database, Pete Finnigan (H1)
16:40-18:35 : OTN Hands-on Workshop, Todd Trichler, Oracle (H8A)
17:50-18:35 : SQL Developer Data Modeler as a replacement for Oracle Designer, Paul Bainbridge, Fujitsu, (H8B)
18:45-19:45 : Keynote : Future of Enterprise Software and Oracle, Ray Wang, Constellation Research (H1)
20:00-Late : Evening Social & Networking
Wednesday
9:00-10:00 : Oracle 11g Database: Automatic Parallelism, Joel Goodman, Oracle (H9)
9:00-10:00 : Big Data: Learn how to predict the future, Keith Laker, Oracle (H8B)
10:10-10:55 : All about indexes – What to index, when and how, Mark Bobak, ProQuest (H5)
11:20-12:30 : Using Application Express to Build Highly Accessible Products, Anthony Rayner, Oracle (H8A)
12:30-13:30 : Practical uses for APEX Dictionary, John Scott, APEX Evangelists (H8A)
15:20-16:05 : How to deploy your Oracle Data Miner 11g R2 Workflows in a Live Environment – Me (H7B)
16:15-17:00 : Next Generation Data Warehousing, Kulvinder Hari, Oracle (H8A)
16:15-17:00 : Beyond RTFM and WTF Message Moments. Introducing a new standard: Oracle Fusion Applications User Assistance, Ultan O’Broin (Executive Room 7)
I know I have some overlapping sessions, but I will decide on the day which of these I will attend.
As you can see, I will be following the BI stream mainly, with a few sessions on the Database and Development streams too.
This year there is a smart phone app to help us organise our agenda, meetings, etc. The only downside is that the app does not import the agenda that I created on the website, so I have to do it again. Maybe for next year they will have an import agenda feature.

ODM–PL/SQL API for Exporting & Importing Models
In a previous blog post I talked about how you can take a copy of a workflow developed in Oracle Data Miner, and load it into a new schema.
When your data mining project gets to a mature stage and you need to productionalise the data mining process and model updates, you will need to use a different set of tools.
As you gather more and more data and cases, you will be updating/refreshing your models to reflect this new data. The newly updated data mining model needs to be moved from the development/test environment to the production environment. As with all things in IT, we would like to automate this updating of the model in production.
There are a number of database features and packages that we can use to automate the update and it involves the setting up of some scripts on the development/test database and also on the production database.
These steps include:
- Creation of a directory on the development/test database
- Exporting of the updated Data Mining model
- Copying of the exported Data Mining model to the production server
- Removing the existing Data Mining model from production
- Importing of the new Data Mining model.
- Renaming of the imported model to the standard name
The DBMS_DATA_MINING PL/SQL package has two procedures that allow us to export a model and to import a model. These procedures are an API to Oracle Data Pump. The procedure to export a model is DBMS_DATA_MINING.EXPORT_MODEL and the procedure to import a model is DBMS_DATA_MINING.IMPORT_MODEL. The parameters to these procedures are what you would expect to use if you were using Data Pump directly, but have been tailored for data mining models.
Let’s start with listing the models that we have in our development/test schema:
SQL> connect dmuser2/dmuser2
Connected.
SQL> SELECT model_name FROM user_mining_models;
MODEL_NAME
------------------------------
CLAS_DT_1_6
CLAS_SVM_1_6
CLAS_NB_1_6
CLAS_GLM_1_6
Create/define the directory on the server where the models will be exported to.
CREATE OR REPLACE DIRECTORY DataMiningDir_Exports AS 'c:\app\Data_Mining_Exports';
The schema you are using will need to have the CREATE ANY DIRECTORY privilege.
Now we can export our model. In this example we are going to export the Decision Tree model (CLAS_DT_1_6).
DBMS_DATA_MINING.EXPORT_MODEL procedure
The procedure has the following structure
DBMS_DATA_MINING.EXPORT_MODEL (
filename IN VARCHAR2,
directory IN VARCHAR2,
model_filter IN VARCHAR2 DEFAULT NULL,
filesize IN VARCHAR2 DEFAULT NULL,
operation IN VARCHAR2 DEFAULT NULL,
remote_link IN VARCHAR2 DEFAULT NULL,
jobname IN VARCHAR2 DEFAULT NULL);
If we wanted to export all the models into a file called Exported_DM_Models, we would run:
EXEC DBMS_DATA_MINING.EXPORT_MODEL('Exported_DM_Models', 'DATAMININGDIR_EXPORTS');
If we just wanted to export our Decision Tree model to file Exported_CLASS_DT_Model, we would run:
EXEC DBMS_DATA_MINING.EXPORT_MODEL('Exported_CLASS_DT_Model', 'DATAMININGDIR_EXPORTS', 'name in (''CLAS_DT_1_6'')');
DBMS_DATA_MINING.DROP_MODEL procedure
Before we can load the newly updated data mining model into the production database, we need to drop the existing model. This must be done when the model is not in use, so it would be advisable to schedule the dropping of the model during a quiet time, such as before or after the nightly backups/processes.
EXEC DBMS_DATA_MINING.DROP_MODEL('CLAS_DECISION_TREE', TRUE);
DBMS_DATA_MINING.IMPORT_MODEL procedure
Warning : When importing the data mining model, you need to import into a tablespace that has the same name as the tablespace in the development/test database. If the USERS tablespace is used in the development/test database, then the model will be imported into the USERS tablespace in the production database.
Hint : Create a DATAMINING tablespace in your development/test and production databases. This tablespace can be used solely for data mining purposes.
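To import the decision tree model we exported previously, we would run the following. Note that EXPORT_MODEL expands the dump file name to Exported_CLASS_DT_Model01.dmp, and it is this full file name that IMPORT_MODEL expects.
EXEC DBMS_DATA_MINING.IMPORT_MODEL('Exported_CLASS_DT_Model01.dmp', 'DATAMININGDIR_EXPORTS', 'name = ''CLAS_DT_1_6''', 'IMPORT', NULL, NULL, 'dmuser2:dmuser3');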
We now have the new updated data mining model loaded into the production database.
DBMS_DATA_MINING.RENAME_MODEL procedure
The final step before we can start using the new updated model in our production database is to rename the imported model to the standard name that is being used in the production database.
EXEC DBMS_DATA_MINING.RENAME_MODEL('CLAS_DT_1_6', 'CLAS_DECISION_TREE');
Scheduling of these steps
We can wrap most of this up into stored procedures and schedule them to run on a semi-regular basis, using the DBMS_JOB package. The following example schedules a procedure that controls the dropping, importing and renaming of the models.
VARIABLE jobnum NUMBER
EXEC DBMS_JOB.SUBMIT(:jobnum, 'import_new_data_mining_model;', TRUNC(SYSDATE), 'ADD_MONTHS(TRUNC(SYSDATE), 1)');
This schedules the procedure that imports the new data mining models to run immediately and then every month. The job number is returned in the jobnum bind variable, and the job is only picked up by the job queue once the session issues a COMMIT.
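The scheduled procedure itself is not shown in this post, so here is a minimal sketch of what import_new_data_mining_model might look like, simply chaining the drop, import and rename calls described above. The dump file name (with the 01.dmp suffix), the directory object name and the schema mapping are assumptions for illustration; substitute whatever applies in your production database. There is no error handling here, which you would want before relying on it.

```sql
CREATE OR REPLACE PROCEDURE import_new_data_mining_model AS
BEGIN
  -- 1. Remove the model currently used in production.
  DBMS_DATA_MINING.DROP_MODEL('CLAS_DECISION_TREE', TRUE);

  -- 2. Import the newly exported model from the dump file.
  --    File name, directory object and schema mapping are assumed values.
  DBMS_DATA_MINING.IMPORT_MODEL('Exported_CLASS_DT_Model01.dmp',
                                'DATAMININGDIR_EXPORTS',
                                'name = ''CLAS_DT_1_6''',
                                'IMPORT',
                                NULL,
                                NULL,
                                'dmuser2:dmuser3');

  -- 3. Give the imported model the standard production name.
  DBMS_DATA_MINING.RENAME_MODEL('CLAS_DT_1_6', 'CLAS_DECISION_TREE');
END import_new_data_mining_model;
/
```

One design point to consider: because the imported model arrives under a different name (CLAS_DT_1_6), you could import it first and only then drop and rename, so that a failed import does not leave production without a model.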
What Conference ? If I had the time and money
If I had lots of free time and enough money, what conferences would I go to around the world? I regularly get asked for recommendations on what conferences a person should attend. It all depends on what you want to get out of your conference trip, be it training, education, information building, networking, etc., or to enjoy the local attractions.
The table below is my preferred list of conferences to attend. All of the conferences below are focused on two main areas. The first area is Oracle and the second area is that of Data Mining/Predictive Analytics.
I hope you find the list useful. If you can recommend some others let me know.
| Month | Conference |
| January | |
| February | |
| March | Annual Ireland Oracle Conference – Dublin, Ireland |
| April | Collaborate (IOUG Conference USA); Enterprise Data World (USA); Miracle OpenWorld (Denmark) |
| May | |
| June | Oracle Development Tools User Group Kaleidoscope (Kscope) |
| July | |
| August | |
| September | |
| October | Oracle Open World – San Francisco, USA |
| November | Data Governance – Winter Conference (USA); Predictive Analytics World – UK; International Conference on Data Mining & Engineering (ICDMKE); Australia Oracle User Group Conference; Germany Oracle User Group Conference (DOAG) |
| December | Annual UKOUG Conference – Birmingham, UK; IEEE International Conference on Data Mining (ICDM); Oracle Open World Latin America |
There are a lot of conferences in the October, November and December months. Some of these are on overlapping dates, which is a pity. Perhaps the organisers of some of these conferences could coordinate their dates. Also, during the January and February months there do not seem to be any conferences in these areas.
If you would like to sponsor a trip to one or more of these then drop me an email.
ODM 11.2 Data Dictionary Views.
The Oracle 11.2 database contains the following Oracle Data Mining views. These allow you to query the database for the metadata relating to what Data Mining Models you have, what the configurations are and what data is involved.
ALL_MINING_MODELS
Describes the high level information about the data mining models in the database. Related views include DBA_MINING_MODELS and USER_MINING_MODELS.
| Attribute | Data Type | Description |
| OWNER | Varchar2(30) NN | Owner of the mining model |
| MODEL_NAME | Varchar2(30) NN | Name of the mining model |
| MINING_FUNCTION | Varchar2(30) | What data mining function to use: CLASSIFICATION, REGRESSION, CLUSTERING, FEATURE_EXTRACTION, ASSOCIATION_RULES, ATTRIBUTE_IMPORTANCE |
| ALGORITHM | Varchar2(30) | Algorithm used by the model: NAIVE_BAYES, ADAPTIVE_BAYES_NETWORK, DECISION_TREE, SUPPORT_VECTOR_MACHINES, KMEANS, O_CLUSTER, NONNEGATIVE_MATRIX_FACTOR, GENERALIZED_LINEAR_MODEL, APRIORI_ASSOCIATION_RULES, MINIMUM_DESCRIPTION_LENGTH |
| CREATION_DATE | Date NN | Date model was created |
| BUILD_DURATION | Number | Time in seconds for the model build process |
| MODEL_SIZE | Number | Size of model in MBytes |
| COMMENTS | Varchar2(4000) | |
SELECT model_name,
mining_function,
algorithm,
build_duration,
model_size
FROM ALL_MINING_MODELS;
MODEL_NAME MINING_FUNCTION ALGORITHM BUILD_DURATION MODEL_SIZE
------------- ---------------- -------------------------- -------------- ----------
CLAS_SVM_1_6 CLASSIFICATION SUPPORT_VECTOR_MACHINES 3 .1515
CLAS_DT_1_6 CLASSIFICATION DECISION_TREE 2 .0842
CLAS_GLM_1_6 CLASSIFICATION GENERALIZED_LINEAR_MODEL 3 .0877
CLAS_NB_1_6 CLASSIFICATION NAIVE_BAYES 2 .0459
ALL_MINING_MODEL_ATTRIBUTES
Describes the attributes of the data mining models. Related views are DBA_MINING_MODEL_ATTRIBUTES and USER_MINING_MODEL_ATTRIBUTES.
| Attribute | Data Type | Description |
| OWNER | Varchar2(30) NN | Owner of the mining model |
| MODEL_NAME | Varchar2(30) NN | Name of the mining model |
| ATTRIBUTE_NAME | Varchar2(30) NN | Name of the attribute |
| ATTRIBUTE_TYPE | Varchar2(11) | Logical type of attribute: NUMERICAL (numeric data) or CATEGORICAL (character data) |
| DATA_TYPE | Varchar2(12) | Data type of attribute |
| DATA_LENGTH | Number | Length of data type |
| DATA_PRECISION | Number | Precision of a fixed point number |
| DATA_SCALE | Number | Scale of the fixed point number |
| USAGE_TYPE | Varchar2(8) | Indicates whether the attribute was used to create the model (ACTIVE) or not (INACTIVE) |
| TARGET | Varchar2(3) | Indicates if the attribute is the target |
We can take one of the data mining models listed above and select the attributes that are used by that model:
SELECT attribute_name,
attribute_type,
usage_type,
target
from all_mining_model_attributes
where model_name = 'CLAS_DT_1_6';
ATTRIBUTE_NAME ATTRIBUTE_T USAGE_TY TAR
------------------------------ ----------- -------- ---
AGE NUMERICAL ACTIVE NO
CUST_MARITAL_STATUS CATEGORICAL ACTIVE NO
EDUCATION CATEGORICAL ACTIVE NO
HOUSEHOLD_SIZE CATEGORICAL ACTIVE NO
OCCUPATION CATEGORICAL ACTIVE NO
YRS_RESIDENCE NUMERICAL ACTIVE NO
Y_BOX_GAMES NUMERICAL ACTIVE NO
AFFINITY_CARD CATEGORICAL ACTIVE YES
The first thing to note here is that all the attributes are listed as ACTIVE. This is the default and will be the case for all attributes for all the algorithms, so we can ignore this attribute in our queries, but it is good to check just in case.
The second thing to note is that in the last row AFFINITY_CARD has a TARGET value of YES. This is the target attribute used by the classification algorithm.
ALL_MINING_MODEL_SETTINGS
Describes the settings of the data mining models. The settings associated with a model are algorithm dependent. The setting values can be provided as input to the model build process. Alternatively, a separate settings table can be used. If no setting values are defined or provided, then the algorithm will use its default settings.
| Attribute | Data Type | Description |
| OWNER | Varchar2(30) NN | Owner of the mining model |
| MODEL_NAME | Varchar2(30) NN | Name of the mining model |
| SETTING_NAME | Varchar2(30) NN | Name of the Setting |
| SETTING_VALUE | Varchar2(4000) | Value of the Setting |
| SETTING_TYPE | Varchar2(7) | Indicates whether the default value (DEFAULT) or a user specified value (INPUT) is used by the model |
Let’s take our previous example of the CLAS_DT_1_6 model and query the database to see what the settings are.
column setting_value format a30
select setting_name,
setting_value,
setting_type
from all_mining_model_settings
where model_name = 'CLAS_DT_1_6';
SETTING_NAME SETTING_VALUE SETTING
----------------------- ---------------------------- -------
ALGO_NAME ALGO_DECISION_TREE INPUT
PREP_AUTO ON INPUT
TREE_TERM_MINPCT_NODE .05 INPUT
TREE_TERM_MINREC_SPLIT 20 INPUT
TREE_IMPURITY_METRIC TREE_IMPURITY_GINI INPUT
CLAS_COST_TABLE_NAME ODMR$15_42_50_762000JERWZYK INPUT
TREE_TERM_MINPCT_SPLIT .1 INPUT
TREE_TERM_MAX_DEPTH 7 INPUT
TREE_TERM_MINREC_NODE 10 INPUT
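The views can also be combined. As a sketch, the query below joins ALL_MINING_MODELS to ALL_MINING_MODEL_SETTINGS on the OWNER and MODEL_NAME columns described in the tables above, so that the algorithm and its settings can be listed side by side for every model you can see. This can be handy when comparing how models were configured in development/test and in production.

```sql
-- List every visible model together with its algorithm and settings.
SELECT m.owner,
       m.model_name,
       m.algorithm,
       s.setting_name,
       s.setting_value,
       s.setting_type
FROM   all_mining_models m
JOIN   all_mining_model_settings s
ON     s.owner = m.owner
AND    s.model_name = m.model_name
ORDER  BY m.owner, m.model_name, s.setting_name;
```

Adding a filter such as s.setting_type = 'INPUT' would restrict the output to the values that were explicitly supplied rather than defaulted.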
Tom Kyte Seminar Day–Dublin
On Wednesday 2nd December, I attended a full day of presentations given by Tom Kyte of Oracle (asktom.oracle.com). Tom covered a number of topics and these included some of his Oracle Open World presentations.
The topics that were covered included
- 5 things about SQL (OOW11)
- Database Option Packs
- 5 things about PL/SQL (OOW11)
- Q&A Ask Tom Session
All of these presentations can be downloaded from Tom’s website www.asktom.com.
Tom won’t be presenting at the annual UKOUG conference in December, but he is hoping to be there next year (2012).
ODM PL/SQL API 11.2 New Features
The PL/SQL API for Oracle Data Miner has gained a number of new features. These are listed below, along with the new API features added with the 11.1 release.
- Support for Native Transactional Data with Association Rules: you can build association rule models without first transforming the transactional data.
- SVM class weights specified with CLAS_WEIGHTS_TABLE_NAME: including the GLM class weights
- FORCE argument to DROP_MODEL: you can now force a drop model operation even if a serious system error has interrupted the model build process
- GET_MODEL_DETAILS_SVM has a new REVERSE_COEF parameter: you can obtain the transformed attribute coefficients used internally by an SVM model by setting the new REVERSE_COEF parameter to 1
11.1g API New Features
- Mining Model schema objects: in previous releases, DM models were implemented as a collection of tables and metadata within the DMSYS schema. In 11.1, models are implemented as data dictionary objects in the SYS schema. A new set of DD views presents DM models and their properties
- Automatic and Embedded Data Preparation: previously data preparation was the responsibility of the user. Now it can be automated
- Scoping of Nested Data: supports nested data types for both categorical and numerical data. Most algorithms require multi-record case data to be presented as columns of nested rows, each containing an attribute name/value pair. ODM processes each nested row as a separate attribute.
- Standardised Handling of Sparse Data & Missing Values: standardised across all algorithms.
- Generalised Linear Models: has a new algorithm and supports classification (logistic regression) and regression (linear regression)
- New SQL Data Mining Function: PREDICTION_BOUNDS has been introduced for Generalised Linear Models. This returns the confidence bounds on predicted values (regression models) or predicted probabilities (classification)
- Enhanced Support for Cost-Sensitive Decision Making: cost matrices can be added or removed using DBMS_DATA_MINING.ADD_COST_MATRIX and DBMS_DATA_MINING.REMOVE_COST_MATRIX.
Interesting quotes from Predictive Analytics World
The Predictive Analytics World conference is finishing up today in New York. Over the past few days the conference has had some of the leading analytic type people presenting at it.
Twitter, as usual, has been busy and there have been some very interesting and important quotes.
The list of tweets (#pawcon) below are the ones I found most interesting:
Manu Sharma from LinkedIn: “Guru” job title is down, “Ninja” is up.
Despite the “data science” buzz, the biggest skill among #pawcon attendees is #DataMining
Andrea Medinaceli: Visualization is very powerful for making analytics results accessible to upper management (and for buy-in)
Social Network Analytics (SNA) with Zynga, 20M daily active users, 90M monthly active users; 10K nodes, 45K edges (big!)
Vertica: Zynga is an analytics company in the disguise of a gaming company; graph analytics find users/influencers
Colin Shearer: Find me something interesting in my data is a question from hell (analysis should be guided by business goals)
John Elder advocates ensemble methods – usually improve analytics results
Tom Davenport: to get real value, #analytics need to move from one-time craft to industrialized activity
10 years from now all Fortune 500 companies will have a Chief Analytics Officer at the level of COO or CFO
Must be a sign of the economy, so much of the focus on the value of predictive is on retaining customers. #PAWCON.
Tom Davenport: #Analytics is not about math, it is about relationships (with your business client) – says Intel Chief Mathematician
Karl Rexer: companies with higher analytic capabilities are doing better than their peers
ODM API Demos in PL/SQL (& Java)
If you have been using Oracle Data Miner to develop your data mining workflows and models, at some point you will want to move away from the tool and start using the ODM APIs.
Oracle Data Mining provides a PL/SQL API and a Java API for creating supervised and unsupervised data mining models. The two APIs are fully interoperable, so that a model can be created with one API and then modified or applied using the other API.
I will cover the Java APIs in a later post, so watch out for that.
To help you get started with using the APIs there are a number of demo PL/SQL programs available. These were available as part of the pre-11.2g version of the tool, but they don’t seem to be packaged up with the 11.2 (SQL Developer 3) application.
The following table gives a list of the PL/SQL demo programs that are available. Although these were part of the pre-11.2g tool, they still seem to work on your 11.2g database.
You can download a zip of these files from here.
The sample PL/SQL programs illustrate each of the algorithms supported by Oracle Data Mining. They include examples of data transformations appropriate for each algorithm.
I will be exploring the main APIs, how to set them up, the parameters, etc., over the next few weeks, so check back for these posts.
Book Donation by Oracle
Today I received two boxes containing 48 copies of
The Performance Management Revolution by Howard Dresner
These books have been kindly donated by Duncan Fitter, UK Business Development Director at Oracle.
I will be distributing these books to my MSc Data Mining students over the next week.
Thanks Duncan and Oracle
SQL Developer 3.1 EA & Bug
The new/updated SQL Developer 3.1 Early Adopter has just been released.
For the Data Miner, there are no major changes, and it appears that there have been some bug fixes and some minor enhancements to some parts.
The main ODM features, apart from bug fixes, in this release include:
- Globalization support, including translated error messages and GUI for all languages supported by SQL Developer
- Improved accessibility features including the addition of a Structure navigator that lists all the nodes and links displayed in a workflow
Bug / Feature
After unzipping the download I opened SQL Developer. With each new release you will have to upgrade the existing ODM repository. The easiest way of doing this is to open the ODM connections pane and double click on one of your ODM schemas. SQL Developer will then run the necessary scripts to upgrade the repository.
I discovered a bug/feature with the SQL Developer 3.1 EA1 upgrade script. The repository upgrade does not complete and an error is reported.
I logged this error on the ODM forum on OTN. Mark Kelly, who is the Development Manager for ODM and monitors the ODM forum, and his team were quickly onto investigating the error. Mark has posted an update on the ODM forum and given a script that needs to be run before you upgrade your existing repository.
You can download the pre-upgrade script from here.
If you don’t have an existing repository then you don’t have to run the script.
Check out the message on the ODM forum.
https://forums.oracle.com/forums/ann.jspa?annID=1678
https://forums.oracle.com/forums/thread.jspa?threadID=2296374&tstart=0
How to Upgrade SQL Developer & ODM
You will have to download the new SQL Developer 3.1 EA install files.
http://www.oracle.com/technetwork/developer-tools/sql-developer/sqldev-ea-download-486950.html
- Unzip this into your SQL Developer directory
- Create a shortcut for sqldeveloper.exe on your desktop and relabel it SQL Developer 3.1 EA
- Double-click this short cut
- You should be presented with the above window. Select the Yes button to migrate your previous install settings
- SQL Developer should now open and contain all your previous connections
If you have an existing ODM repository, you need to run the pre-upgrade script (see above) at this point
- You will now have to upgrade the ODM repository in the database. The simplest way of doing this is to allow SQL Developer to run the necessary scripts.
- From the View Menu, select Oracle Data Miner –> Connections
- In the ODM Connections pane double click one of your ODM schemas. Enter the username and password and click OK
- You will then be prompted to migrate/update the ODM repository to the new version. Click Yes.
- Enter the SYS username and Password
- Click Start button, to start the migrate/upgrade scripts
- On my laptop this migrate/upgrade step took less than 1 minute
- The upgrade is now finished and you can start using ODM.
ODM – SQL Developer 3.1 EA – Release Notes
The ODM release notes can be found at
http://www.oracle.com/technetwork/database/options/odm/dataminer-31-relnotes-489144.html
Oracle Events in Ireland (Q4 2011)
Over the coming months (Q4 2011) there are a number of Oracle related events being run in Ireland. The schedule for these is below with the relevant links to the agenda webpages or to where you can book your place.
For the OUG BI SIG meetings, you can book your place with the UKOUG.
Venue Address – Dublin:
Oracle Block H, East Point Business Park, Dublin 3
Venue Address – Belfast:
The Mount Conference Center, 2 Woodstock Link, Belfast BT6 8DD
For questions about logistics please contact the marketing team on marketing-ie_ie@oracle.com
If you have any questions about the content please contact: mina.sagha.zadeh@oracle.com
If you know of any other events that are not listed, let me know and I’ll update the list.
