Update on Exalytics Pricing
In my previous blog post (Exalytics : How much will it cost me ?) I gave an outline of the pricing you might expect for an Exalytics machine.
The final pricing that I gave of approx $3+M was based on the per processor licencing.
Yesterday (24th Jan), the Oracle Business Intelligence blog post by Manan included the pricing based on the per-user licences.
The following is a breakdown of the Exalytics pricing based on the minimum 100 user licencing.
Licence Costs (100 users)
Exalytics machine = $135,000
TimesTen = $300 x 100 users = $30,000
BI Foundation Suite = $3,675 x 100 users = $367,500
Giving a grand total of $532,500.
Support Costs (100 users)
But we need to add the annual support costs to this.
Exalytics machine support = $29,700.
TimesTen support = $66 x 100 users = $6,600
BI Foundation Suite support = $809 x 100 users = $80,900
Total support costs (100 users) = $117,200
First year & on-going costs
Total first year cost for an Exalytics machine = $532,500 + $117,200 = $649,700
Plus on-going annual support costs of $117,200 in year 2 and subsequent years.
Discounted Costs
If you are one of the lucky customers who can negotiate a discount, and I apply the same discounts as in my previous blog post (25% on hardware and 60% on software), we get:
Year 1 cost: ($135,000 x 0.75 hardware) + ($397,500 x 0.40 software) = $260,250
So it might be possible to get an Exalytics machine for $260+K, plus annual support costs.
Exalytics : How much will it cost me ?
Over the past couple of weeks the costing for the Oracle Exalytics machine has been made public by Oracle and there have been a number of articles about it. What I've done in this blog post is to collate this information. I give what I understand to be the cost of purchasing an Exalytics machine and getting it set up and running.
The pricing structure starts at
Exalytics machine + cost of BI Foundation Suite + TimesTen licences
Exalytics machine = $135,000
TimesTen = $34,500 per processor licence or $300 per named user(min 100 users)
BI Foundation Suite = $450,000 per processor licence or $3,675 per named user (same number of users as for TimesTen = min 100 users)
Annual Support Costs
Exalytics machine = $29,700
TimesTen = 22% of software licence – $7,590 per processor licence or $66 per named user (min 100 users)
BI Foundation Suite = $99,000 per processor licence or $809 per named user(min 100 users)
The Exalytics machine consists of a single server with 1TB of RAM and 4 Intel Xeon E7-4800 processors, with 10 cores each.
So the total cost of an Exalytics machine based on the processor licence will be somewhere towards $10M. Now this is before the discounts that you can negotiate. There are reports of discounts ranging up to 25% on hardware and 60% on software. The size of the discount is dependent on your size, etc. So this initial ~$10M cost could be reduced to $3M+.
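For reference, my rough working for that figure (this assumes Oracle's 0.5 core factor for the 40 Intel cores, giving 20 processor licences – my assumption, not an Oracle-published figure) is:
Exalytics machine = $135,000
TimesTen = $34,500 x 20 = $690,000
BI Foundation Suite = $450,000 x 20 = $9,000,000
This gives approximately $9.8M before any discounts.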
Please note that I may have gotten some or all of this pricing wrong. If I have, forgive me and let me know what is wrong, and I will correct it so that we have the correct costs.
ODM 11gR2–Real-time scoring of data
In my previous posts I gave sample code of how you can use your ODM model to score new data.
Applying an ODM Model to new data in Oracle – Part 2
Applying an ODM Model to new data in Oracle – Part 1
The examples given in these previous posts were based on the new data being in a table.
In some scenarios you may not have the data you want to score in a table. For example, you may want to score data as it is being recorded and before it gets committed to the database.
The format of the function is:
prediction(ODM_Model_Name USING <attribute list>)
prediction_probability(ODM_Model_Name, Target_Value USING <attribute list>)
So we can list the model attributes and values we want to use, instead of using USING * as we did in the previous blog posts.
Using the same sample data that I used in my previous posts the command would be:
Select prediction(clas_decision_tree
USING
20 as age,
'NeverM' as cust_marital_status,
'HS-grad' as education,
1 as household_size,
2 as yrs_residence,
1 as y_box_games) as scored_value
from dual;

SCORED_VALUE
------------
0
Select prediction_probability(clas_decision_tree, 0
USING
20 as age,
'NeverM' as cust_marital_status,
'HS-grad' as education,
1 as household_size,
2 as yrs_residence,
1 as y_box_games) as probability_value
from dual;

PROBABILITY_VALUE
-----------------
1
So we get the same result as we got in our previous examples.
Depending on what data we have gathered, we may or may not have values for all of the attributes used in the model. In this case we can submit a subset of the values to the function and still get a result.
Select prediction(clas_decision_tree
USING
20 as age,
'NeverM' as cust_marital_status,
'HS-grad' as education) as scored_value2
from dual;

SCORED_VALUE2
-------------
0
Select prediction_probability(clas_decision_tree, 0
USING
20 as age,
'NeverM' as cust_marital_status,
'HS-grad' as education) as probability_value2
from dual;

PROBABILITY_VALUE2
------------------
1
Again we get the same results.
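To illustrate the real-time idea, the following is a minimal sketch of how the prediction function could be called from a trigger, so that each record is scored as it is inserted and before it is committed. The CUSTOMER_APPLICATIONS table and its SCORED_VALUE column are hypothetical names I'm using just for this example.

CREATE OR REPLACE TRIGGER customer_app_score_trg
BEFORE INSERT ON customer_applications    -- hypothetical table
FOR EACH ROW
BEGIN
  -- Score the incoming record using the Decision Tree model
  SELECT prediction(clas_decision_tree
                    USING :NEW.age                 as age,
                          :NEW.cust_marital_status as cust_marital_status,
                          :NEW.education           as education,
                          :NEW.household_size      as household_size,
                          :NEW.yrs_residence       as yrs_residence,
                          :NEW.y_box_games         as y_box_games)
  INTO   :NEW.scored_value                 -- hypothetical column to hold the prediction
  FROM   dual;
END;
/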
Updating your ODM (11g R2) model in production
In my previous blog posts on creating an ODM model, I gave the details of how you can do this using the ODM PL/SQL API.
But at some point you will have a fairly stable environment. What this means is that you will know what type of algorithm and its corresponding settings work best for your data.
At this point you should be able to re-create your ODM model in the production database. The frequency of this update is dependent on the number of new cases that you have, so you might need to update your ODM model daily, weekly, monthly, etc.
To update your model you will need to:
– Create a settings table for your model
– Create a new ODM model
– Rename your new ODM model to the production name
The following examples are based on the example data, model names, etc that I’ve used in my previous post.
Creating a Settings Table
The first step is to create a settings table for your algorithm. This will contain all the parameter settings needed to create the new model. You will have worked out these settings from your previous attempts at creating your models, and you will know what parameters and their values work best.
-- Create the settings table
CREATE TABLE decision_tree_model_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(30));

-- Populate the settings table
-- Specify DT. By default, Naive Bayes is used for classification.
-- Specify ADP. By default, ADP is not used.
BEGIN
  INSERT INTO decision_tree_model_settings (setting_name, setting_value)
  VALUES (dbms_data_mining.algo_name,
          dbms_data_mining.algo_decision_tree);

  INSERT INTO decision_tree_model_settings (setting_name, setting_value)
  VALUES (dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);

  COMMIT;
END;
/
Create a new ODM Model
We will need to use the DBMS_DATA_MINING.CREATE_MODEL procedure. In our example we will want to create a Decision Tree based on our sample data, which contains the previously generated cases and the new cases since the last model rebuild.
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'Decision_Tree_Method2',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'mining_data_build_v',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    settings_table_name => 'decision_tree_model_settings');
END;
/
Rename your ODM model to production name
The model we created above does not have the name that is used by our production software, so we will need to rename it to our production name.
But we need to be careful about when we do this. If you drop a model or rename a model when it is being used then you can end up with indeterminate results.
What I suggest you do is to pick a time of the day when your production software is not doing any data mining. You should drop the existing model (or rename it) and then rename the new model to the production model name.
DBMS_DATA_MINING.DROP_MODEL('CLAS_DECISION_TREE');
and then
DBMS_DATA_MINING.RENAME_MODEL('Decision_Tree_Method2', 'CLAS_DECISION_TREE');
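If you prefer to do the swap in a single step, a minimal sketch (reusing the names from this post) is to wrap both calls in one PL/SQL block and run it during that quiet period:

BEGIN
  -- Remove the old production model
  DBMS_DATA_MINING.DROP_MODEL('CLAS_DECISION_TREE');
  -- Rename the newly built model to the production name
  DBMS_DATA_MINING.RENAME_MODEL('Decision_Tree_Method2', 'CLAS_DECISION_TREE');
END;
/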
Oracle Analytics Update & Plan for 2012
On Friday 16th December, Charlie Berger (Sr. Director, Product Management, Data Mining & Advanced Analytics) posted the following on the Oracle Data Mining forum on OTN.
“… soon you’ll be able to use the new Oracle R Enterprise (ORE) functionality. ORE is currently in beta and is targeted to go General Availability in the near future. ORE brings additional functionality to the ODM Option, which will then be renamed to the Oracle Advanced Analytics Option to reflect the significant adv. analytical functionality enhancements. ORE will allow R users to write R scripts and run them inside the database and eliminate and/or minimize data movement in/out of the DB. ORE will provide R to SQL transparency for SQL push-down to in-DB SQL and an expanding library of Oracle in-DB statistical functions. Packages that cannot be pushed down will be run in embedded R mode while the DB manages all data flows to the multiple R engines running inside the DB.
In January, we’ll open up a new OTN discussion forum specifically for Oracle R Enterprise focused technical discussions. Stay tuned.”
I’m looking forward to getting my hands on the new Oracle R Enterprise, in 2012. In particular I’m keen to see what additional functionality will be added to the Oracle Data Mining option in the DB.
So watch out for the rebranding to Oracle Advanced Analytics
Charlie – Any chance of an advanced copy of ORE and related DB bits and bobs.
Oracle Big Data Videos
Mark Townsend, Database Product Manager at Oracle gave a presentation on Big Data at the UKOUG conference and used the following videos to illustrate how a company can evolve their Big Data into useful and meaningful information.
Big Data – Gold Mine or just Stuff
Oracle Big Data & Analytics Sessions at UKOUG Conference
There are a number of BIG Data and Analytics presentations at the UKOUG Conference in Birmingham (4th Dec – 7th Dec).
I’ve worked my way through the agenda grids for each day of the conference and I’ve come up with the following list. If you are interested in BIG Data and Analytics, these presentations are a must see.
| Day | Time | Presentation |
| Monday 5th Dec | 12:15-13:15 | Exadata Live – Graham Wood – Hall 7A |
| Tuesday 6th Dec | 9:00-10:00 | Big Data – Are you ready – Mark Townsend – Hall 1 |
| Tuesday 6th Dec | 10:10-10:55 | Who’s afraid of Analytic Functions – Alex Nuijten – Hall 5 |
| Tuesday 6th Dec | 11:15-12:15 | Analysing Your Data with Analytic Functions – Carl Dudley – Hall 9 |
| Tuesday 6th Dec | 16:40-17:40 | Mobile Analytics using OBIEE 11g – Jon Mead – Exec Room 1 |
| Wednesday 7th Dec | 9:00-10:00 | Oracle 11g Database Automatic Parallelism – Joel Goodman – Hall 9 |
| Wednesday 7th Dec | 15:20-16:05 | How to Deploy your Oracle Data Miner in a Live Environment – me |
Exalytics Events over the next week
The BIWA SIG is hosting a techcast called “Using Oracle R Enterprise” on Wednesday 30th November, 2011 at noon EST (approx 6pm GMT).
The TechCast is being presented by Mark Hornick, Senior Manager, Oracle Advanced Analytics Development
URL for TechCast: https://stbeehive.oracle.com/bconf/confDetails?confID=334B:3BF0:owch:38893C00F42F38A1E0404498C8A6612B0004075AECF7&guest=true&confKey=608880
— Web Conference ID: 303397
— Web Conference Key: 608880
— Dialup: 1-866-682-4770, ID 5548204, passcode 1234
Several analytic tool vendors have added R-integration to their software. However, Oracle is the largest company to throw their weight behind R. On October 3, Oracle unveiled their integration of R: Oracle R Enterprise (http://www.oracle.com/us/corporate/features/features-oracle-r-enterprise-498732.html) as part of their Oracle Big Data Appliance announcement (http://www.oracle.com/us/corporate/press/512001).
Oracle R Enterprise allows users to perform statistical analysis with advanced visualization on data stored in Oracle Database. Oracle R Enterprise enables scalable R solutions, while facilitating production deployment of R scripts and Hadoop based solutions, as well as integration of R results with Oracle BI Publisher and OBIEE dashboards.
Check out the Oracle YouTube video (5 min) that demos an Exalytics application that can analyse almost a billion records instantly.
If you are attending the UKOUG Conference in Birmingham, Jon Mead (RittmanMead) is giving a presentation called “What can Exalytics do for me?” on Tuesday 5th December @15:35, in the area above the box office.

Applying an ODM Model to new data in Oracle – Part 1
This is the first of a two-part blog posting on applying an Oracle Data Mining model to, or scoring, new data. This first part looks at how you can score data using the DBMS_DATA_MINING.APPLY procedure in a batch-type process.
The second part will be posted in a couple of days and will look at how you can apply or score the new data, using our ODM model, in a real-time mode, scoring a single record at a time.
DBMS_DATA_MINING.APPLY
Instead of applying the model to data as it is captured, you may need to apply a model to a large number of records at the same time. To perform this bulk processing we can use the APPLY procedure that is part of the DBMS_DATA_MINING package. The format of the procedure is
DBMS_DATA_MINING.APPLY (
model_name IN VARCHAR2,
data_table_name IN VARCHAR2,
case_id_column_name IN VARCHAR2,
result_table_name IN VARCHAR2,
data_schema_name IN VARCHAR2 DEFAULT NULL);
| Parameter Name | Description |
| Model_Name | The name of your data mining model |
| Data_Table_Name | The source data for the model. This can be a table or view. |
| Case_Id_Column_Name | The attribute that gives uniqueness for each record. This could be the Primary Key, or if the PK contains more than one column then a new attribute is needed |
| Result_Table_Name | The name of the table where the results will be stored |
| Data_Schema_Name | The schema name for the source data |
The main condition for applying the model is that the source table (DATA_TABLE_NAME) needs to have the same structure as the table that was used when creating the model.
Also the data needs to be pre-processed in the same way as the training data to ensure that the data in each attribute/feature has the same formatting.
When you use the APPLY procedure it does not update the original data/table, but creates a new table (RESULT_TABLE_NAME) with a structure that is dependent on what the underlying DM algorithm is. The following gives the Result Table description for the main DM algorithms:
For Classification algorithms
case_id VARCHAR2/NUMBER
prediction NUMBER / VARCHAR2 -- depending on the target data type
probability NUMBER
For Regression
case_id VARCHAR2/NUMBER
prediction NUMBER
For Clustering
case_id VARCHAR2/NUMBER
cluster_id NUMBER
probability NUMBER
Example / Case Study
My last few blog posts on ODM have covered most of the APIs for building and transferring models. We will be using the same data set in these posts. The following code uses the same data and models to illustrate how we can use the DBMS_DATA_MINING.APPLY procedure to perform a bulk scoring of data.
In my previous post we used the EXPORT and IMPORT procedures to move a model from one database (Test) to another database (Production). The following example uses the model in Production to score new data. I have set up a sample of data (NEW_DATA_TO_SCORE) from the SH schema using the same set of attributes as was used to create the model (MINING_DATA_BUILD_V). This data set contains 1500 records.
SQL> desc NEW_DATA_TO_SCORE
Name Null? Type
———————————— ——– ————
CUST_ID NOT NULL NUMBER
CUST_GENDER NOT NULL CHAR(1)
AGE NUMBER
CUST_MARITAL_STATUS VARCHAR2(20)
COUNTRY_NAME NOT NULL VARCHAR2(40)
CUST_INCOME_LEVEL VARCHAR2(30)
EDUCATION VARCHAR2(21)
OCCUPATION VARCHAR2(21)
HOUSEHOLD_SIZE VARCHAR2(21)
YRS_RESIDENCE NUMBER
AFFINITY_CARD NUMBER(10)
BULK_PACK_DISKETTES NUMBER(10)
FLAT_PANEL_MONITOR NUMBER(10)
HOME_THEATER_PACKAGE NUMBER(10)
BOOKKEEPING_APPLICATION NUMBER(10)
PRINTER_SUPPLIES NUMBER(10)
Y_BOX_GAMES NUMBER(10)
OS_DOC_SET_KANJI NUMBER(10)
SQL> select count(*) from new_data_to_score;
COUNT(*)
———-
1500
The next step is to run the DBMS_DATA_MINING.APPLY procedure. The parameters that we need to feed into this procedure are:
| Parameter Name | Description |
| Model_Name | CLAS_DECISION_TREE — we imported this model from our test database |
| Data_Table_Name | NEW_DATA_TO_SCORE |
| Case_Id_Column_Name | CUST_ID — this is the PK |
| Result_Table_Name | NEW_DATA_SCORED — new table that will be created that contains the Prediction and Probability. |
The NEW_DATA_SCORED table will contain 2 records for each record in the source data (NEW_DATA_TO_SCORE). For each record in NEW_DATA_TO_SCORE we will have one record for each of the target values (0 or 1) and the probability for each target value. So for our NEW_DATA_TO_SCORE, which contains 1,500 records, we will get 3,000 records in the NEW_DATA_SCORED table.
To apply the model to the new data we run:
BEGIN
  dbms_data_mining.apply(
    model_name          => 'CLAS_DECISION_TREE',
    data_table_name     => 'NEW_DATA_TO_SCORE',
    case_id_column_name => 'CUST_ID',
    result_table_name   => 'NEW_DATA_SCORED');
END;
/
This takes 1 second to run on my laptop, so this apply/scoring of new data is really quick.
The new table NEW_DATA_SCORED has the following description
SQL> desc NEW_DATA_SCORED
Name Null? Type
——————————- ——– ——-
CUST_ID NOT NULL NUMBER
PREDICTION NUMBER
PROBABILITY NUMBER
SQL> select count(*) from NEW_DATA_SCORED;
COUNT(*)
———-
3000
We can now look at the prediction and the probabilities
SQL> select * from NEW_DATA_SCORED where rownum <=12;
CUST_ID PREDICTION PROBABILITY
———- ———- ———–
103001 0 1
103001 1 0
103002 0 .956521739
103002 1 .043478261
103003 0 .673387097
103003 1 .326612903
103004 0 .673387097
103004 1 .326612903
103005 1 .767241379
103005 0 .232758621
103006 0 1
103006 1 0
12 rows selected.
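If you only want the most likely target value for each customer, rather than both rows, a small follow-up query (my own addition, not part of the APPLY output) would be:

select cust_id, prediction, probability
from  (select cust_id, prediction, probability,
              rank() over (partition by cust_id order by probability desc) as rnk
       from   NEW_DATA_SCORED)
where  rnk = 1
order  by cust_id;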
ODM–PL/SQL API for Exporting & Importing Models
In a previous blog post I talked about how you can take a copy of a workflow developed in Oracle Data Miner, and load it into a new schema.
When your data mining project gets to a mature stage and you need to productionalise the data mining process and model updates, you will need to use a different set of tools.
As you gather more and more data and cases, you will be updating/refreshing your models to reflect this new data. The newly updated data mining model needs to be moved from the development/test environment to the production environment. As with all things in IT, we would like to automate this updating of the model in production.
There are a number of database features and packages that we can use to automate the update and it involves the setting up of some scripts on the development/test database and also on the production database.
These steps include:
- Creation of a directory on the development/test database
- Exporting of the updated Data Mining model
- Copying of the exported Data Mining model to the production server
- Removing the existing Data Mining model from production
- Importing of the new Data Mining model.
- Rename the imported model to the standard name
The DBMS_DATA_MINING PL/SQL package has 2 functions that allow us to export a model and to import a model. These functions are an API to Oracle Data Pump. The function to export a model is DBMS_DATA_MINING.EXPORT_MODEL and the function to import a model is DBMS_DATA_MINING.IMPORT_MODEL. The parameters to these functions are what you would expect to use if you were using Data Pump directly, but they have been tailored for the data mining models.
Lets start with listing the models that we have in our development/test schema:
SQL> connect dmuser2/dmuser2
Connected.
SQL> SELECT model_name FROM user_mining_models;
MODEL_NAME
——————————
CLAS_DT_1_6
CLAS_SVM_1_6
CLAS_NB_1_6
CLAS_GLM_1_6
Create/define the directory on the server where the models will be exported to.
CREATE OR REPLACE DIRECTORY DataMiningDir AS 'c:\app\Data_Mining_Exports';
The schema you are using will need to have the CREATE ANY DIRECTORY privilege.
Now we can export our model. In this example we are going to export the Decision Tree model (CLAS_DT_1_6).
DBMS_DATA_MINING.EXPORT_MODEL function
The function has the following structure
DBMS_DATA_MINING.EXPORT_MODEL (
filename IN VARCHAR2,
directory IN VARCHAR2,
model_filter IN VARCHAR2 DEFAULT NULL,
filesize IN VARCHAR2 DEFAULT NULL,
operation IN VARCHAR2 DEFAULT NULL,
remote_link IN VARCHAR2 DEFAULT NULL,
jobname IN VARCHAR2 DEFAULT NULL);
If we wanted to export all the models into a file called Exported_DM_Models, we would run:
DBMS_DATA_MINING.EXPORT_MODEL('Exported_DM_Models', 'DataMiningDir');
If we just wanted to export our Decision Tree model to file Exported_CLASS_DT_Model, we would run:
DBMS_DATA_MINING.EXPORT_MODEL('Exported_CLASS_DT_Model', 'DataMiningDir', 'name IN (''CLAS_DT_1_6'')');
DBMS_DATA_MINING.DROP_MODEL function
Before you can load the newly updated data mining model into your production database, we need to drop the existing model. Before we do this we need to ensure that it is done when the model is not in use, so it would be advisable to schedule the dropping of the model during a quiet time, like before or after the nightly backups/processes.
DBMS_DATA_MINING.DROP_MODEL('CLAS_DECISION_TREE', TRUE);
DBMS_DATA_MINING.IMPORT_MODEL function
Warning : When importing the data mining model, you need to import into a tablespace that has the same name as the tablespace in the development/test database. If the USERS tablespace is used in the development/test database, then the model will be imported into the USERS tablespace in the production database.
Hint : Create a DATAMINING tablespace in your development/test and production databases. This tablespace can be used solely for data mining purposes.
To import the decision tree model we exported previously, we would run
DBMS_DATA_MINING.IMPORT_MODEL('Exported_CLASS_DT_Model', 'DataMiningDir', 'name = ''CLAS_DT_1_6''', 'IMPORT', null, null, 'dmuser2:dmuser3');
We now have the new updated data mining model loaded into the production database.
DBMS_DATA_MINING.RENAME_MODEL function
The final step before we can start using the new updated model in our production database is to rename the imported model to the standard name that is being used in the production database.
DBMS_DATA_MINING.RENAME_MODEL('CLAS_DT_1_6', 'CLAS_DECISION_TREE');
Scheduling of these steps
We can wrap most of this up into stored procedures and schedule them to run on a semi-regular basis, using the DBMS_JOB package. The following example schedules a procedure that controls the importing, dropping and renaming of the models.
DBMS_JOB.SUBMIT(v_jobnum, 'import_new_data_mining_model;', trunc(sysdate), 'add_months(trunc(sysdate), 1)');  -- v_jobnum is an OUT variable that receives the job number
This schedules the running of the procedure to import the new data mining models, to run immediately and then every month.
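A minimal sketch of how this could hang together is given below. The procedure name and its contents are my own illustration, reusing the model and file names from this post, and the DBMS_JOB call is wrapped in a block that supplies the OUT variable for the job number.

CREATE OR REPLACE PROCEDURE import_new_data_mining_model AS
BEGIN
  -- Remove the existing production model
  DBMS_DATA_MINING.DROP_MODEL('CLAS_DECISION_TREE', TRUE);
  -- Import the newly exported model from the shared directory
  DBMS_DATA_MINING.IMPORT_MODEL('Exported_CLASS_DT_Model', 'DataMiningDir',
                                'name = ''CLAS_DT_1_6''', 'IMPORT',
                                null, null, 'dmuser2:dmuser3');
  -- Rename the imported model to the production name
  DBMS_DATA_MINING.RENAME_MODEL('CLAS_DT_1_6', 'CLAS_DECISION_TREE');
END;
/

DECLARE
  v_jobnum BINARY_INTEGER;
BEGIN
  DBMS_JOB.SUBMIT(v_jobnum, 'import_new_data_mining_model;',
                  trunc(sysdate), 'add_months(trunc(sysdate), 1)');
  COMMIT;
END;
/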
ODM 11.2 Data Dictionary Views.
The Oracle 11.2 database contains the following Oracle Data Mining views. These allow you to query the database for the metadata relating to what data mining models you have, what their configurations are, and what data is involved.
ALL_MINING_MODELS
Describes the high level information about the data mining models in the database. Related views include DBA_MINING_MODELS and USER_MINING_MODELS.
| Attribute | Data Type | Description |
| OWNER | Varchar2(30) NN | Owner of the mining model |
| MODEL_NAME | Varchar2(30) NN | Name of the mining model |
| MINING_FUNCTION | Varchar2(30) | What data mining function to use CLASSIFICATION REGRESSION CLUSTERING FEATURE_EXTRACTION ASSOCIATION_RULES ATTRIBUTE_IMPORTANCE |
| ALGORITHM | Varchar2(30) | Algorithm used by the model NAIVE_BAYES ADAPTIVE_BAYES_NETWORK DECISION_TREE SUPPORT_VECTOR_MACHINES KMEANS O_CLUSTER NONNEGATIVE_MATRIX_FACTOR GENERALIZED_LINEAR_MODEL APRIORI_ASSOCIATION_RULES MINIMUM_DESCRIPTION_LENGTH |
| CREATION_DATE | Date NN | Date model was created |
| BUILD_DURATION | Number | Time in seconds for the model build process |
| MODEL_SIZE | Number | Size of model in MBytes |
| COMMENTS | Varchar2(4000) |
SELECT model_name,
mining_function,
algorithm,
build_duration,
model_size
FROM ALL_MINING_MODELS;
MODEL_NAME MINING_FUNCTION ALGORITHM BUILD_DURATION MODEL_SIZE
————- —————- ————————– ————– ———-
CLAS_SVM_1_6 CLASSIFICATION SUPPORT_VECTOR_MACHINES 3 .1515
CLAS_DT_1_6 CLASSIFICATION DECISION_TREE 2 .0842
CLAS_GLM_1_6 CLASSIFICATION GENERALIZED_LINEAR_MODEL 3 .0877
CLAS_NB_1_6 CLASSIFICATION NAIVE_BAYES 2 .0459
ALL_MINING_MODEL_ATTRIBUTES
Describes the attributes of the data mining models. Related views are DBA_MINING_MODEL_ATTRIBUTES and USER_MINING_MODEL_ATTRIBUTES.
| Attribute | Data Type | Description |
| OWNER | Varchar2(30) NN | Owner of the mining model |
| MODEL_NAME | Varchar2(30) NN | Name of the mining mode |
| ATTRIBUTE_NAME | Varchar2(30) NN | Name of the attribute |
| ATTRIBUTE_TYPE | Varchar2(11) | Logical type of attribute NUMERICAL – numeric data CATEGORICAL – character data |
| DATA_TYPE | Varchar2(12) | Data type of attribute |
| DATA_LENGTH | Number | Length of data type |
| DATA_PRECISION | Number | Precision of a fixed point number |
| DATA_SCALE | Number | Scale of the fixed point number |
| USAGE_TYPE | Varchar2(8) | Indicated if the attribute was used to create the model (ACTIVE) or not (INACTIVE) |
| TARGET | Varchar2(3) | Indicates if the attribute is the target |
If we take one of our data mining models that was listed above, we can select what attributes are used by that model:
SELECT attribute_name,
attribute_type,
usage_type,
target
from all_mining_model_attributes
where model_name = 'CLAS_DT_1_6';
ATTRIBUTE_NAME ATTRIBUTE_T USAGE_TY TAR
—————————— ———– ——– —
AGE NUMERICAL ACTIVE NO
CUST_MARITAL_STATUS CATEGORICAL ACTIVE NO
EDUCATION CATEGORICAL ACTIVE NO
HOUSEHOLD_SIZE CATEGORICAL ACTIVE NO
OCCUPATION CATEGORICAL ACTIVE NO
YRS_RESIDENCE NUMERICAL ACTIVE NO
Y_BOX_GAMES NUMERICAL ACTIVE NO
AFFINITY_CARD CATEGORICAL ACTIVE YES
The first thing to note here is that all the attributes are listed as ACTIVE. This is the default and will be the case for all attributes for all the algorithms, so we can ignore this attribute in our queries, but it is good to check just in case.
The second thing to note is that for the last row the AFFINITY_CARD attribute has a target value of YES. This is the target attribute used by the classification algorithm.
ALL_MINING_MODEL_SETTINGS
Describes the settings of the data mining models. The settings associated with a model are algorithm dependent. The setting values can be provided as input to the model build process. Alternatively, a separate settings table can be used. If no setting values are defined or provided, then the algorithm will use its default settings.
| Attribute | Data Type | Description |
| OWNER | Varchar2(30) NN | Owner of the mining model |
| MODEL_NAME | Varchar2(30) NN | Name of the mining model |
| SETTING_NAME | Varchar2(30) NN | Name of the Setting |
| SETTING_VALUE | Varchar2(4000) | Value of the Setting |
| SETTING_TYPE | Varchar2(7) | Indicates whether the default value (DEFAULT) or a user specified value (INPUT) is used by the model |
Let’s take our previous example of the ‘CLAS_DT_1_6’ model and query the database to see what the settings are.
column setting_value format a30
select setting_name,
setting_value,
setting_type
from all_mining_model_settings
where model_name = 'CLAS_DT_1_6';
SETTING_NAME SETTING_VALUE SETTING
———————– —————————- ——-
ALGO_NAME ALGO_DECISION_TREE INPUT
PREP_AUTO ON INPUT
TREE_TERM_MINPCT_NODE .05 INPUT
TREE_TERM_MINREC_SPLIT 20 INPUT
TREE_IMPURITY_METRIC TREE_IMPURITY_GINI INPUT
CLAS_COST_TABLE_NAME ODMR$15_42_50_762000JERWZYK INPUT
TREE_TERM_MINPCT_SPLIT .1 INPUT
TREE_TERM_MAX_DEPTH 7 INPUT
TREE_TERM_MINREC_NODE 10 INPUT
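As a quick way of pulling this metadata together, here is a small query (my own example) that lists, for each of your models, the settings that were explicitly supplied (SETTING_TYPE = 'INPUT') rather than left at their defaults:

select m.model_name,
       m.algorithm,
       s.setting_name,
       s.setting_value
from   all_mining_models m
join   all_mining_model_settings s
on     s.owner = m.owner
and    s.model_name = m.model_name
where  s.setting_type = 'INPUT'
order  by m.model_name, s.setting_name;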
ODM 11.2–Data Mining PL/SQL Packages
The Oracle 11.2 database contains 3 PL/SQL packages that allow you to perform all (well almost all) of your data mining functions.
So instead of using the Oracle Data Miner tool, you can write some PL/SQL code that will allow you to do the same things.
Before you can start using these PL/SQL packages you need to ensure that the schema that you are going to use has been setup with the following:
- Create a schema or use an existing one
- Grant the schema all the data mining privileges: see my earlier posting on how to setup an Oracle schema for data mining – Click here and YouTube video
- Grant all necessary privileges to the data that you will be using for data mining
The first PL/SQL package that you will use is DBMS_DATA_MINING_TRANSFORM. This PL/SQL package allows you to transform the data to make it suitable for data mining. There are a number of functions in this package that allow you to transform the data, but depending on the data you may need to write your own code to perform the transformations. When you apply your model to the test or apply data sets, ODM will automatically take the transformation functions defined using this package and apply them to the new data sets.
The second PL/SQL package is DBMS_DATA_MINING. This is the main data mining PL/SQL package. It contains functions to allow you to:
- To create a Model
- Describe the Model
- Exporting and importing of Models
- Computing costs and text metrics for classification Models
- Applying the Model to new data
- Administration of Models, like dropping, renaming, etc
The next (and last) PL/SQL package is DBMS_PREDICTIVE_ANALYTICS. The routines included in this package allow you to prepare data, build a model, score a model and return the results of model scoring. The routines include EXPLAIN, which ranks attributes in order of influence in explaining a target column; PREDICT, which predicts the value of a target attribute based on values in the input data; and PROFILE, which generates rules that describe the cases from the input data.
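To give a flavour of this package, here is a minimal sketch of calling the EXPLAIN routine against the build data set used in my earlier posts (the result table name, pa_explain_results, is one I've made up; the routine creates it for you):

BEGIN
  DBMS_PREDICTIVE_ANALYTICS.EXPLAIN(
    data_table_name     => 'mining_data_build_v',
    explain_column_name => 'affinity_card',
    result_table_name   => 'pa_explain_results');
END;
/

-- The attributes ranked by how well they explain the target
select * from pa_explain_results order by explanatory_value desc;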
Over the coming weeks I will have separate blog posts on each of these PL/SQL packages. These will cover the functions that are part of each package and will include some examples of using the package and functions.