Year: 2012

Articles wanted for Oracle Scene–Spring 2013

Posted on

The Call for Articles is now open for the Spring edition of Oracle Scene magazine. This is a publication of the UKOUG.

We are looking for technical articles covering all product offerings from Oracle. 

Typically articles will range from 3 pages to 8 pages (MS Word format). These will convert into 2 to 5 page articles in Oracle Scene.

Check out the Article Formatting Guidelines before submitting.
All pictures and images should be 300dpi.
Include a 100(max) word Bio and your photo
Email your article and images to

articles@ukoug.org.uk

For more details about submitting an article, check out
http://www.ukoug.org/what-we-offer/oracle-scene/article-submissions/

Advertisements

Association Rules in ODM-Part 4

Posted on

This is a the final part of a four part blog post on building and using Association Rules in the Oracle Database using Oracle Data Miner. The following outlines the contents of each post in the series on Association Rules

  1. This first part will focus on how to building an Association Rule model
  2. The second post will be on examining the Association Rules produced by ODM – This blog post
  3. The third post will focus on using the Association Rules on your data.
  4. The final post will look at how you can do some of the above steps using the ODM SQL and PL/SQL functions.

In my previous posts I showed how you can go about setting up for Association Rule analysis in Oracle Data Miner and how to examine the rules that are generated.

This post will focus on how we build and use association rules using the functionality that is available in SQL and PL/SQL.

Step 1 – Build the Settings Table

As with all Oracle Data Mining functions in SQL and PL/SQL you will need to setup or build a settings table. This table contains all the settings necessary to run the model build functions. It is a good idea to create a separate settings table for each model build that you complete.

CREATE TABLE assoc_sample_settings (
setting_name VARCHAR2(30),
setting_value VARCHAR2(4000));

Step 2 – Define the Settings for the Model

Before you go to generate your model you need to set some of the parameters for the algorithm. To start with you need to defined that we are going to generate an Association Rules model, turn off the Automatic Data Preparation.

We can also set 3 additional settings for Association Rules.
image

The ASSO_MIN_SUPPORT has a default of 0.1 or 10%. That means that only rules that exist in 10% or more of the cases will be generated. This is really a figure that is too high. In the code below we will set this to a 1%. This matches the settings that we used in SQL Developer in my previous posts.

BEGIN

INSERT INTO assoc_sample_settings (setting_name, setting_value) VALUES

(dbms_data_mining.algo_name, dbms_data_mining.ALGO_APRIORI_ASSOCIATION_RULES);

INSERT into assoc_sample_settings (setting_name, setting_value) VALUES

(dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_off);

INSERT into assoc_sample_settings (setting_name, setting_value) VALUES

(dbms_data_mining.ODMS_ITEM_ID_COLUMN_NAME, ‘PROD_ID’);

INSERT into assoc_sample_settings (setting_name, setting_value) VALUES

(dbms_data_mining.ASSO_MIN_SUPPORT, 0.01);

COMMIT;

END;

/

Step 3 – Prepare the Data

In our example scenario we are using the SALE data that is part of the SH schema. The CREATE_MODEL function needs to have an attribute (CASE_ID) that identifies the key of the shopping basket. In our case we have two attributes, so we will need to use a combined key. This combined key consists of the CUST_ID and the TIME_ID. This links all the transaction records related to the one shopping event together.

We also just need the attribute that has the information that we need. In our Association Rules (Market Basket Analysis) scenario, we will need to include the PROD_ID attribute. This contains the product key of each product that was included in the basket

CREATE VIEW ASSOC_DATA_V AS (

SELECT RANK() OVER (ORDER BY CUST_ID, TIME_ID) CASE_ID,

t.PROD_ID

FROM SH.SALES t );

Step 4 – Create the Model

We will need to use the DBMS_DATA_MINING.CREATE_MODEL function. This will use the settings in our ASSOC_SAMPLE_SETTINGS table. We will use the view created in Step 3 above and use the CASE_ID attribute we created as the Case ID in the function all.

BEGIN 
   DBMS_DATA_MINING.CREATE_MODEL( 
     model_name          => ‘ASSOC_MODEL_2’, 
     mining_function     => DBMS_DATA_MINING.ASSOCIATION, 
     data_table_name     => ‘ASSOC_DATA_V’, 
     case_id_column_name => ‘CASE_ID’, 
     target_column_name  => null, 
     settings_table_name => ‘assoc_sample_settings’);

END;

On my laptop this took approximately 5 second to run on just over 918K records involving just over 143K cases or baskets.

Now that is quick!!!

Step 5 – View the Model Outputs

There are a couple of functions that can be used to extract the rules produced in our previous step. These include:

GET_ASSOCIATION_RULES : This returns the rules from an association model.

SELECT rule_id, 
       antecedent, 
       consequent, 
       rule_support,

       rule_confidence

FROM TABLE(DBMS_DATA_MINING.GET_ASSOCIATION_RULES(‘assoc_model_2’, 10));

The 10 here returns the top 10 records or rules.image GET_FREQUENT_ITEMSETS : returns a set of rows that represent the frequent item sets from an association model. In the following code we want the top 30 item sets to be returned, but filtered to only display item sets where there are 2 or more rules.

SELECT itemset_id,

       items,

       support,

       number_of_items

FROM TABLE(DBMS_DATA_MINING.GET_FREQUENT_ITEMSETS(‘assoc_model_2’, 30))

WHERE number_of_items >= 2;

image

BIWA Summit–9th & 10th January, 2013

Posted on

The BIWA Summit will be on the 9th and 10th January, 2013. It is being held in the Sofitel Hotel beside the Oracle HQ at Redwood Shores, just outside of San Francisco.

The BIWA Summit looks to be leading event in 2013 focused on Analytics, Data Warehousing, Big Data and BI. If you are a data architect or a data scientist this is certainly one event that you should consider attending in 2013.

All the big names (in the Oracle world) will be there Tom Kyte, Mark Rittman, Maria Colgan, Balaji Yelmanchili, Vaishnavi Sashikanth, Charlie Berger, Mark Hornick, Karl Rexter, Tim and Dan Vlamis.

Oh and then there is me. I’ll be giving a presentation on the Oracle Data Scientist. This will be on the first day of the event (9th) at 11:20am.

For anyone interest in the Oracle Data Scientist World there are lots of presentations to help you get start and up to speed in this area. Here is a list of presentations and hands on labs that I can recommend.

image

As is typical with all good conferences there are many presentations on at the same time that I would like to attend. If only I could time travel.

This is a great event to start off the new year and for everyone who is thinking of moving into or commencing a project in the area. So get asking you manager to see if there is any training budget left for 2012 or get first dibs on the training budget for 2013.

Registration is open and at the moment the early bird discount still seems to be available. You can also book a room in the hotel using the registration page.

To view the full agenda – click here

Tom Kyte in Dublin 21st January 2013

Posted on

Tom Kyte will be back in Dublin on the 21st January, 2013.  He will be giving a number of presentations covering some of his popular Oracle Open World sessions and will also include a AskTom session

It will be a full day, kicking off at 9am and finishing around 3:30pm.

There is no better way to kick off the new year with a full day of FREE Oracle training and up skilling with Tom Kyte.

To register for the event send an email to marketing-ie_ie@oracle.com

As they say places are limited, so book early,  I have Smile so I’ll see you there.

OUG Ireland 2013–Call for Presentations

Posted on

The call for presentations at the OUG Ireland Conference is now open. The conference will be on Tuesday 12th March in Dublin city centre.

It is hoped to have at a number of concurrent tracks covering all the main topic areas, including application development, database administration, business intelligence, applications, etc.

If you are interested in submitting a presentation then you need to fill in some of the detail at

OUG Ireland – Submit a Paper

Follow the OUG Ireland conversation on twitter using the tag  #oug_ire

call for papers

Association Rules in ODM-Part 3

Posted on

This is a the third part of a four part blog post on building and using Association Rules in Oracle Data Miner. The following outlines the contents of each post in the series on Association Rules

  1. This first part will focus on how to building an Association Rule model
  2. The second post will be on examining the Association Rules produced by ODM – This blog post
  3. The third post will focus on using the Association Rules on your data.
  4. The final post will look at how you can do some of the above steps using the ODM SQL and PL/SQL functions.

In my previous posts I showed how you can go about setting up for Association Rule analysis in Oracle Data Miner and how to examine the rules that are generated.

This post will focus on how we can extract and use these rules in Oracle Data Miner.

Step 1 – Model Details

Association Rules are an unsupervised method of data mining. In Oracle Data Miner we cannot use the Apply node to to score new data. What we have to do is to generate the Model Details. These in turn can then be used.

The Model Details node is used when we do unsupervised learning to extract the rules that are generated.

To do this we need to click on the Model Details node in the Models section of the Component Palette and then click on our workspace, just to the right of the Association Rule node.

The Edit Model Selection window will open. Connect the Association Rule node to the Model Details node. Then Run the node. This will then generate the Association Rules in a format what we can reuse.

image

When you get the small green tick on the Model Details node you can then view what was generated.

Right click on the Model Details node and click on View Details from the menu.

image

The output is similar to what we would have seen under the Association Rule node with the addition of a few more attributes that include the schema name and model name.

We can order the rules based on the Confidence level by double clicking on the Confidence column header. You might need to do this twice to get the rule appearing based on a descending confidence value.

At this point we can no look at persisting the Association Rules. See step 2 below.

We can also view the SQL that was used to generate the Association Rules that we see in the Model Details node. While still viewing the rules, click on the SQL tab.

image

Step 2 – Persisting the Association Rules

To make the rules persist and be useable outside of ODM we can persist the Association Rules in a table. The first step to do this is to create a new Table Node. This can be found under the Data section of the Component Palette. Click this Create Table or View node in the component palette and then click on the workspace, just to the right of the Model Details node.

Connect the Model Details node to the Output node, by right clicking on the Model Details node, select Connect from the menu and then click on the Output Node.

We can now edit the format of the Output i.e. specify what attributes are to be in our Output table. Double click on the Output node or right click and select Edit from the menu. We now get the Edit Create Table or View Node.

SNAGHTML18801036

We can give the output a meaningful name e.g. AR_OUTPUT_RULES. We can also specify what rule properties we can to export to attributes in out table.

We will need to un-tick the Auto Input Columns Selection tick box before we can remove any of the output attributes. In my case I only want to have ANTECENDENT_ITEMS, CONSEQUENT_ITEMS, ID, LENGTH, CONFIDENCE and SUPPORT in my out put. So I need to select and highlight all the other attributes (holding the control button). After selecting all the attributes I do not want included in the final output table, I need to click on the red X icon.

SNAGHTML18859128

When complete click on the OK button to go back to the workflow.

To generate the table right click on the AR_OUTPUT_RULES node and select Run from the menu. When you get the green tick mark on the AR_OUTPUT_RULES node the table has been created with records containing the details of each rules.

image

To view the contents of the AR_OUTPUT_RULES table we can right click on this node and select view data from the menu.

image

We can now use these rules in our applications.

 

Check out the next post in the series (Part 4) where we will look at the functionality available in the ODM SQL & PL/SQL functions to perform Association Rule analysis.

Association Rules in ODM–Part 2

Posted on

This is a the second part of a four part blog post on building and using Association Rules in Oracle Data Miner.  The following outlines the contents of each post in the series on Association Rules

  1. This first part will focus on how to building an Association Rule model
  2. The second post will be on examining the Association Rules produced by ODM – This blog post
  3. The third post will focus on using the Association Rules on your data.
  4. The final post will look at how you can do some of the above steps using the ODM SQL and PL/SQL functions. 

In the previous post I looked at the steps needed to setup a data source and to setup the Association Rule node. When everything was setup we ran the workflow.

Step 1 – Viewing the Model

We the workflow has finished running we will have the green tick marks on each node. This is where we left thing at the end of the previous post (Part 1). To view the model details, right click on the Association Role Node and select View Models from the menu.

image

There are 3 main concepts that are important in relation to Association Rules:

  • Support: is the proportion of transactions in the data set that contain the item set i.e. the number of times the rule occurs
  • Confidence: is the proportion of the occurrences of the antecedent that result in the consequent e.g. how many times do we get C when we have A and B  {A, B} => C
  • Lift: indicates the strength of a rule over the random co-occurrence of the antecedent and the consequent

Support and Confidence are the primary measures that are used to access the usefulness of an association rule.

In our example we can see that the the antecedent and the consequent has numbers separated by the word AND. These numbers correspond to the product numbers.

Step 2 – Examining the Model Rules

To read the antecedent and the consequent for the first rule in our example we have:

Antecedent: 137 AND 143 AND 128

Consequent: 144

To read this association rule we would say that if a Customer bought product 137 and product 143 and product 128, then we have a Confidence value of almost 71%. This is a strong association.

We can check the ordering of the rules by changing the Sort By criteria. As Confidence and Support are the main ways to evaluate the rules, we can change the Sort By criteria to be Confidence. Then click on the Query button to refresh the rules section.

image

Here get a list of the strongest rules listed in descending order.

Below the section of the screen that has the Rules, we have the Rule Details section.

image 

Here we can see that the rule gets formatted into an IF statement. The first rule in the list has a confidence of almost 97%. As it is a simple IF statement it can be easily implemented in our applications.

We want use the information that these rules provides in a number of ways. One such consequence of these rules is that we can look at improving the ordering and distribution of these products to ensure that we have sufficient numbers of each. Another consequence is that we can enhance the front end selling mechanism to make sure that if a customer is buying product 114, 118 and 115 then we can remind the customer of product 119. We can also ensure that all these products are not located beside each other, so that the customer will have to walk past many other products in order to find them. That is why we never see milk and bread beside each other in a grocery store.

Step 3 – Applying Filters to the Model Rules

In the previous step we were able to sort our rules based on some of the measures of our Association Rules and to see how these rules are structured.

Association Rule Analysis can generate many thousands of possible rules for a small data set. In some cases the similar rules can appear and we can have lots of rules that occur so infrequently that they are perhaps meaningless.

ODM provides us with a number of filters that we can apply to the rules that enables use to look for the rules that are of must interest to use. We can access these filters by clicking on the More button, that is located just under the Query button.

We can refine our query on the rules based on the various measures and the number if items in the rule. In addition to this we can also filter based on the values of the items. This is particularly useful if we want to concentrate on specific items (in our example Products). To illustrate this use focus on the rules that involve Product 115. Click on the green + symbol on the right hand side of the window. Select 115 from the list provided. Next we need to decide if we want Product 115 involved in the Antecedent or the Consequent. In our example select the Consequent. This is located to the bottom right of the window. Then click the OK button and then click on the Query button to update the list of rules that correspond with the new filter.

image

We can see that we only have rules that have Product 115 in the Consequent column.

We can also see that we have 134 rules for this scenarios out of a total of 20,988 (your results might differ slightly to mine and that’s OK. It really depends on what version of the sample data you are using)

 

Check out the next post in the series (Part 3) where we will look at how you can use the Association Rules produced by ODM.