OTN
Oracle Magazine-September/October 1999
The headline articles in the September/October 1999 edition of Oracle Magazine focused on how Oracle technology can be used to educate staff and keep their skills up to date, either on site or remotely via on-demand training resources.
Other articles included:
- Oracle announced that it had acquired Thinking Machines’ data mining business. This data mining product was called Darwin and is now called Oracle Data Mining. I will have a separate blog post for this announcement.
- Oracle 8i Lite has shipped and comes with three components: Oracle Lite, a single-user database (50K to 750K footprint); Web-to-Go, which allows users to access the same data and web applications both online and offline; and iConnect, a flexible architecture that enables reliable and scalable bi-directional synchronization of data and applications. Oracle 8i Lite was supported on MS Windows 95, 98 and NT, Windows CE, Palm OS and EPOC 32.
- The Oracle XML Parser for C and Oracle XML Parser for C++ were released, supporting the DOM and Simple API for XML (SAX) interfaces.
- The Oracle XML SQL Utilities and XSQL Servlet facilitate the reading and writing of XML information from and to the Oracle database.
- Siemens announced plans to build an Oracle 8i Appliance on its Primergy line of servers, based on Intel Pentium II Xeon processors.
- Singapore Telecom’s Magix Server delivers the world’s first nationwide video-on-demand service. Its 12,000 subscribers were able to use a web browser to select a video from the Magix web site, and SingTel automated the streaming of it to their computers.
- Oracle 8i comes with some improvements in PL/SQL. These included Autonomous Transactions (a sketch of one follows this list), Native Dynamic SQL, invoker-rights procedures, user-defined operators and bulk binds.
- Part 2 of the article on exporting an Oracle Database to a flat file. This part looks at how you can use the UTL_FILE package.
- How you can speed up query response times by using Materialized Views. The article suggests the following steps to analyze the performance impact:
- Configure the server parameters
- Grant privileges to the appropriate schema
- Create a materialized view
- Refresh the optimizer statistics
- Confirm that the materialized view is being used
- Manually refresh a materialized view
- Oracle introduced Oracle LogMiner, which allows a DBA to analyze the redo log files
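As a minimal sketch of the Autonomous Transactions feature mentioned above (the MESSAGE_LOG table and LOG_MESSAGE procedure are hypothetical examples, not from the article):
CREATE OR REPLACE PROCEDURE log_message (p_text IN VARCHAR2) AS
   PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
   -- this insert commits independently of the calling transaction,
   -- so the log entry survives even if the caller rolls back
   INSERT INTO message_log (log_date, log_text)
   VALUES (SYSDATE, p_text);
   COMMIT;
END;
/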
Part 1–Getting started with Statistics for Oracle Data Science projects
With all analytics or data science projects, one of the first steps typically involves extracting the data from various sources, merging the data and then calculating various statistics.
The extraction and merging of the data is well covered by lots of other people blogging about how to use Oracle Data Integrator (ODI) and Oracle Warehouse Builder (OWB), among many others.
What I’m going to look at in this series of blog posts is what statistical functions you might use in the Oracle Database and how to use them.
- This first blog post in the series looks at the DBMS_STAT_FUNCS PL/SQL package, what it can be used for, and gives some sample code on how to use it in your data science projects. I also give some sample code that I typically run to gather some additional stats.
- The second blog post will look at some of the other statistical functions that exist in SQL that you will/may use regularly in your data science projects.
- The third blog post will provide a summary of the other statistical functions that exist in the database.
These statistical functions can be grouped into two main types. The first is the descriptive statistics available through the DBMS_STAT_FUNCS PL/SQL package, and then there is the extensive list of other SQL stats functions. It is worth mentioning at this point that all these stats packages and functions come as standard in the database (i.e. they are FREE; you do not have to pay for an add-on option for the database to use them). So once you have your Oracle database installed you can start using them. There is no need to spend money buying another stats package to do stats. All you need to know is some SQL and what the stats functions are.
DBMS_STAT_FUNCS
One of the stats functions that I use a lot is the SUMMARY function. This is part of the DBMS_STAT_FUNCS PL/SQL package and calculates a number of common statistics for an attribute in a table. Yes, that’s correct: it only gathers statistics for one attribute at a time, so you will have to run it for each of the numeric attributes in the table.
For those people who are familiar with the Oracle Data Miner tool, the Explore Data node produces many of the statistics that the SUMMARY function produces. See below for details of how to produce the Histograms.
The SUMMARY function has the following parameters:
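Based on the package documentation, the signature is along these lines (p_sigma_value controls how many standard deviations define the extreme values, and defaults to 3):
DBMS_STAT_FUNCS.SUMMARY (
   p_ownername   IN  VARCHAR2,
   p_tablename   IN  VARCHAR2,
   p_columnname  IN  VARCHAR2,
   p_sigma_value IN  NUMBER := 3,
   s             OUT DBMS_STAT_FUNCS.SummaryType);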
Although you will probably be running this function on the data in your own schema, you still have to give the schema name. The table name is the name of the table where the data exists, the column name is the name of the column that contains the actual data you want to analyse, and ‘s’ is the record that will be returned by the SUMMARY function containing all the summary information.
An example of the basic script to run the SUMMARY function is given below. It uses the data that is available in the sample schemas and the views that were set up for the Oracle Data Mining sample schemas. The table (or in this case the view) that we are going to use is MINING_DATA_BUILD_V. What we are going to do is replicate some of what the Explore Node does in the Oracle Data Miner tool.
set serveroutput on
declare
s DBMS_STAT_FUNCS.SummaryType;
begin
DBMS_STAT_FUNCS.SUMMARY('DMUSER', 'MINING_DATA_BUILD_V', 'AGE', 3, s);
dbms_output.put_line('SUMMARY STATISTICS');
dbms_output.put_line('Count : '||s.count);
dbms_output.put_line('Min : '||s.min);
dbms_output.put_line('Max : '||s.max);
dbms_output.put_line('Range : '||s.range);
dbms_output.put_line('Mean : '||round(s.mean));
dbms_output.put_line('Mode Count : '||s.cmode.count);
dbms_output.put_line('Mode : '||s.cmode(1));
dbms_output.put_line('Variance : '||round(s.variance));
dbms_output.put_line('Stddev : '||round(s.stddev));
dbms_output.put_line('Quantile 5 : '||s.quantile_5);
dbms_output.put_line('Quantile 25 : '||s.quantile_25);
dbms_output.put_line('Median : '||s.median);
dbms_output.put_line('Quantile 75 : '||s.quantile_75);
dbms_output.put_line('Quantile 95 : '||s.quantile_95);
dbms_output.put_line('Extreme Count : '||s.extreme_values.count);
dbms_output.put_line('Extremes : '||s.extreme_values(1));
dbms_output.put_line('Top 5 : '||s.top_5_values(1)||','||
                     s.top_5_values(2)||','||
                     s.top_5_values(3)||','||
                     s.top_5_values(4)||','||
                     s.top_5_values(5));
dbms_output.put_line('Bottom 5 : '||s.bottom_5_values(5)||','||
                     s.bottom_5_values(4)||','||
                     s.bottom_5_values(3)||','||
                     s.bottom_5_values(2)||','||
                     s.bottom_5_values(1));
end;
/
We can compare this to what is produced by the Explore Node in ODM
We can see that the Explore Node gives us more statistics to help us with understanding the data.
What Statistics does the Explore Node produce
We can see the actual SQL code that the Explore Node runs to get the statistics that are displayed in the Explore Node View Data window. To do this you will need to right-click on the Explore Node and move the mouse down to the Deploy option. The submenu will open; select ‘SQL to Clipboard’ from the list. Open a text editor and paste the code. You will need to tidy up some of this code to point it at the actual data source you want. You will get the following:
SELECT /*+ inline */ ATTR,
DATA_TYPE,
NULL_PERCENT,
DISTINCT_CNT,
DISTINCT_PERCENT,
MODE_VALUE,
AVG,
MIN,
MAX,
STD,
VAR,
SKEWNESS,
KURTOSIS,
HISTOGRAMS
FROM OUTPUT_1_23;
Here OUTPUT_1_23 is a working table that ODM created to store intermediate results from some of its processing, in this case the Explore Node. You will need to change this to the ODM working table in your schema.
This query does not perform any of the statistics gathering. It just presents the results.
Creating our own Statistics gathering script – Part 1
The attribute names in the above SQL query tell us which statistics functions within Oracle were used. We can replicate this statistics gathering task using the following script. There are four parts to this script. The first part gathers most of the common statistics for the attribute. The second and third parts calculate the Skewness and Kurtosis for the attribute. The final (fourth) part combines the first three parts and lists the outputs.
The one statistic that we are not including at this point is the Histogram information. I will cover this in the next (second) blog post on statistics.
The following script has the data source table/view name included (MINING_DATA_BUILD_V) and the attribute we are going to use (AGE). You will need to modify this script to run it for each attribute.
WITH
basic_statistics AS (select (sum(CASE WHEN age IS NULL THEN 1 ELSE 0 END)/COUNT(*))*100 null_percent,
count(*) num_value,
count(distinct age) distinct_count,
(count(distinct age)/count(*))*100 distinct_percent,
avg(age) avg_value,
min(age) min_value,
max(age) max_value,
stddev(age) std_value,
stats_mode(age) mode_value,
variance(age) var_value
from mining_data_build_v),
skewness AS (select avg(SV) S_value
             from (select power((age - avg(age) over ())/stddev(age) over (), 3) SV
                   from mining_data_build_v) ),
kurtosis AS (select avg(KV) K_value
             from (select power((age - avg(age) over ())/stddev(age) over (), 4) KV
                   from mining_data_build_v) )
SELECT null_percent,
       num_value,
       distinct_count,
       distinct_percent,
avg_value,
min_value,
max_value,
std_value,
mode_value,
var_value,
S_value,
K_value
from basic_statistics,
skewness,
kurtosis;
Part 2 – Let’s do it for all the attributes in a table
In the code above I’ve shown how you can gather the statistics for one particular attribute of one table. But in an analytics project you will want to gather the statistics on all the attributes.
What we can do is take the code above and put it into a procedure. This procedure accepts a table name as input, loops through the attributes of this table and calculates the various statistics. The statistics are saved in a table called DATA_STATS (see below).
drop table data_stats;
create table DATA_STATS (
table_name VARCHAR2(30) NOT NULL,
column_name VARCHAR2(30) NOT NULL,
data_type VARCHAR2(106) NOT NULL,
data_length NUMBER,
data_precision NUMBER,
data_scale NUMBER,
num_records NUMBER,
distinct_count NUMBER,
null_percent NUMBER,
distinct_percent NUMBER,
avg_value NUMBER,
min_value NUMBER,
max_value NUMBER,
std_value NUMBER,
mode_value VARCHAR2(1000),
var_value NUMBER,
s_value NUMBER,
k_value NUMBER,
PRIMARY KEY (table_name, column_name));
This is one of the first things that I do when I start on a new project. I create the DATA_STATS table and run my procedure GATHER_DATA_STATS for each table that we will be using. Doing this allows me to have a permanent record of the stats for each attribute and saves me time in having to rerun various stats at different points of the project. I can also use these stats to produce some additional stats or to produce some graphs.
Here is the code for the GATHER_DATA_STATS procedure.
CREATE OR REPLACE PROCEDURE gather_data_stats(p_table_name IN varchar2) AS
cursor c_attributes (c_table_name varchar2)
is SELECT table_name,
column_name,
data_type,
data_length,
data_precision,
data_scale
FROM user_tab_columns
WHERE table_name = upper(c_table_name);
v_sql NUMBER;
v_rows NUMBER;
BEGIN
dbms_output.put_line('Starting to gather statistics for '||upper(p_table_name)||' at '||to_char(sysdate,'DD-MON-YY HH24:MI:SS'));
FOR r_att in c_attributes(p_table_name) LOOP
--
-- remove any previously generated stats
--
v_sql := DBMS_SQL.OPEN_CURSOR;
DBMS_SQL.PARSE(v_sql, 'delete from DATA_STATS where table_name = '''||r_att.table_name||''' and column_name = '''||r_att.column_name||'''', DBMS_SQL.NATIVE);
v_rows := DBMS_SQL.EXECUTE(v_sql);
DBMS_SQL.CLOSE_CURSOR(v_sql);
-- dbms_output.put_line('delete from DATA_STATS where table_name = '''||r_att.table_name||''' and column_name = '''||r_att.column_name||'''');
IF r_att.data_type = 'NUMBER' THEN
dbms_output.put_line(r_att.table_name||' : '||r_att.column_name||' : '||r_att.data_type);
--
-- setup the insert statement and execute
-- (the select list is ordered to match the DATA_STATS columns)
--
v_sql := DBMS_SQL.OPEN_CURSOR;
DBMS_SQL.PARSE(v_sql, 'insert into data_stats select '''||r_att.table_name||''', '''||r_att.column_name||''', '''||r_att.data_type||''', '||r_att.data_length||', '||nvl(r_att.data_precision,0)||', '||nvl(r_att.data_scale,0)||', count(*) num_records, count(distinct '||r_att.column_name||') distinct_count, (sum(CASE WHEN '||r_att.column_name||' IS NULL THEN 1 ELSE 0 END)/COUNT(*))*100 null_percent, (count(distinct '||r_att.column_name||')/count(*))*100 distinct_percent, avg('||r_att.column_name||') avg_value, min('||r_att.column_name||') min_value, max('||r_att.column_name||') max_value, stddev('||r_att.column_name||') std_value, stats_mode('||r_att.column_name||') mode_value, variance('||r_att.column_name||') var_value, null, null from '|| r_att.table_name, DBMS_SQL.NATIVE);
v_rows := DBMS_SQL.EXECUTE(v_sql);
DBMS_SQL.CLOSE_CURSOR(v_sql);
ELSIF r_att.data_type IN ('CHAR', 'VARCHAR', 'VARCHAR2') THEN
dbms_output.put_line(r_att.table_name||' : '||r_att.column_name||' : '||r_att.data_type);
--
-- We need to gather a smaller number of stats for the character attributes
--
v_sql := DBMS_SQL.OPEN_CURSOR;
begin
DBMS_SQL.PARSE(v_sql, 'insert into data_stats select '''||r_att.table_name||''', '''||r_att.column_name||''', '''||r_att.data_type||''', '||r_att.data_length||', '||nvl(r_att.data_precision,0)||', '||nvl(r_att.data_scale,0)||', count(*) num_records, count(distinct '||r_att.column_name||') distinct_count, (sum(CASE WHEN '||r_att.column_name||' IS NULL THEN 1 ELSE 0 END)/COUNT(*))*100 null_percent, (count(distinct '||r_att.column_name||')/count(*))*100 distinct_percent, null, null, null, null, stats_mode('||r_att.column_name||') mode_value, null, null, null from '|| r_att.table_name, DBMS_SQL.NATIVE);
v_rows := DBMS_SQL.EXECUTE(v_sql);
DBMS_SQL.CLOSE_CURSOR(v_sql);
exception
when others then
dbms_output.put_line(v_rows);
end;
ELSE
dbms_output.put_line('Unable to gather statistics for '||r_att.column_name||' with data type of '||r_att.data_type||'.');
END IF;
END LOOP;
dbms_output.put_line('Finished gathering statistics for '||upper(p_table_name)||' at '||to_char(sysdate,'DD-MON-YY HH24:MI:SS'));
commit;
END;
/
Then to run it for a table:
exec gather_data_stats('mining_data_build_v');
We can view the contents of the DATA_STATS table by executing the following in SQL*Plus or SQL Developer
select * from DATA_STATS;
OTN has links to two of my blog posts
Over the past couple of weeks I’ve noticed a bit of a spike in my blog stats (I don’t check them often). In particular, there were two groups of blog posts that were getting a lot of the hits.
After a bit of investigation I found out that it was due to referrals from one particular website. It was OTN, the Oracle Technology Network, and more specifically it was from their webpage dedicated to Database Admins and Developers.
Yes, OTN had links to my blog posts on Clustering in Oracle Data Miner and to my blog post on Are you a Type I or Type II Data Scientist.
What a surprise this was to discover, and what an honour!
I don’t know how long they will be on the OTN webpage, but hopefully lots of people in the Oracle community will find them useful.
I’m working on my next set of Oracle Data Miner blog posts, so watch this space. Plus I’ve started work on two technical articles that I’ll be submitting to OTN over the next few weeks. So hopefully you will see these up on OTN soon.
Review of Oracle Magazine–May/June 1999
The headline articles for the May/June 1999 edition of Oracle Magazine were focused on using internet technologies to allow businesses to work together more efficiently.
Other articles included:
- Oracle and Hewlett-Packard announced a prebuilt Oracle 8i appliance, which had the code-name Raw Iron.
- Oracle announced a $100 million venture fund to promote innovation by companies developing products and services based on Oracle 8i.
- Oracle 8i comes with an enhanced feature that automatically keeps a standby database synchronized with the production database. This is called Automated Standby Database (ASD) and aims to reduce the amount of manual work DBAs need to perform.
- Some helpful suggestions on how to go about implementing parallel DML in Oracle 8.
- Rules for Parallel Insert
- The insert statement must be of the form 'insert into table_name select …'
- The table being modified must have a specified parallel declaration or you must specify a parallel hint directive in the insert statement
- You can perform parallel insert on non-partitioned as well as partitioned tables
- After the parallel DML is complete no other SQL statements can access the same table until a Commit is issued.
- Rules for Parallel Update and Delete
- The table must have a parallel declaration specified or you must specify a parallel hint directive in the update/delete statement
- You can perform parallel update or delete on partitioned tables only
- You cannot see the result of the parallel update or delete during the transaction
- By using the parallel options, data-intensive SQL statements, database recovery, and data loads can be executed by multiple processes simultaneously (see the sketch after this list). All of the following operations can be executed in parallel:
- table scan
- sort merge join
- Not In
- select distinct
- aggregation
- cube
- create table as select
- rebuild index partition
- move partition
- update
- Insert ….. select
- Enable constraint
- PL/SQL functions called from SQL
- Nested loop join
- Hash join
- Group by
- Union and union all
- Order by
- Rollup
- Create index
- Rebuild index
- Split partition
- Delete
- Star transformation
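As a minimal sketch of a parallel insert that follows these rules (SALES and SALES_HIST are hypothetical tables):
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ PARALLEL(sales_hist, 4) */ INTO sales_hist
SELECT * FROM sales;

-- no other SQL statement can access SALES_HIST until we commit
COMMIT;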
To view the cover page and the table of contents click on the image at the top of this post or click here.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.
Oracle Magazine–March/April 1999
The headline articles for the March/April 1999 edition of Oracle Magazine were on the evolving world of the DBA. With so much new technology available in the database, the role of the DBA is moving from a back office type role to one having a significant strategic influence in the organisation.
Other articles included:
- Oracle releases a web based version of their Oracle Strategic Procurement application that includes three key parts: Strategic Sourcing, Internet Procurement and Process Automation.
- Sun and Oracle announce a strategic agreement that allows both companies to enhance their product offerings by exchanging key technologies. Oracle will use the core of the Sun Solaris operating environment to deliver the industry’s first database server appliances.
- Oracle Data Mart Suite release 2.5 ships. It includes Oracle Data Mart Builder, Oracle Data Mart Designer, Oracle 8 Enterprise Edition, Oracle Discoverer, Oracle Application Server, and Oracle Reports and Reports Server.
- New integration between Oracle Reports release 6.0 and Oracle Express Server release 6.2 to give users the ability to distribute high quality reports of information held in a multi-dimensional database across the enterprise.
- The need for the DBA to know and understand the V$ views has been increasing during the later releases of 7.3 and 8i. They can be used for a variety of purposes, including understanding locked users, system resources, licencing and parameter settings (see the sketch after this list).
- One thing that all DBAs need to plan for is a database recovery. Planning it is one thing, but practicing it is another. A typical recovery exercise will include choosing a data file, creating a backup, taking the damaged tablespace offline, restoring the damaged data file, recovering the tablespace, bringing the tablespace back online and testing it.
- Avoiding trigger errors, including Mutating and constraining table errors.
- There is an article by Bryan Laplante on using Histograms to Optimize Data Mart Performance.
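As a hedged illustration of the kind of V$ queries the article covers (these use the standard V$SESSION and V$LICENSE views):
SELECT sid, username, lockwait
FROM   v$session
WHERE  lockwait IS NOT NULL;  -- sessions currently waiting on a lock

SELECT sessions_max, sessions_current, sessions_highwater
FROM   v$license;             -- licence related session limits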
To view the cover page and the table of contents click on the image at the top of this post or click here.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.
Clustering in Oracle Data Miner–Part 4
This is the fourth part of a five (5) part blog post on building and using Clustering in Oracle Data Miner. The following outlines the contents of each post in this series on Clustering.
- In the first part we looked at what clustering features exist in ODM and how to set up the data that we will be using in the examples
- The second part will focus on how to build Clusters in ODM.
- The third post will focus on examining the clusters produced by ODM and how to use the Clusters to apply to new data using ODM.
- The fourth post will look at how you can build and evaluate a Clustering model using the ODM SQL and PL/SQL functions.
- The fifth and final post will look at how you can apply your Clustering model to new data using the ODM SQL and PL/SQL functions.
With Clustering we are trying to find hidden patterns in our data. Unlike classification, we are not directing the algorithms on what areas/attributes to focus on.
In my preceding post on Clustering in Oracle Data Miner I showed how you can use the Oracle Data Miner (ODM) tool, which is part of SQL Developer, to build a cluster model and to apply it to new data.
As I have shown in previous blog posts, we can do all of the same steps using SQL and PL/SQL. In this blog post I will show you how to use these SQL and PL/SQL functions to generate a Clustering model and how you can evaluate the model produced.
Step 1 – What Cluster Models do we already have?
Oracle Data Mining comes with a number of data dictionary views. We can use these to see what already exists in our schema. In the example below we will have a look to see the Cluster models that were produced in the Part 2 blog post.
column model_name format a20
column mining_function format a20
column algorithm format a20
SELECT model_name,
mining_function,
algorithm,
build_duration,
model_size
FROM ALL_MINING_MODELS
WHERE mining_function = 'CLUSTERING';
We can also look at the model settings that ODM produced. We can look at the settings of the K-Means model. This model (in my case) is called CLUS_KM_1_25.
column setting_value format a20
select setting_name,
setting_value,
setting_type
from all_mining_model_settings
where model_name = 'CLUS_KM_1_25';
We can also look to see what attributes are used in the K-Means model.
SELECT attribute_name,
attribute_type,
usage_type,
target
from all_mining_model_attributes
where model_name = 'CLUS_KM_1_25';
I will show you how to use this Clustering model or the one that we are about to generate in my next blog post.
Step 2 – Creating the Setting table
When creating an Oracle Data Mining model in SQL you need to create a Settings table. This will contain a record for each setting that you need to set for the algorithm you want to run. The settings table has a very basic structure and consists of only two columns. It is a good idea to create a separate Settings table for each algorithm or type of data mining you want to do. To create the settings table we run the following:
CREATE TABLE CLUSTER_SETTINGS (
Setting_Name VARCHAR2(30),
Setting_Value VARCHAR2(4000));
The next step is to define the settings we want to use. The first of these will be the actual algorithm you want to run. Oracle has two Clustering algorithms: ALGO_KMEANS and ALGO_O_CLUSTER. In the examples we are working through we are using the K-Means algorithm and we want to set the number of clusters to 10. We are also going to turn Automatic Data Preparation (ADP) on. Here is the code.
BEGIN
INSERT INTO CLUSTER_SETTINGS (setting_name, setting_value)
VALUES (dbms_data_mining.algo_name, dbms_data_mining.ALGO_KMEANS);
INSERT INTO CLUSTER_SETTINGS (setting_name, setting_value)
VALUES (dbms_data_mining.prep_auto, dbms_data_mining.PREP_AUTO_ON);
INSERT INTO CLUSTER_SETTINGS (setting_name, setting_value)
VALUES (dbms_data_mining.clus_num_clusters, 10);
COMMIT;
END;
/
We can check to see that these records were inserted.
column setting_value format a25
select * from CLUSTER_SETTINGS;
The other settings will depend on which clustering algorithm is being used. Each one has its own settings. If you do not define any additional settings, Oracle will use the built-in default settings.
To see what the default settings are:
column setting_value format a35
select *
from table(dbms_data_mining.get_default_settings)
where setting_name like 'KM%';
If you want to use a different value from the default, just create a new record in the CLUSTER_SETTINGS table with the new value.
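For instance, to override the maximum number of K-Means iterations (the setting name comes from the DBMS_DATA_MINING package constants; the value 5 is just for illustration):
INSERT INTO CLUSTER_SETTINGS (setting_name, setting_value)
VALUES (dbms_data_mining.kmns_iterations, 5);
COMMIT;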
Step 3 – Create the new Cluster Model
We have the algorithm settings ready, so the next step is to create the Cluster model. We will be using the DBMS_DATA_MINING.CREATE_MODEL procedure and pointing it to our CLUSTER_SETTINGS table.
BEGIN
DBMS_DATA_MINING.CREATE_MODEL(
model_name => 'CLUSTER_KMEANS_MODEL',
mining_function => dbms_data_mining.clustering,
data_table_name => 'INSURANCE_CUST_LTV',
case_id_column_name => 'CUSTOMER_ID',
target_column_name => null,
settings_table_name => 'CLUSTER_SETTINGS');
END;
/
This takes a few seconds to run on my laptop. When finished, we can use queries like those in Step 1 above to find out the details of what was produced.
Step 4 – Examining the Cluster model
The first thing that we will need to check is that the model was created.
column model_name format a20
column mining_function format a15
column algorithm format a15
SELECT model_name,
mining_function,
algorithm,
build_duration,
model_size
FROM ALL_MINING_MODELS
WHERE mining_function = 'CLUSTERING';
One thing that is a bit curious is the model size. We have basically created two K-Means models using the same settings, but the final models have different sizes. See below for an explanation.
Now let’s look at the algorithm settings for our new Cluster model (CLUSTER_KMEANS_MODEL):
select setting_name,
setting_value,
setting_type
from all_mining_model_settings
where model_name = 'CLUSTER_KMEANS_MODEL';
We can also look at the attributes used in the clusters.
SELECT attribute_name,
attribute_type,
usage_type,
target
from all_mining_model_attributes
where model_name = 'CLUSTER_KMEANS_MODEL';
If we compare this list back to the list of attributes that were part of the model we developed in the ODM tool, you will see that we have some extra attributes listed for the new K-Means model.
The reason for these extra attributes and the bigger size (although still small) is that we applied sampling in the ODM tool. We took a subset of the data in the ODM tool and built the model based on this data. For the model we just created in SQL we took all the data (no sampling), so it was working with a much bigger data set and ended up including more attributes.
Step 5 – Examining the Cluster Details
In this step we are going to look at some of the clusters produced by our new Cluster model.
We can produce the information relating to the clusters that were produced and the level in the hierarchy each cluster belongs to. We can use this information to create/draw our own cluster hierarchy diagram.
set lines 90
column child format a40
select id,
record_count,
parent,
tree_level,
child
from table(dbms_data_mining.get_model_details_km('CLUSTER_KMEANS_MODEL'));
To look at the Centroid details for a particular cluster e.g. Cluster ID = 7, we can run:
column attribute_name format a25
column mode_value format a25
select t.id,
c.attribute_name,
c.mean,
c.mode_value,
c.variance
from table (dbms_data_mining.get_model_details_KM('CLUSTER_KMEANS_MODEL')) t,
     table(t.centroid) c
where t.id = 7
order by c.attribute_name;
The results will contain details of the centroid for cluster number 7. The centroid details will include the attributes that were used to form the centroid and the values for each attribute.
The last thing we can look at is the rule for a cluster. The following query will give you the full details of the cluster rule for Cluster 7: for each attribute it gives the values or ranges of values used for that cluster. I’ll let you run this as it returns a large number of records. Run it and have a look at some of the output.
select t.id,
a.attribute_name,
a.conditional_operator,
nvl(a.attribute_str_value,
a.attribute_num_value) AS value,
a.attribute_support,
a.attribute_confidence
from TABLE(dbms_data_mining.get_model_details_km('CLUSTER_KMEANS_MODEL')) t,
TABLE(t.rule.antecedent) a
where t.id = 7
ORDER BY t.id, a.attribute_name, attribute_support, attribute_confidence desc, value;
My next blog post on Clustering in Oracle, will be the final one in the series. It will show you how to apply a Clustering model to new data in your database.
Oracle ACE Director
Towards the end of last week I received an email from Oracle saying that I had been nominated and accepted, by Oracle, to become an Oracle ACE Director.
This is something that makes me very proud and honours the work I have been doing over the past few years on Data Mining in Oracle (Advanced Analytics Option, Data Science/Predictive Analytics, or whatever you want to call it). Thank you to everyone who nominated me.
If you are not familiar with the Oracle ACE Program, it is a way for Oracle to acknowledge not only technical skills but also personal engagement with the Oracle Community and Technology overall. There is even a FAQ that explains how this program works.
There are a few perks that come with the title, and Oracle have a few expectations too. Most of these expectations I’m already meeting! What I’m looking forward to later this year is my first Oracle ACE Director briefing at Oracle Open World (22-26 September).
Oracle Magazine-Nov/Dec. 1998
The headline articles for the Nov/Dec 1998 edition of Oracle Magazine were on building web based applications and thin client computing. A large part of the magazine was dedicated to these topics. This was a bumper edition with a total of 152 pages of content.
Other articles included:
- There were a few articles on using Oracle 8i, including how to use Java in the Database, the Internet File System, interMedia and Data Warehousing. Oracle 8i comes with over 150 new features.
- There were a couple of articles on the Millennium Bug and how to approach such projects. There was also some advice for organisations that would have to deal with the introduction of the Euro currency in Europe.
- There was a section for articles on new product announcements from Oracle partners, including Quest, Nextek, Maxager, ObjectShare, Constellar (Warehouse Builder), Prism, DataMetrics, IQ Software, Eventus, DataMirror, Precise, Saville, DataShark, J-Database Exchange, Andataco, GeoMedia
- Oracle makes available Oracle 8i and the Application Server on a Linux platform for the first time.
- With Oracle 8i we have a number of ways of managing our constraints (a sketch of deferrable constraints follows this list), including:
- Deferrable integrity constraints
- Non unique indexes for primary key and unique constraints
- Immediate constraint enabling
- Detecting locked and waiting transactions was always a task that consumed a lot of a DBA’s time. A number of scripts were given to help you identify and resolve these problems.
- For all of the Oracle Certified DBAs out there, there was an article promoting the OCP DBA program and exam. Some hints and tips about the exam were given, along with some practice questions.
- Plus there were 12 pages of adverts at the back of the magazine.
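A minimal sketch of the deferrable constraints feature (ORDERS and CUSTOMERS are hypothetical tables):
ALTER TABLE orders
  ADD CONSTRAINT orders_cust_fk FOREIGN KEY (customer_id)
  REFERENCES customers (customer_id)
  DEFERRABLE INITIALLY DEFERRED;

-- the constraint is now checked at commit time rather than per statement;
-- within a transaction it can be switched back to immediate checking:
SET CONSTRAINT orders_cust_fk IMMEDIATE;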
To view the cover page and the table of contents click on the image at the top of this post or click here.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.
Association Rules in ODM–Part 2
This is the second part of a four-part blog post on building and using Association Rules in Oracle Data Miner. The following outlines the contents of each post in the series on Association Rules:
- The first part focused on how to build an Association Rule model
- The second post is on examining the Association Rules produced by ODM – this blog post
- The third post will focus on using the Association Rules on your data.
- The final post will look at how you can do some of the above steps using the ODM SQL and PL/SQL functions.
In the previous post I looked at the steps needed to set up a data source and the Association Rule node. When everything was set up we ran the workflow.
Step 1 – Viewing the Model
When the workflow has finished running we will have green tick marks on each node. This is where we left things at the end of the previous post (Part 1). To view the model details, right-click on the Association Rule Node and select View Models from the menu.
There are 3 main concepts that are important in relation to Association Rules:
- Support: the proportion of transactions in the data set that contain the item set, i.e. the number of times the rule occurs
- Confidence: the proportion of the occurrences of the antecedent that result in the consequent, e.g. how often we get C when we have A and B, {A, B} => C
- Lift: the strength of a rule over the random co-occurrence of the antecedent and the consequent
Support and Confidence are the primary measures used to assess the usefulness of an association rule.
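As a worked example with made-up numbers: suppose we have 1,000 transactions, the item set {A, B, C} appears in 50 of them, {A, B} appears in 100, and C on its own appears in 250. For the rule {A, B} => C, Support is 50/1000 = 5%, Confidence is 50/100 = 50%, and Lift is 50% divided by the 25% base rate of C, giving 2, i.e. seeing A and B doubles the chance of seeing C.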
In our example we can see that the antecedent and the consequent have numbers separated by the word AND. These numbers correspond to the product numbers.
Step 2 – Examining the Model Rules
To read the antecedent and the consequent for the first rule in our example we have:
Antecedent: 137 AND 143 AND 128
Consequent: 144
To read this association rule we would say: if a customer bought product 137, product 143 and product 128, then they also bought product 144, with a Confidence value of almost 71%. This is a strong association.
We can check the ordering of the rules by changing the Sort By criteria. As Confidence and Support are the main ways to evaluate the rules, we can change the Sort By criteria to be Confidence. Then click on the Query button to refresh the rules section.
Here we get a list of the strongest rules in descending order.
Below the section of the screen that has the Rules, we have the Rule Details section.
Here we can see that the rule gets formatted into an IF statement. The first rule in the list has a confidence of almost 97%. As it is a simple IF statement it can be easily implemented in our applications (see the sketch below).
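As a hedged sketch of how such a rule could be implemented in SQL (BASKET_ITEMS is a hypothetical table of basket lines; the rule used is the 137 AND 143 AND 128 => 144 example from above):
SELECT basket_id,
       144 AS recommended_product
FROM   basket_items
WHERE  product_id IN (137, 143, 128)
GROUP  BY basket_id
HAVING COUNT(DISTINCT product_id) = 3;  -- the basket contains all three antecedent products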
We can use the information that these rules provide in a number of ways. One such consequence of these rules is that we can look at improving the ordering and distribution of these products to ensure that we have sufficient numbers of each. Another consequence is that we can enhance the front-end selling mechanism to make sure that if a customer is buying products 114, 118 and 115, then we can remind the customer of product 119. We can also ensure that all these products are not located beside each other, so that the customer will have to walk past many other products in order to find them. That is why we never see milk and bread beside each other in a grocery store.
Step 3 – Applying Filters to the Model Rules
In the previous step we were able to sort our rules based on some of the measures of our Association Rules and to see how these rules are structured.
Association Rule Analysis can generate many thousands of possible rules for a small data set. In some cases similar rules can appear, and we can have lots of rules that occur so infrequently that they are perhaps meaningless.
ODM provides us with a number of filters that we can apply to the rules, enabling us to look for the rules that are of most interest to us. We can access these filters by clicking on the More button, which is located just under the Query button.
We can refine our query on the rules based on the various measures and the number of items in the rule. In addition to this we can also filter based on the values of the items. This is particularly useful if we want to concentrate on specific items (in our example, Products). To illustrate this, let us focus on the rules that involve Product 115. Click on the green + symbol on the right-hand side of the window. Select 115 from the list provided. Next we need to decide if we want Product 115 involved in the Antecedent or the Consequent. In our example select the Consequent; this is located to the bottom right of the window. Then click the OK button and then the Query button to update the list of rules that correspond with the new filter.
We can see that we only have rules that have Product 115 in the Consequent column.
We can also see that we have 134 rules for this scenario out of a total of 20,988 (your results might differ slightly from mine and that’s OK. It really depends on what version of the sample data you are using).
Check out the next post in the series (Part 3) where we will look at how you can use the Association Rules produced by ODM.
Review of Oracle Magazine–July/August 1998
The headline articles for the July/August 1998 edition of Oracle Magazine were on designing, developing and delivering Data Marts using the Oracle Database and related tools. The main article looks at the different phases of developing a data mart in 90 days.
One of the key messages in these articles is to keep focused on the business problem; the technology part is very minor in this. This message is still very key to the analytics and big data world: keep focused on the business problem.
Other articles included:
- Oracle ships JDeveloper Suite. It included App Builder for Java, Oracle Application Server 4.0, Oracle 8 Database Server, Symantec’s Visual Page HTML editor, and a one-year developer’s membership in the Oracle Technology Network. Yes, there used to be a cost to being a member of OTN!
- Oracle Web Developer Suite wins the PC Magazine Editors’ Choice award. The suite comes with full development licences for Designer/2000 Release 2.1 (including object extensions), Developer/2000 Release 2.1, Oracle App Builder for Java, Oracle Application Server 3.0, Oracle Database Server (releases 7 and 8) and the Oracle Web Development Kit.
- Oracle Support announced plans to give read-only access, via the web, to its Bug database.
- There was an advert for TOAD when it was still freeware and provided by ToadSoft.
- Security management for Oracle 8 has been increasing in importance over the past few years. For all those people who have some security responsibilities, here are the key elements of database security: system security, data security, user security, password management and system auditing. Security is more than just providing a firewall.
- Building message-based apps with Oracle 8’s Advanced Queuing involves 5 main steps (a sketch of the queue setup follows this list):
- Start the server’s AQ background process
- Create a database user account to manage queues
- Create a user-defined type for application messages
- Create a queue table and a corresponding queue of the user defined message type
- Build the application to enqueue and dequeue messages of the user defined message type
- For the DBAs there was an article on the Fast Full Index Scan: how to enable it, with a number of examples of the hints, including INDEX_FFS.
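A hedged sketch of the queue creation steps using the DBMS_AQADM package (APP_MSG_TYPE is a hypothetical payload type created beforehand with CREATE TYPE):
BEGIN
   DBMS_AQADM.CREATE_QUEUE_TABLE (
      queue_table        => 'app_msg_qt',
      queue_payload_type => 'app_msg_type');
   DBMS_AQADM.CREATE_QUEUE (
      queue_name  => 'app_msg_q',
      queue_table => 'app_msg_qt');
   DBMS_AQADM.START_QUEUE (queue_name => 'app_msg_q');
END;
/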
To view the cover page and the table of contents click on the image at the top of this post or click here.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.
Accepted for BIWA Summit–9th to 10th January
I received an email today to say that I had a presentation accepted for the BIWA Summit. This conference will be in the Sofitel Hotel beside the Oracle HQ in Redwood City.
The title of the presentation is “The Oracle Data Scientist” and the abstract is
Over the past 18 months we have seen a significant increase in the demand for Data Scientists. But how does someone become a data scientist? If we examine the requirements and job descriptions of this role we can see that being able to understand and process data are fundamental skills. So an Oracle developer is ideally suited to being a Data Scientist. The presentation will show how an Oracle developer can evolve into a data scientist through a number of stages, including BI developer, OBIEE developer, statistical analysis, data miner and data scientist. The tasks and tools will be discussed and explored through each of these roles. The second half of the presentation will focus on the data mining functionality available in SQL and PL/SQL. This will consist of a demonstration of an Analytics Development environment and how you can migrate (and use) your models in a Production environment.
For some reason Simon Cowell of X Factor fame kept popping into my head, and it now looks like he will be making an appearance in the presentation too. You will have to wait until the conference to find out what Simon Cowell and being an Oracle Data Scientist have in common.
Check out the BIWA Summit website for more details and to register for the event.
I’ll see you there!
Update on: Adding numbers between two values
Over the past few days I’ve had a number of emails and comments based on my previous post, called ‘Adding numbers between two values’. I included some PL/SQL code that can be used to add up the numbers between two values. I mentioned that this was a question that my pre-teen son (a few years pre-teen) had asked me.
There are two main solutions to the same problem. One involves just using a SELECT and the other involves using recursion. I will come back to these alternative solutions below.
But let me start off with a bit more detail and background on why I approached the problem the way that I did. The main reason is that my son is a pre-teen. Over the past couple of years he has expressed an interest in what his daddy does. We even have matching ORACLENERD t-shirts!
When I was working through the problem with my son I wanted to show him how to take a problem and, by breaking it down into its different parts, work out an overall solution. We can then take each of these parts and translate them into code, in this case some PL/SQL (yes, it is a bit nerdy, and we do have the t-shirt). The code that I gave illustrates many different parts of the language and hopefully he will use some of these features as we continue on our learning experience.
It is good sometimes to break a problem down into smaller parts. That way we can understand it better and see what works and what does not; if something does not work then we will know which part, and it also leads to easier maintenance. At a later point, as you develop an in-depth knowledge of certain features of a language, you can rewrite what you have to be more efficient.
All part of the learning experience.
OK, let’s take a look at the other ways to answer this problem. The first approach is to just use a single SELECT statement.
SELECT sum(rownum + &&Start_Number - 1)
FROM dual
CONNECT BY level <= &End_Number - &&Start_Number + 1;
An even simpler way is
SELECT sum(level)
FROM dual
WHERE level BETWEEN &Start_Number AND &End_Number
CONNECT BY level <= &End_Number;
These queries create a hierarchical query that produces all the numbers between the Start_Number and End_Number parameters. The SUM is needed to add up all the numbers/rows produced. This is nice and simple (but not that easy for my son at this point).
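For completeness, the recursion-based approach mentioned above can be sketched with a recursive WITH clause (available from Oracle 11g Release 2; this is my own sketch rather than one of the submitted solutions):
WITH numbers (n) AS (
   SELECT &&Start_Number FROM dual
   UNION ALL
   SELECT n + 1 FROM numbers WHERE n < &End_Number
)
SELECT sum(n) FROM numbers;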
Thank you to everyone who contacted me about this. I really appreciated your feedback and please keep your comments coming for all my posts.