OUG Ireland 2014 Call for Presentations is now open
The call for presentations for the OUG Ireland 2014 conference (or special event) is now open for submissions. The deadline is Sunday 15th December, 2013.
We hope to build on the successful events of the past few years, where we have seen the number of attendees double in two years.
The annual conference (or special event) will be back in the Conference Centre in Dublin, which is just a few minutes' walk from the city centre.
There will be a number of streams and these will include:
- Technology / DBA
- Business Intelligence
- Development
- Applications & E-Business
So there will be something for everyone and I’m sure there will be some sessions on 12c.
We are hoping to get some very well known names in the Oracle world to present at the conference, and these names alone should attract a large number of attendees.
If you are interested in submitting a presentation, click on the image above or below and you will be taken to the submissions webpage.
I’ve already submitted some presentations, and although I’m on the conference committee there is no guarantee that any will be selected.
I hope to see you there.
Oracle Scene 12c Supplement
The autumn edition of Oracle Scene is now available (I’m the deputy editor) and if you are due to get a printed copy you should be receiving it really soon.
In addition to the main Oracle Scene magazine we had lots of extra 12c articles, so we decided to create a separate 12c Supplement. This is only available online.
There were articles from Jonathan Lewis, Bryn Llewellyn, Melanie Caffrey, Pete Finnigan, Bob Mycroft, Alex Nuijten and myself.
Check out this online edition by clicking on the image below.
If you are using Oracle 12c or any of the related products over the next few months, why not write a short article telling us about your experiences. The next submission deadline is early January for the Spring edition.
Review of OOW13–Part 2
This is part 2 of my review of OOW13, looking at some of the announcements that were of interest to me. There were many, many more announcements. Here is the link to my OOW13 Part 1 post.
In-Memory Database: combining the benefits of row and columnar storage, both will run at the same time in the database, and all of this will be in-memory. When a query is issued the database will decide which version of the data to access. The typical row-level storage will be used for OLTP-type queries and the columnar storage for DW-type queries. All of the processing, updates and maintenance of this will be done seamlessly in the background, including the syncing of data between the row and columnar versions. No new coding or recoding of your applications is needed. All that is involved to enable this feature is a setting in the database. Oracle claim you will see 1000x performance for queries and 2x for transactions. This will be available in 12.1.2 and will be an extra licence cost.
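Based on what was announced, enabling the option was expected to look something like the following sketch. Note that the parameter name, the table name and the exact syntax here are assumptions based on the announcement, not confirmed syntax; check the documentation of the final release.

```sql
-- Sketch only: reserve space for the in-memory column store
-- (parameter name assumed from the OOW13 announcement).
ALTER SYSTEM SET inmemory_size = 2G SCOPE=SPFILE;

-- Nominate a table (SALES here is a made-up example) for the columnar store.
ALTER TABLE sales INMEMORY;

-- OLTP queries continue to use the row store; DW-style aggregations
-- such as this one are candidates to be served from the columnar store.
SELECT prod_id, SUM(amount_sold)
FROM   sales
GROUP  BY prod_id;
```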
Database-as-a-Service will be a full Database-as-a-Service with your own instance. A range of offerings will be available, from basic infrastructure and database, through managed, up to managed for maximum availability.
Everything-as-a-Service, or so it seemed. In addition to the full Database-as-a-Service, they announced that every piece of software they have is available as a service. Basically anything you want is available, with different levels of support from self-managed right up to a fully managed service and everything in between.
Oracle Backup, Logging and Recovery appliance was announced. This appliance is designed and built to receive, capture and store backups (including incrementals) for your database server, whether it is located in a private or public cloud. The appliance will backup your database server in real-time and not just from the last backup.
Microsoft and Oracle. Yes Microsoft presented as part of the keynote presentation on Wednesday. Microsoft presented about their Oracle offerings on the Azure Cloud platform. They are offering the Oracle Database on Microsoft servers and also on Linux servers.
The 2013 Gartner Hype Cycle
The 2013 Gartner Hype Cycle is out and it can be interesting to compare the new graph with the ones from previous years. Particularly for my interests in Data Science, Big Data, Data Mining, Predictive Analytics and of course the Oracle Database.

For comparison, the Hype Cycle graphs from 2012, 2011, 2010 and 2009 are also shown above.
DBMS_PREDICTIVE_ANALYTICS & Profile
In this blog post I will look at the PROFILE procedure that is part of the DBMS_PREDICTIVE_ANALYTICS package. The PROFILE procedure generates rules that identify the records that have the same target value.
Like the EXPLAIN procedure, the PROFILE procedure only works with classification-type problems. The PROFILE procedure works out a set of rules that determine a particular target value: for example, the rules that determine whether a customer will take up an affinity card, and the rules for those who will not. So you will need a pre-labelled data set with the value of the target attribute already determined.
Oracle does not tell us what algorithm they use to calculate these rules, but they are similar to the rules produced by some of the classification algorithms in the database (and that can be used by ODM).
The syntax of the PROFILE procedure is:
DBMS_PREDICTIVE_ANALYTICS.PROFILE (
      data_table_name     IN VARCHAR2,
      target_column_name  IN VARCHAR2,
      result_table_name   IN VARCHAR2,
      data_schema_name    IN VARCHAR2 DEFAULT NULL);
Where
| Parameter Name | Description |
| data_table_name | Name of the table that contains the data that you want to analyze. |
| target_column_name | The name of the target attribute. |
| result_table_name | The name of the table that will contain the results. This table should not already exist in your schema, otherwise an error will occur. |
| data_schema_name | The name of the schema where the table containing the input data is located. This is probably in your current schema, so you can leave this parameter NULL. |
The PROFILE procedure will produce an output table, named by the result_table_name parameter, in your schema, and this table will contain 3 attributes.
| PROFILE_ID | This is the PK/unique identifier for the profile/rule |
| RECORD_COUNT | This is the number of records that are described by the profile/rule |
| DESCRIPTION | This is the profile rule; it is in XML format and conforms to an XSD defined by Oracle |
Using the examples I have used in my previous blog posts, the following illustrates how to use the PROFILE procedure.
BEGIN
   DBMS_PREDICTIVE_ANALYTICS.PROFILE(
      DATA_TABLE_NAME    => 'mining_data_build_V',
      TARGET_COLUMN_NAME => 'affinity_card',
      RESULT_TABLE_NAME  => 'PA_PROFILE');
END;
/
NOTE: For the above example I used an 11.2.0.3 database.
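Once the procedure completes, you can inspect the generated profiles with a simple query against the results table. The following is a sketch that lists each rule with the number of records it covers; the column formatting is just for readable SQL*Plus output.

```sql
column description format a60

-- List the generated profiles, largest group first.
SELECT profile_id,
       record_count,
       description
FROM   pa_profile
ORDER  BY record_count DESC;
```

The DESCRIPTION column contains the XML rule, which you can then parse or read directly to see which attribute values define each group.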
Oracle Magazine-Jan/Feb 2000
The headline articles of Oracle Magazine for January/February 2000 were focused on looking forward to what is to come, now that the Year 2000 bug had passed. These articles covered large-scale, 24×7 data warehouses and marts, more development using Java, and more and better B2B with XML.
This issue of Oracle Magazine introduced a new layout and design.
Other articles included:
- An introduction to XML was given, including some basics, the differences between XSL, XSLT and XCETERA, how XML can be used with Oracle, and XML in the Oracle Internet Platform.
- Building mobile apps with Oracle 8i Lite’s Web-to-go. Oracle 8i Lite includes Oracle Lite DBMS, iConnect and Web-to-go. An example was given of how to create a Web-to-go servlet, with an overview of the development process: create the database tables, compile the Java servlet code, register the application and then test drive the application.
- Creating packages and procedures using invoker rights can increase code reuse, simplify maintenance and give more control over security. Some examples were given to illustrate how this can all be done.
- Oracle WebDB 2.1 allows everyone to build websites without having to rely on a webmaster.
- Experienced Oracle DBAs know that I/O is the single greatest component of response time. When Oracle retrieves a block from a data file on disk, the reading process must wait for the physical I/O operation to complete. Consequently anything you can do to minimize I/O or reduce bottlenecks can greatly improve the performance of an Oracle system.
To view the cover page and the table of contents click on the image at the top of this post or click here.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.
Oracle Magazine-Nov/Dec 1999
The headline articles of Oracle Magazine for November/December 1999 focused on E-Business and how you can use the Oracle product set to put your business online. These articles included features on companies such as AMR, Fogdog, Cognitiative, Drug Emporium, Chick-fil-A, Living, CD Now, Trilux and Lycos Networks.
Other articles included:
- Oracle Developer and Developer Server 6i were released. These new tools also form the underlying technology for the Oracle Applications release 11i.
- We also have the launch of Oracle Designer 6i with new features including: Repository based configuration management, support for files and folders, detailed dependency analysis, enhanced support for Oracle 8i server generation, enhanced generation of Oracle Developer Forms and visual repository extensibility.
- Oracle releases the Oracle Discoverer Y2K Assistance. This workbook identifies possible Y2K errors in the end user layer.
- Oracle Express 6.3 is released.
- Oracle 8i is released for the Apple Macintosh.
- Oracle JDeveloper Modeling Tools were scheduled for release in early 2000 and would provide a single integrated toolset including: Unified Modelling Language (UML) support; Java editing, compiling and debugging; a Java runtime component framework for persistence and transactions; a multi-user repository for managing models as well as files; and deployment to a choice of servers in an n-tier environment.
- The Business and Accounting Software Developers Association and the German Association for Technical Inspection have certified Oracle Financials Release 11 for Euro (€) compliance.
- Donald Burleson has an article on Tuning Disk I/O in Oracle 8. Be sure to tune your SQL before you start to reorganise your disks. The article looks at how you can investigate whether a disk becomes stalled while handling simultaneous I/O requests and proposes a couple of ways you can address these issues.
- Joe Johnson’s article on Using Oracle Database Auditing to Tune Performance looks at how you can tune the components of the SGA, in particular the shared pool and the database buffer cache.
As this was the Oracle Open World edition you can imagine that there was a large number of advertisements in the magazine.
To view the cover page and the table of contents click on the image at the top of this post or click here.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.
Introducing Java EE7–Live Webcast 12th June, 2013
There will be a live Webcast on June 12th (2013) on Introducing Java EE7. There will be some keynotes and some breakout sessions that you can attend, and you will have the opportunity to chat with some Java experts. The highlights of this event include:
- Business Keynote (Hasan Rizvi and Cameron Purdy)
- Technical Keynote (Linda DeMichiel)
- Breakout Sessions on different JSRs by specification leads
- Live Chat
- Lots of Demos
- Community, Partner, and Customer video testimonials
This is a free event, so sign up now to book your place.
I joined a conference call on Thursday that was organised for members of the Oracle ACE program. This was a full one-hour conference call, presented by Arun Gupta, who spent the hour going through some of the new features coming in Java EE7.
Recent Big Data and Analytics related articles
Over the past couple of weeks I’ve come across the following articles, blog posts and discussions about Big Data and Analytics. There seems to be an underlying theme of ‘let’s get back to the core of the problem’, and that big data is only useful in certain cases.
As the Analytics 3.0 article indicates we should be concentrating on how we can use analytics to achieve a real goal for the organisation.
Most data isn’t “big,” and businesses are wasting money pretending it is
– There is a LinkedIn discussion about this article
Most data sets are in the 40-60GB range – I can process that on my laptop, so that cannot be Big Data.
It is also interesting to note that most of the people who have been working in the area for years (10+) are not believers in Big Data, and they don’t even consider calling themselves Data Scientists.
The 10 Most Influential People in Data Analytics, Data Mining, Predictive Analytics
The purpose of this post is to record these links in one place and to share them with everyone else who might be interested.
New website for my blog
A few days ago I moved my blog to a new domain name.
Check it out. Wait – you already are if you are reading this!
The domain name is a merger of Oracle and Analytics, and has a familiar ring to it for those of you who know Oracle.
The old web link still works (for now)
brendantierneydatamining.blogspot.com
I’ll be looking to update the look and feel over the coming months.
Clustering in Oracle Data Miner-Part 5
This is the fifth and final blog post on building and using Clustering in Oracle Data Miner. The following outlines the contents of each post in this series on Clustering.
- In the first part we looked at what clustering features exist in ODM and how to set up the data that we will be using in the examples.
- The second part focused on how to build Clusters in ODM.
- The third post focused on examining the clusters produced by ODM and how to use the Clusters to score new data using ODM.
- The fourth post looked at how you can build and evaluate a Clustering model using the ODM SQL and PL/SQL functions.
- This fifth and final post looks at how you can apply your Clustering model to new data using the ODM SQL and PL/SQL functions.
Step 1 – What Clustering models do we have?
In my previous post I gave the query to retrieve the clustering models that we have in our schema. Here it is again.
column model_name format a20
column mining_function format a20
column algorithm format a20

SELECT model_name,
       mining_function,
       algorithm,
       build_duration,
       model_size
FROM   ALL_MINING_MODELS
WHERE  mining_function = 'CLUSTERING';
This time we see that we have 3 cluster models. Our new model is called CLUSTER_KMEANS_MODEL.
column child format a40
column cluster_id format a25

SELECT cluster_id,
       record_count,
       parent,
       tree_level,
       child
FROM   table(dbms_data_mining.get_model_details_km('CLUS_KM_1_25'));
The following image shows all the clusters produced and we can see that we have the renamed cluster labels we set when we used the ODM tool.
Step 2 – Setting up the new data
There are some simple rules to consider when preparing the data for the cluster model. These really apply to all of the data mining algorithms.
- You will need to have the data prepared and in the same format as you used for building the model.
- This will include the same table structure. Generally this should not be a problem. If you need to merge a number of tables to form a table with the correct format, the simplest method is to create a view.
- All the data processing for the records and each attribute needs to be completed before you run the apply function.
- Depending on the complexity of this, you can either build the processing into the view (mentioned above), or run some PL/SQL procedures and create a new table with the output, etc. The less pre-processing you have to do on the data, the simpler the overall process and implementation will be.
- The table or view must have one attribute for the CASE_ID. The CASE_ID is an attribute that is unique for each record. If the primary key of the table is just one attribute you can use this. If not, then you will need to generate a new attribute that is unique. One way to do this is to concatenate the attributes that form the primary key.
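As a sketch of that last point, suppose a table had a two-column primary key; you could generate the CASE_ID in a view. The table and column names here are made up purely for illustration.

```sql
-- Hypothetical example: SALES_DATA has a composite primary key
-- (CUST_ID, PROD_ID). Concatenating the key columns gives a
-- single unique CASE_ID attribute for the apply step.
CREATE OR REPLACE VIEW sales_data_apply_v AS
SELECT s.cust_id || '_' || s.prod_id AS case_id,
       s.*
FROM   sales_data s;
```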
Step 3 – Applying the Cluster model to new data – In Batch mode
There are two ways of using an Oracle Data Mining model in the database. In this section we will look at how you can run the cluster model to score data in batch mode. What I mean by batch mode is that you have a table of data available and you want to score each record with the cluster the model thinks it belongs to.
To do this we need to run the APPLY function that is part of the DBMS_DATA_MINING package.
As clustering is unsupervised there is no target attribute to supply; we just need the CASE_ID that identifies each record.
One of the parameters is called RESULT_TABLE_NAME. The DBMS_DATA_MINING.APPLY function creates a new table with this name to contain the outputs of the cluster scoring. This table (for the k-Means and O-Cluster algorithms) will contain three attributes.
CASE_ID VARCHAR2/NUMBER
CLUSTER_ID NUMBER
PROBABILITY NUMBER
The table will have the CASE_ID; this is effectively the primary key of the table that was scored.
If we take our INSURANCE_CUST_LTV table as the table containing the new data we want to score (yes, this is the same table we used to build the cluster model) and CLUSTER_KMEANS_MODEL as the cluster model we want to use, then the following code shows the APPLY call needed to score the data.
BEGIN
   DBMS_DATA_MINING.APPLY(
      model_name          => 'CLUSTER_KMEANS_MODEL',
      data_table_name     => 'INSURANCE_CUST_LTV',
      case_id_column_name => 'CUSTOMER_ID',
      result_table_name   => 'CLUSTER_APPLY_RESULT');
END;
/
On my laptop this took 3 seconds to complete. This involved scoring 15,342 records, creating the table CLUSTER_APPLY_RESULT and inserting 153,420 scored records into it.
Why did we get 10 times more records in our results table than we had in our source table?
In batch mode, i.e. using the DBMS_DATA_MINING.APPLY function, a record is created for each of the possible clusters that a record could belong to, along with the probability of it belonging to that cluster. In our case we built our clustering models based on 10 clusters.
In the following diagram we have a listing for two of the customers in our dataset, the clusters that have been assigned to them and the probability of that record/customer belonging to that cluster. We can then use this information to make various segmentation decisions based on the probabilities that each has for the various clusters.
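A listing like the one in the diagram can be produced with a simple query against the results table, ordering the candidate clusters for each customer by probability:

```sql
-- Show all candidate clusters for two customers, most likely first.
SELECT case_id,
       cluster_id,
       probability
FROM   cluster_apply_result
WHERE  case_id IN ('CU3141', 'CU3142')
ORDER  BY case_id, probability DESC;
```

Each customer appears 10 times in the output, once per cluster, which is where the 153,420 rows come from.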
Step 4 – Applying the Cluster model to new data – In Real-time mode
When we looked at applying a classification algorithm to new data we were able to use the PREDICTION SQL function. As clustering is an unsupervised data mining technique we will not be able to use the PREDICTION function.
Instead we have the functions CLUSTER_ID and CLUSTER_PROBABILITY.
CLUSTER_ID will tell us what cluster the record is most likely to belong to, i.e. the cluster with the highest probability.
This is different to the bulk processing approach as we will only get one record/result being returned.
In the following example we are asking what cluster these two customers most likely belong to.
SELECT customer_id,
       cluster_id(cluster_kmeans_model USING *) as Cluster_Num
FROM   insurance_cust_ltv
WHERE  customer_id in ('CU3141', 'CU3142');
If we look back to Step 3 above, we will see that the clusters listed correspond to what we discovered there.
The next function is CLUSTER_PROBABILITY. With this function we can see the probability of a customer belonging to a particular cluster. Using the results for customer CU3141, we can see what the probability is for this cluster, along with a few other clusters.
SELECT customer_id,
       cluster_probability(cluster_kmeans_model, '3' USING *) as Cluster_3_Prob,
       cluster_probability(cluster_kmeans_model, '4' USING *) as Cluster_4_Prob,
       cluster_probability(cluster_kmeans_model, '7' USING *) as Cluster_7_Prob,
       cluster_probability(cluster_kmeans_model, '9' USING *) as Cluster_9_Prob
FROM   insurance_cust_ltv
WHERE  customer_id = 'CU3141';
We can also combine the CLUSTER_ID and CLUSTER_PROBABILITY functions in one SELECT statement.
In the following query we want to know what the most likely cluster is for two customers and the cluster probability.
SELECT customer_id,
       cluster_id(cluster_kmeans_model USING *) as Cluster_Num,
       cluster_probability(cluster_kmeans_model, cluster_id(cluster_kmeans_model USING *) USING *) as Cluster_Prob
FROM   insurance_cust_ltv
WHERE  customer_id in ('CU3141', 'CU3142');
Check back soon for more blog posts on performing data mining in Oracle, using the Oracle Data Miner tool (part of SQL Developer) and the in-database SQL and PL/SQL code.
I hope you have enjoyed these blog posts on Oracle Data Miner and have found them useful. Let me know if there are specific topics you would like me to cover.
Thanks
Brendan Tierney
Anti-Social Wednesdays
It is about time that I got round to starting my New Year’s Resolution, which I’m calling Anti-Social Wednesdays. What does this mean? Each Wednesday I’m going to cut myself off from the “always on” culture of the IT world. This means I will not be turning on my email, Twitter, Facebook, LinkedIn, etc. In addition, I will be unplugging my desk phone and turning off my mobile. These will only get turned back on or plugged back in on the Thursday morning.
Yes this will mean that anything “urgent” that comes up on a Wednesday will have to wait until I get to it on the Thursday (or Friday).
What am I going to do on my Anti-Social Wednesday? I will be concentrating on getting some work done (without all the interruptions): doing some DBA work, working on various projects, writing blog posts or presentations, trying new products, testing new releases of software, etc. The Oracle 12c Database is coming out in a couple of weeks, along with a new release of SQL Developer 4 and a new release of Oracle Data Miner, so I will be concentrating on these during March, April and May.
I will also be trying to do all of this work in new/different locations. So you will not find me at my desk (hopefully).
Now you might see some blog posts or tweets appearing from time to time on a Wednesday. All this means is that I had these scheduled to go on that day.
If you really, really, really, really need to contact me then you can work out how to do it or you will know how to do it!


