Month: April 2015
There have been various ongoing discussions about the ever-growing “always on” or “always available” aspect of modern life.
Most of us do something like the following each day:
- check email, Twitter, Facebook, etc. before getting out of bed
- check and respond to emails and various social media before you leave the house
- spend a long day in the office or moving between multiple clients
- spend your lunchtime on email and social media
- go home after a long day and open the laptop/tablet etc. to answer more emails
- do a final check of emails and social media before you go to sleep
Can you remember the last time you finished work for the day, went home, and were actually finished with work for the day?
A couple of years ago I did a bit of an experiment that I called ‘Anti Social Wednesdays’.
That experiment was very successful, although after a few months I did end up going back online. The amount of work I got done was huge, and during that time I started to write my first book.
I’m now going to start out on a new experiment. I’m going to turn off or delete all work-related email accounts on my phone, tablet, etc. I’ll leave them on my work laptops (obviously).
What this means is that when I want to look up something on the internet or use an app on my phone, I’m no longer going to be tempted into checking emails and doing one of those quick replies.
So now I won’t be getting any of those “annoying” emails just before I go to sleep, or when I am on a day out with the family, etc.
If something is so important that you need a response quickly, then you can phone me. Otherwise you are going to have to wait until I open my laptop and find your email.
Is anyone else interested in joining in on this little experiment? Let me know.
When you are developing and working with Decision Trees, by far the easiest way to visualise them is by using the Oracle Data Miner (ODMr) tool that is part of SQL Developer.
Developing your Decision Tree models in ODMr allows you to explore the tree produced, drill into each of its nodes, and see all the statistics that relate to each node and branch of the tree.
But when you are working with the DBMS_DATA_MINING PL/SQL package and the SQL functions for Oracle Data Mining, you don’t have the luxury of the graphical tool that we have in ODMr. For example, here is an image of part of a Decision Tree that I developed using ODMr.
What if we are not using the ODMr tool? In that case you will be using SQL and PL/SQL, and you do not have the luxury of viewing the Decision Tree graphically.
So what can you see of the Decision Tree? The model itself can be used by a variety of SQL functions, such as PREDICTION, that apply it to your data. I’ve covered many of these over the years on this blog.
For most of the data mining algorithms there is a PL/SQL function in the DBMS_DATA_MINING package that allows you to look inside the model to find its settings, rules, etc. Most of these functions have names like GET_MODEL_DETAILS_XXXX, where XXXX is an abbreviation of the algorithm name. For example, GET_MODEL_DETAILS_NB returns the details of a Naive Bayes model. But when you look through the list, there doesn’t seem to be one for Decision Trees.
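As an illustration of this naming pattern, the Naive Bayes details function returns a collection, so it can be queried with the TABLE() operator like an ordinary table. This is only a sketch: it assumes a Naive Bayes model named CLAS_NB_1_59 already exists in your schema.

```sql
-- Sketch: view the details of a Naive Bayes model (one row per target value,
-- with its prior probability and the nested conditional probabilities).
-- Assumes a Naive Bayes model named CLAS_NB_1_59 exists in the current schema.
SELECT *
FROM   TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NB('CLAS_NB_1_59'));
```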
Actually there is, and it is called GET_MODEL_DETAILS_XML. This function takes one parameter, the name of the Decision Tree model, and returns XML output containing the attributes used by the model, the overall model settings, and then, for each node and branch, the attributes and values used along with the other statistical measures for that node or branch.
The following SQL uses this PL/SQL function to get the Decision Tree details for the model called CLAS_DT_1_59.
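A minimal version of such a query, assuming the model CLAS_DT_1_59 exists in your schema, could look like this. GET_MODEL_DETAILS_XML returns an XMLType value, so it can be selected from DUAL; the column alias DT_DETAILS is just an illustrative name.

```sql
-- Sketch: return the Decision Tree model details as a single XMLType value.
-- Assumes a Decision Tree model named CLAS_DT_1_59 exists in the current schema.
SELECT DBMS_DATA_MINING.GET_MODEL_DETAILS_XML('CLAS_DT_1_59') AS dt_details
FROM   dual;
```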
If you are using SQL Developer, you will need to double-click on the output column and click on the pencil icon to view the full listing.
It’s nothing as fancy as what we get in ODMr, but it is something that we can work with.
If you examine the XML output you will see references to PMML. PMML is the Predictive Model Markup Language, a standard defined by the Data Mining Group (www.dmg.org). I will discuss PMML, and how you can use it with Oracle Data Mining, in another blog post.
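Because the output is ordinary XML, you can also query it with Oracle’s XML functions rather than reading it by eye. The sketch below counts the tree nodes with XMLTABLE. It assumes the model CLAS_DT_1_59 exists and that its nodes appear as PMML Node elements (as in the PMML TreeModel format); local-name() is used so the PMML namespace does not have to be declared.

```sql
-- Sketch: count the nodes in the Decision Tree by querying its PMML document.
-- Assumes model CLAS_DT_1_59 exists and its nodes are PMML <Node> elements.
SELECT COUNT(*) AS node_count
FROM   XMLTABLE('//*[local-name()="Node"]'
                PASSING DBMS_DATA_MINING.GET_MODEL_DETAILS_XML('CLAS_DT_1_59'));
```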
In my previous blog post I showed you how you can have a look at the transformations that the Automatic Data Preparation (ADP) feature of Oracle Data Mining produces. I also gave some examples of the different types of ADP transformations that are performed for different algorithms.
One feature of the generated transformations is the REVERSE_EXPRESSION. This takes the scored results and applies the inverse of the transformation that was performed when the data was prepared for input to the algorithm.
Sometimes you may want the scored data returned or labeled in a slightly different way.
In this blog post I will show you how to define an alternative REVERSE_EXPRESSION for an attribute.
The subprogram we need for this is the ALTER_REVERSE_EXPRESSION procedure, which is part of the DBMS_DATA_MINING package.
When we score data for a typical classification problem, we typically use 0 (zero) and 1 as the target variable values. But what if we want the output from our classification model to label the scored data slightly differently?
We can use the ALTER_REVERSE_EXPRESSION procedure to define the new values. For example, if we want 0 to be labeled as NO and 1 as YES, we can use the following:
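BEGIN
  -- Map the scored target value 1 to YES and everything else to NO
  dbms_data_mining.alter_reverse_expression(
    model_name     => 'CLAS_NB_1_59',
    expression     => 'decode(affinity_card, ''1'', ''YES'', ''NO'')',
    attribute_name => 'AFFINITY_CARD');
END;
/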
When we now view the transformations for our data mining model, we can see the new reverse expression for AFFINITY_CARD.
Now when we score our data, the predicted target variable will have our newly defined values:
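One way to check this, sketched below, is to query GET_MODEL_TRANSFORMATIONS for the target attribute only. It assumes the ALTER_REVERSE_EXPRESSION call above has already been run against the model CLAS_NB_1_59.

```sql
-- Sketch: show the transformation row for the target attribute only.
-- Assumes ALTER_REVERSE_EXPRESSION has been applied to model CLAS_NB_1_59.
SELECT attribute_name,
       expression,
       reverse_expression
FROM   TABLE(DBMS_DATA_MINING.GET_MODEL_TRANSFORMATIONS('CLAS_NB_1_59'))
WHERE  attribute_name = 'AFFINITY_CARD';
```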
SELECT PREDICTION(CLAS_NB_1_59 USING *) PRED
FROM   mining_data_apply_v   -- assumed scoring table/view; use your own data source
FETCH FIRST 5 ROWS ONLY;
This is a very powerful feature that allows us to present the scored data values in a different way, making them more useful. This is particularly the case as we work towards a more automatic type of Predictive Analytics.
A very powerful feature of Oracle Data Mining, and one that I think does not get enough notice, is Automatic Data Preparation.
Data Preparation is one of the most time-consuming, repetitive and boring parts of the work that a Data Miner or Data Scientist performs as part of their daily tasks. Apart from gathering the data, integrating it, and getting it into the required format, the most interesting part of the work is feature engineering.
Then there are all the other boring data preparation tasks: handling missing data, type conversion, binning, normalization, outlier treatment, etc.
With Automatic Data Preparation (ADP) in Oracle Data Mining, you can let Oracle work all of these things out for you, generate all the necessary transformation code, and store that code as part of the in-database data mining model.
This is fantastic. ADP can save you hours, and in some cases days, of effort.
But (there is always a but 🙂) what if you are unsure whether the transformations being performed are exactly what you want? Maybe you would like to see what Oracle is doing and, depending on what you find, do it a different way.
The first step is to examine the transformations that are generated and stored as part of the in-database data mining model. The DBMS_DATA_MINING package has a function called GET_MODEL_TRANSFORMATIONS. When you query this function, passing in the name of the data mining model, it returns the list of transformations that have been applied to each attribute in that model.
In the following example a GLM model was created using the Oracle Data Miner tool (that is part of SQL Developer). When you use Oracle Data Miner, ADP is automatically turned on.
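When you build models yourself with the DBMS_DATA_MINING package rather than through Oracle Data Miner, ADP is controlled by the PREP_AUTO setting in the model’s settings table. The sketch below turns it on explicitly for a Naive Bayes model. The settings table DEMO_NB_SETTINGS, the model name DEMO_NB_ADP and the build view MINING_DATA_BUILD_V (with CUST_ID and AFFINITY_CARD columns) are hypothetical names used only for illustration.

```sql
-- Hypothetical settings table: one row per model setting.
CREATE TABLE demo_nb_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

BEGIN
  -- Choose the Naive Bayes algorithm.
  INSERT INTO demo_nb_settings (setting_name, setting_value)
  VALUES (dbms_data_mining.algo_name, dbms_data_mining.algo_naive_bayes);
  -- Turn Automatic Data Preparation on explicitly.
  INSERT INTO demo_nb_settings (setting_name, setting_value)
  VALUES (dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);
  COMMIT;

  -- Build the model; DEMO_NB_ADP and MINING_DATA_BUILD_V are hypothetical names.
  dbms_data_mining.create_model(
    model_name          => 'DEMO_NB_ADP',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'MINING_DATA_BUILD_V',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'AFFINITY_CARD',
    settings_table_name => 'DEMO_NB_SETTINGS');
END;
/
```

Once built, the ADP transformations for such a model can be inspected with GET_MODEL_TRANSFORMATIONS in the same way as for a model built in ODMr.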
The following query calls the GET_MODEL_TRANSFORMATIONS function with the data mining model called CLAS_GLM_1_59.
SELECT * FROM TABLE(DBMS_DATA_MINING.GET_MODEL_TRANSFORMATIONS('CLAS_GLM_1_59'));
The following image contains the output generated by this query.
When you look at the data under the EXPRESSION column, you get to see what ADP did to the data. In most cases only some simple data clean-up and formatting is performed to get the data ready for input into the algorithm.
If we now look at the Naive Bayes model for the same data set, we get a very different set of transformations listed under the EXPRESSION column.
SELECT * FROM TABLE(DBMS_DATA_MINING.GET_MODEL_TRANSFORMATIONS('CLAS_NB_1_59'));
Now we get to see some of the data binning that ADP performs, which is required for input to the Naive Bayes algorithm. You will also notice that there are some transformations in the REVERSE_EXPRESSION column. These are the inverse of the transformations generated in the EXPRESSION column.
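To focus on just those attributes, you can filter the same function’s output. This sketch assumes the Naive Bayes model CLAS_NB_1_59 exists in your schema.

```sql
-- Sketch: list only the attributes whose ADP transformation has a reverse expression.
-- Assumes the Naive Bayes model CLAS_NB_1_59 exists in the current schema.
SELECT attribute_name,
       reverse_expression
FROM   TABLE(DBMS_DATA_MINING.GET_MODEL_TRANSFORMATIONS('CLAS_NB_1_59'))
WHERE  reverse_expression IS NOT NULL;
```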
I will let you explore the data transformations that are produced by ADP for the SVM and Decision Tree algorithms.
I will show you how to change the reverse expression in my next blog post, as there are times when you might want the data to be presented slightly differently after the model has been run to score your data.
For more details of the Automatic Data Preparation performed for each data mining algorithm, you can check out this link in the 11g documentation. This section seems to be missing from the online 12c documentation.
The headline articles of Oracle Magazine for Fall 1987 (the second issue) were on enhancing cellular communications for Canadian Cellular, using Oracle to combat predatory starfish on Australia’s Great Barrier Reef, and breaking the 640K barrier using powerful microcomputers.
This was a bumper issue in comparison to the very first edition.
- Oracle (International) User Week was held during the week of 27th September. This coincided with Oracle’s 10th anniversary and had over 1,000 attendees. That is quite a difference from the numbers Oracle Open World now gets!
- Oracle posts record revenues and earnings. Fiscal year 1986 had revenue of $55.4 million and fiscal year 1987 had revenue of $131.3 million, with fourth quarter revenue of $50 million.
- AHOLD, a leading Dutch supermarket chain, is using the Oracle database to automate 500 of its Albert Heijn (NL) supermarkets.
- Finnish Defense Ministry deploys the Oracle database system.
- SQL*Menu is released. Its main aim was to allow application designers to unify applications built with SQL*Forms, SQL*Report, SQL*Plus and other tools.
- The Loews Anatole Hotel in Dallas had recently checked in over 3,100 guests and checked out 2,900 guests in one day. All of this processing was done using an Oracle database.
- Over 120 competitions in 16 days, more than 2,600 athletes from 51 countries, an estimated 1.6 million spectators, plus volunteers, scorers and worldwide media: organising and managing the 1988 Winter Olympics in Calgary was all done using an Oracle database.
- There was an article by Richard Finkelstein on writing SQL, covering some of the set operators and how to update data using a nested query.
To view the cover page and the table of contents, click on the image at the top of this post or click here.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.