Oracle Data Miner
Earlier today (4th May) SQL Developer 4.1 was released 🙂
For those of you who use the Oracle Data Miner tool (that is part of SQL Developer) you will need to upgrade your repository. The following steps will walk you through the process.
1. Download SQL Developer (you do need to have Java 8 installed) This download does not come with the JRE built into it. This usually comes a few days after the release.
2. Unzip the downloaded file and copy the extracted directory to where you like to keep your applications etc.
3. Start up SQL Developer by running the sqldeveloper.exe file. This will located in the extracted folder \sqldeveloper-126.96.36.199.07-no-jre\sqldeveloper
4. If you have been a previous install of SQL Developer you will be asked if you want to migrate your current settings. Click on the Yes button and all your connections and settings will be migrated.
5. To upgrade your Oracle Data Miner (ODMr) repository, you will need to open one of your ODMr connections. When you do this ODMr will check to see if the repository in your database needs to be updated. If it does you will get the following window.
6. Enter the password for SYS
6. When you get the following window you can click on the Start button to begin the Oracle Data Miner repository upgrade.
7. After a couple of minutes (and depending on the number of ODM Workflows and ODM schemas to have) you will get the following window.
Congratulations. You have now upgraded your Oracle Data Miner repository.
If you do encounter any errors during the upgrade of the repository then you should get onto the OTN Forum for Oracle Data Miner and report the errors. The Oracle Data Miner team monitor this forum and will get back to you quickly with a response.
In this blog post I want to show you how you can go about evaluating your classification models that you develop using Oracle Data Miner (part of SQL Developer).
What I’m not going to show you here is how to develop classification models using Oracle Data Mining 😦 I’ve had several blog posts over the years on this topics. So you can go and search of those posts or alternately this topic is cover in a lot more detail in my Oracle Data Miner book 🙂
After you have developed your ODM models in Oracle Data Miner you have 2 levels of details available to you. The first of these is the Compare Test Results. You can find this by right clicking on the Classification node of your ODM Workflow, as showing below.
When you select the Compare Test Results a new (worksheet) tab will open. This will display summary statistics and graphics for the summary statistics for each Oracle Data Ming model created. In the following image an ODM model was created for each In-Database Classification algorithm in the Oracle Database.
Here we get to see 2 of the statistical measures that I talked about in my previous blog post, the (average) Accuracy and the Overall Accuracy. We can look at and examine this in a bit more detail in a minute. A new measure that I haven’t mentioned before is the Predictive Confidence.
The Predictive Confidence measure provides an estimate of the overall goodness of the model. Predictive Confidence is a number between 0 and 1. Data Miner displays Predictive Confidence as a percent.
- If Predictive Confidence=0, then it indicates that the predictions of the model are no better than the predictions made by using the naive model.
- If Predictive Confidence=1, then it indicates that the predictions are perfect.
- If Predictive Confidence=0.5, then it indicates that the model has cut the error of a naive model by 50%./li>
So the higher the value for Predictive Confidence the better the model. Particularly when it is higher than 50%.
After evaluation these summary statistical measures you will want to drill down on these to see the lower level statistical measures, for example you will want to see the confusion matrix and the corresponding statistical measures. To view the confusion matrix all you need to do is to click on the Performance Matrix tab. Before you can really start evaluating the models you will need to click on the Display drop down and select ‘Show Detail’ from the drop down list. Another thing you will need to do is to click/check the ‘Show totals and codes’ check box on the lower part of the screen. This will give you some of the statistical measures that I outlined in my previous blog post.
When you examine the statistical measures displayed on the screen you will notice that some of the statistical measures I outlined in my previous blog post are missing. Some of these missing measures are ones that you will want to consider and use as part of your evaluation of you ODM models.
So what how do you find out what these missing statistical measures are? Well ODM does not display these so the only real option open to you is to go and calculate them yourself 😦 This is not ideal but these are relatively easy to calculate and you can do this on a piece of paper or you can open your spreadsheet software and let it calculate them for you (once you have defined to formula for each). Here is an example of the completed/extended confusion matrix based on the results from the CLAS_SVM_1_59 model shown in the above image.
In my next blog post I will look at how you can evaluate a classification model that was developed using the in-database Oracle Data Mining algorithms (Oracle Data Miner GUI was not used). The evaluation criteria that I will show will be based on the statistical methods that I highlighted in my first blog post on this topic.
If you are downloading the EA1 of SQL Developer that includes Oracle Data Miner (ODMr), and you intend to use Oracle Data Miner then you will need to update the ODMr Repository.
You could do it the hard way and run the upgrade repository sql scripts that are located in the …\sqldeveloper-188.8.131.52.29-no-jre\sqldeveloper\dataminer\scripts directory.
Or you could do it the easy way and let the inbuilt functionality in Oracle Data Miner do it for you.
To do it the easy way all you need to do is to open the ODMr Connections window and the double click on one of your ODM connections.
ODMr will check the version of the repository you have installed and if needed it will prompt you about upgrading the repository. Select Yes and you will be prompted to enter the SYS password. So talk kindly with your DBA for them to enter the password for you. Then click on the Start button. They will lick off the OMDr Repository Upgrade scripts.
NB: Make sure you have a backup of your workflows before you do this. A little think happened to me during the SQL Dev / ODMr 4.0 upgrade back in September 2013 where all my workflows disappeared. You can imagine how happy I was about that. Since then the ODMr team have added some functionality to ensure something like this doesn’t happen again. But you never know.
To backup your ODMr workflows use the Export Workflow option.
When the repository upgrade has finished you will get a ‘Task Complete Successfully’ message in the upgrade window. Click on the close button and away you go with this updated version.
A few days ago the first Early Adaptor release of SQL Developer 4.1 (EA1) was made available. You can go ahead and download it from here and make sure to check out the blog post by Jeff Smith on some install and setup that is required around the latest version of Java.
I’ve been using SQL Developer since its very first release, so getting my hands on a new release is very exciting. There are lots and lots of new features in the tool. Again check out the blog posts by Jeff Smith and Kris Rice on some of these new features. I really like the new DBA screens 🙂 But this screen really needs some scroll bars and not everything fits on my screen. So Jeff and Kris if you are reading this, can you add some scroll bars.
In addition they have been working on “new” SQL*Plus that is called SDSQL. This is a new command line tool that is supposed to be bigger and better than SQL*Plus but still gives us a command line tool to run our scripts and demos. To download and install the tool go to here.
As you know I’m a bit of an Oracle Data Miner/Mining fan. There are now new in-database features, but there are a lot of new features in the GUI tool (aka ODMr) along with some improvements and bug fixes. Here is a list of the ODMr 4.1 EA1 new and updated features (taken from the ODMr Help in SQL Dev)
JSON Data Support for Oracle Database 184.108.40.206 and above
In response to the growing popularity of JSON data and its use in Big Data configurations, Data Miner now provides an easy to use JSON Query node. The JSON Query node allows you to select and aggregate JSON data without entering any SQL commands. The JSON Query node opens up using all of the existing Data Miner features with JSON data. The enhancements include:
Data Source Node
Automatically identifies columns containing JSON data by identifying those with the IS_JSON constraint.
Generates JSON schema for any selected column that contain JSON data.
Imports a JSON schema for a given column.
JSON schema viewer.
Create Table Node
Ability to select a column to be typed as JSON.
Generates JSON schema in the same manner as the Data Source node.
JSON Data Type
Columns can be specifically typed as JSON data.
JSON Query Node
Ability to utilize any of the selection and aggregation features without having to enter SQL commands.
Ability to select data from a graphical layout of the JSON schema, making data selection as easy as it is with scalar relational data columns.
Ability to partially select JSON data as standard relational scalar data while leaving other parts of the same JSON document as JSON data.
Ability to aggregate JSON data in combination with relational data. Includes the Sub-Group By option, used to generate nested data that can be passed into mining model build nodes.
Improved database session management resulting in less database sessions being generated and a more responsive user interface.
Filter Columns Node
Combined primary Editor and associated advanced panel to improve usability.
Explore Data Node
Allows multiple row selection to provide group chart display.
Classification Build Node
Automatically filters out rows where the Target column contains NULLs or all Spaces. Also, issues a warning to user but continues with Model build.
Enhanced workflows to ensure that Loading, Reloading, Stopping, Saving operations no longer block the UI.
Revised the Online Help to adhere to topic-based framework.
Selected Bug Fixes (does not include 4.0 patch release fixes)
GLM Model Algorithm Settings: Added GLM feature identification sampling option (Oracle Database 12.1 and above).
Filter Rows Node: Custom Expression Editor not showing all possible available columns.
WebEx Display Issues: Fixed problems affecting the display of the Data Miner UI through WebEx conferencing.
Denny Wong of the ODM team in Oracle has made available a tutorial on importing JSON data for use with ODMr. Check it out here.
I’ve been told there will be a couple of tutorials on the new features coming out (from the ODMr team) over the next few weeks. So keep an eye out of these.
Check out my blog post on what you need to do to get started/using ODMr 4.1 EA1.
When Oracle Data Miner (ODMr) 4.0 (which is part of SQL Developer) came out back in late 2013 there was a number of new features added to the tool. One of these was a Graph node that allows us to create various graphs and charts that include Line, Scatter, Bar, Histogram and Box plot.
I’ve been using this node recently to produce graphs and particularly scatter plots. I’ve been using the scatter plots to graph the Actual values in a data set against the Predicted values that was generated by ODMr. In this scenario I had a separate data set for training my ODM data mining models and another testing data set for, well testing how well the model performed against an unseen data set.
In general the graphs produced by the Graph node look good and gives you the information that you need. But what I found was that as you increased the size of the data set, the scatter plot can look a messy. This was in part due to the size of the square used to represent a data point. As the volume of data increased then your scatter plot could just look like a coloured in area of blue squares. This is illustrated in the following image.
What I discovered today is that you can zoom in on this graph to explore different regions and data point on it. This do this you need to select an data that is within the x-axis and y-axis area. When you do this you will see a box form on your graph that selects the area that you indicate by moving your mouse. After you have finished selecting the area, the Graph Node will zooms into this part of the graph and shows the data points. For example if I select the area from about 1000 on the x-axis and 1000 on the y-axis, I will get the following.
Again if I select a similar are area of 350 on the x-axis and 400 on the y-axis I get the following zoomed area.
You can keep zooming in on various areas.
At some point you will have finished zooming in and you will want to return to the original graph. To zoom back outward all you need to do in the graph is to click on it. When you do this you will go back to the previous step or image of the graph. You can keep doing this until you get back to the original graph. Alternatively you can zoom in and out on various parts of the graph.
Hopefully you will find this feature useful.
Some of you may be aware that I have been writing a on Oracle Data Miner. Actually the book covers the Oracle Data Miner GUI that is part of SQL Developer, the SQL and PL/SQL functions, procedures and packages that form the Oracle Data Mining option in the database and lots of other topics for the DBA, Developer and BI/DW people.
Today is a bit day for this book as it is officially released and available for purchase. See below for some links to where you can but the book in print and e-book formats. It has been published by McGraw-Hill/Oracle Press.
The book is aimed at a variety of people and the aim of the book is to introduce them to using the Oracle Data Miner tool and how to perform various data mining and predictive analytics tasks using SQL and PL/SQL.
The book will not teach you about how each of the data mining algorithms works. There is a bit of an assumption that you know a bit about these already. There are lots of books and resources about that cover that material. You can look on my book as an getting start / how to use type of book.
Below are are the images of the front cover and the back cover.
For more details of the book and for some updates keep an eye on my ODM Book page. On this page I’m adding a FAQ secion. This will be based on questions that I receive about the book.
If you buy the book then I hope you will find it helpful. If you are going to attend one of my presentations at an Oracle User Group meeting then bring the book along and I can sign it for you. Alternatively if you are at Oracle Open World 2014, come along to the Oracle Press Book Store, as I will be there to sign books on Wednesdays 1st October between 13:00 and 13:30.
Where can you Buy my Oracle Data Miner book (print and e-book).
You can buy the book from the McGraw-Hill/Oracle Press website and from Amazon. Each site will offer discounts so check out which one is the best for you.
For USA locations (enter promo code Tierney to save 20% and free delivery) www.mhprofessional.com
For UK & Ireland locations (enter promo code Tierney to save 20% and free delivery) www.mcgraw-hill.co.uk/tpr
Click here to buy it on www.amazom.com
Click here to but it on www.amazon.co.uk
An important announcement was made on the Oracle Data Mining discussion forum last night and I haven’t seen anything on twitter about it yet (but maybe I missed it). It was about some ODM Repository migration issues that you might encounter with using ODM in SQL Developer 4.0.1 and using the Oracle Database 220.127.116.11.
Make sure you have a full backup of your ODM schema and the repository before you perform your ODM repository upgrade.
As most people are still on Oracle 11g then this is a potential problem that most of you maybe facing.
I had a a repository migration issues last September during Oracle Open World. EA2 was release and in my eagerness to upgrade (and because I was writing my book on it) I had an issue where my repository go dropped and a new repository created. But nothing was migrated over to the new repository.
Guess what? I lost all my work. I was at OOW and my back ups were back home in Ireland. So you can imagine how I felt.