Oracle Advanced Analytics
Download R : http://www.r-project.org/
R installation instructions : http://star-www.st-andrews.ac.uk/cran/
This blog post will look at how you can use the Regression feature in Oracle Data Miner (ODM) to predict the lean/tilt of the Leaning Tower of Pisa in the future.
This is a well know regression exercise, and it typically comes with a set of know values and the year for these values. There are lots of websites that contain the details of the problem. A summary of it is:
The following table gives measurements for the years 1975-1985 of the “lean” of the Leaning Tower of Pisa. The variable “lean” represents the difference between where a point on the tower would be if the tower were straight and where it actually is. The data is coded as tenths of a millimetre in excess of 2.9 meters, so that the 1975 lean, which was 2.9642.
Given the lean for the years 1975 to 1985, can you calculate the lean for a future date like 200, 2009, 2012.
Step 1 – Create the table
Connect to a schema that you have setup for use with Oracle Data Miner. Create a table (PISA) with 2 attributes, YEAR_MEASURED and TILT. Both of these attributes need to have the datatype of NUMBER, as ODM will ignore any of the attributes if they are a VARCHAR or you might get an error.
CREATE TABLE PISA
Step 2 – Insert the data
There are 2 sets of data that need to be inserted into this table. The first is the data from 1975 to 1985 with the known values of the lean/tilt of the tower. The second set of data is the future years where we do not know the lean/tilt and we want ODM to calculate the value based on the Regression model we want to create.
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1975,2.9642);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1976,2.9644);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1977,2.9656);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1978,2.9667);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1979,2.9673);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1980,2.9688);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1981,2.9696);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1982,2.9698);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1983,2.9713);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1984,2.9717);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1985,2.9725);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1986,2.9742);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1987,2.9757);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1988,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1989,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1990,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1995,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (2000,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (2005,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (2010,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (2009,null);
Step 3 – Start ODM and Prepare the data
Open SQL Developer and open the ODM Connections tab. Connect to the schema that you have created the PISA table in. Create a new Project or use an existing one and create a new Workflow for your PISA ODM work.
Create a Data Source node in the workspace and assign the PISA table to it. You can select all the attributes..
The table contains the data that we need to build our regression model (our training data set) and the data that we will use for predicting the future lean/tilt (our apply data set).
We need to apply a filter to the PISA data source to only look at the training data set. Select the Filter Rows node and drag it to the workspace. Connect the PISA data source to the Filter Rows note. Double click on the Filter Row node and select the Expression Builder icon. Create the where clause to select only the rows where we know the lean/tilt.
Step 4 – Create the Regression model
Select the Regression Node from the Models component palette and drop it onto your workspace. Connect the Filter Rows node to the Regression Build Node.
Double click on the Regression Build node and set the Target to the TILT variable. You can leave the Case ID at . You can also select if you want to build a GLM or SVM regression model or both of them. Set the AUTO check box to unchecked. By doing this Oracle will not try to do any data processing or attribute elimination.
You are now ready to create your regression models.
To do this right click the Regression Build node and select Run. When everything is finished you will get a little green tick on the top right hand corner of each node.
Step 5 – Predict the Lean/Tilt for future years
The PISA table that we used above, also contains our apply data set
We need to create a new Filter Rows node on our workspace. This will be used to only look at the rows in PISA where TILT is null. Connect the PISA data source node to the new filter node and edit the expression builder.
Next we need to create the Apply Node. This allows us to run the Regression model(s) against our Apply data set. Connect the second Filter Rows node to the Apply Node and the Regression Build node to the Apply Node.
Double click on the Apply Node. Under the Apply Columns we can see that we will have 4 attributes created in the output. 3 of these attributes will be for the GLM model and 1 will be for the SVM model.
Click on the Data Columns tab and edit the data columns so that we get the YEAR_MEASURED attribute to appear in the final output.
Now run the Apply node by right clicking on it and selecting Run.
Step 6 – Viewing the results
Where we get the little green tick on the Apply node we know that everything has run and completed successfully.
To view the predictions right click on the Apply Node and select View Data from the menu.
We can see the the GLM mode gives the results we would expect but the SVM does not.
Last week I managed to get my hands on the training material for the 2 Day Oracle Data Miner course. This course is run by Oracle University.
Many thanks to Michael O’Callaghan who is a BI Sales person here in Ireland and Oracle University, for arranging this.
The 2 days are pretty packed with a mixture of lecture type material, lots of hands on exercises and some time for open discussions. In particular, day 2 will be very busy day.
Check out the course outline and published schedule – click here
You can have this course on site at your organisation. If this is something that interests you then contact your Oracle University account manager. There is also the traditional face-to-face delivery and the newer online delivery, where people from around the world come together for the online class.
There are a number of Oracle Advanced Analytics and related topics taking place this week at COLLABORATE12 in Las Vegas (http://collaborate12.com).
|Sun 22nd||9:00-3pm||Oracle Business Intelligence Application Journey|
|Mon 23rd||9:45-10:45||Managing Unstructured Data using Hadoop, Oracle 11g and Oracle Exadata Database Machine||Jim Steiner|
|Mon 23rd||9:45-10:45||Environmental Data Management and Analytics-a Real World Perspective||Angela Miller|
|Mon 23rd||11-12||Public Safety and Environmental Real-Time Analytics using Oracle Business Intelligence||Raghav Venkat
|Mon 23rd||11-12||BI is more than slice and dice||Peter Scott|
|Mon 23rd||14:30-15:30||In-Database Analytics: Predictive Analytics, Data Mining, Exadata & Business Intelligence||Jacek Myczkowski|
|Mon 23rd||15:45-16:45||Big Data Analytics, R you ready||Mark Hornick
|Tues 24th||10:45-11:45||BI Analytics and Oracle NoSQL. The Future of Now||Manish Khera|
|Wed. 25th||8:15-9:15||Oracle Data Mining – A Component of the Oracle Advanced Analytics Option-Hands-on Lab||Charlie Berger|
|Wed 25th||9:30-10:30||Oracle R Enterprise – A Component of the Oracle Advanced Analytics Option-Hands-on Lab||Mark Hornick|
Here are the abstracts from the two main Oracle Advanced Analytics presentations by Charlie Berger and Mark Hornick
Oracle Data Mining – A Component of the Oracle Advanced Analytics Option
This Hands-on Lab provides an introduction to Oracle Data Mining and the Oracle Data Miner GUI.
Oracle Data Mining (ODM), now part of Oracle Advanced Analytics, provides an extensive set of in-database data mining algorithms that solve a wide range of business problems. It can predict customer behavior, detect fraud, analyze market baskets, segment customers, and mine text to extract sentiments. ODM provides powerful data mining algorithms that run as native SQL functions for in-database model building and model deployment. There is no need for the time delays and security risks of data movement.
The free Oracle Data Miner GUI is an extension to Oracle SQL Developer 3.1 that enables data analysts to work directly with data inside the database, explore the data graphically, build and evaluate multiple data mining models, apply ODM models to new data, and deploy ODM’s predictions and insights throughout the enterprise. Oracle Data Miner work flows capture and document the user’s analytical methodology and can be saved and shared with others to automate advanced analytical methodologies.
Oracle R – A component of the Oracle Advanced Analytics Option
This Hands-on Lab provides an introduction to Oracle R Enterprise.
Oracle R Enterprise, a part of the Oracle Advanced Analytics Option, makes the open source R statistical programming language and environment ready for the enterprise by integrating R with Oracle Database. R users can interactively and transparently execute R scripts for statistical and graphical analyses on data stored in Oracle Database. R scripts can be executed in Oracle Database using potentially multiple database-managed R engines – resulting in data parallel execution. ORE also provides a rich set of statistical functions and advanced analytics techniques.
In this lab, attendees will be introduced to Oracle’s strategy for R, including the Oracle R Distribution, Oracle R Enterprise (ORE), and Oracle R Connector for Hadoop (ORCH). We will focus on Oracle R Enterprise with hands-on exercises exploring the transparency layer, embedded R execution, and statistics engine.
Charlie Berger (Sr. Director Product Management, Data Mining & Advanced Analytics) as produced a video based on a recent presentation called ‘Oracle Advanced Analytics: Oracle R Enterprise & Oracle Data Mining’.
This is a 1 hour video, including some demos, of product background, product features, recent developments and new additions, examples of how Oracle is including Oracle Data Mining into their fusion applications, etc.
Oracle has 2 data mining products, with main in-database Oracle Data Mining and the more recent extensions to R to give us Oracle R Enterprise.
Check out the video – Click here.
Check out Charlie’s blog at https://blogs.oracle.com/datamining/
Oracle University : 2 Day Oracle Data Mining training course
Oracle R Enterprise (ORE) was officially launched over the past couple of days and it has been receiving a lot of interest in the press.
We now have the Oracle Advanced Analytics (OAA) option which comprises, the already existing, Oracle Data Mining and now Oracle R Enterprise. In addition to the Oracle Advanced Analytics option we also 2 free set of tools available to use to use. The first of these free tools are the statistical functions which are available in all versions of the Oracle Database and the second free tool is the Oracle Data Miner tool that is part of the newly released SQL Developer 3.1 (7th Feb).
What has Oracle done to Oracle to make Oracle R Enterprise ?
The one of the main challenges with using R is that it is memory constrained, resulting in the amount of data that it can process. So the ORE development team have worked ensuring R can work transparently with data within the database. This removes the need extract the data from the database before it can be used by R. We still get all the advanced on in-Database Data Mining.
They have also embedded R functions within the database, so we an run R code on data within the database. By having these functions with the database, this allows R to use the database parallelism and so we get quicker execution of our code. Most R implementation are constrained to being able to process dataset containing 100Ks of records. With ORE we can now process 10M+ records
In addition to the ORE functions and algorithms that are embedded in the database we can also use the R code to call the suite of data mining algorithms that already exist as part of Oracle Data Miner.
For more details of what Oracle R Enterprise is all about check out the following links.