ore.parallel
In ORE there are a number of ways to get your R scripts to run in parallel in the database. One way is to enable the Parallel option in ORE, and that is what will be shown in this post. There are other methods of running various ORE commands/scripts in parallel, where the work is divided out and several parallel R processes are started on the server.
But what if you want to use the database parallel feature on some of your other ORE commands?
Why would you want to do this?
Well, the main answer is that you might want to use the parallel option of the database for the creation of objects (tables etc.) and for selecting and manipulating the data in the database.
How can you enable your ORE connection to use the in-database parallel feature?
ORE 1.4 has a new option that enables the parallel option for your ORE connection in the database. This option is called ore.parallel.
When you enable or set the ore.parallel option, it seems to be the equivalent of running the following:
ALTER SESSION ENABLE PARALLEL DDL;
ALTER SESSION ENABLE PARALLEL DML;
ALTER SESSION ENABLE PARALLEL QUERY;
The exact details are a little unclear, but it appears to be the equivalent of the above commands.
The following commands illustrate some options for using the ore.parallel option.
> #
> # Check to see if the ore.parallel is enabled for your ORE connection
> options("ore.parallel")
$ore.parallel
NULL
The NULL value returned tells us that your ORE connection does not have the Parallel option enabled. If the schema had Parallel enabled by default then we would have a response of TRUE.
The following command turns on the Parallel option for your ORE connection / schema.
> options("ore.parallel" = TRUE)
> options("ore.parallel")
$ore.parallel
[1] TRUE
When the Parallel option is enabled (TRUE above) the database will use the degree of parallelism that is set as the default for the schema, or the degree of parallelism that is defined for the table, when it is being used in your ORE commands.
You can change the degree of parallelism by passing the required degree as a value to the ore.parallel option. In the following, the degree of parallelism is set to 8. We then ask ORE what the degree is set to and it tells us that it is 8, so it was set correctly.
> options("ore.parallel" = 8)
> options("ore.parallel")
$ore.parallel
[1] 8
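If you want to confirm on the database side that your session really has these parallel features enabled, a query along the following lines can be used. This is just a sketch: it assumes you have the privileges to query V$SESSION, and the schema name DMUSER is only an example. The PDDL_STATUS, PDML_STATUS and PQ_STATUS columns report the parallel DDL, DML and query status for each session.
-- Check the parallel status of the sessions for your ORE schema
-- (DMUSER is a hypothetical schema name; substitute your own)
select sid, username, pddl_status, pdml_status, pq_status
from   v$session
where  username = 'DMUSER';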
Oracle BigDataLite version 2.5.1 is now available
Back at the end of January Oracle finally got round to releasing the updated version of the Oracle BigDataLite virtual machine. Check out my previous blog post on this.
Yesterday (27th March) I saw on Facebook that a new updated version of the BigDataLite VM was released. I must have missed the tweet and other publicity on this somewhere 😦
This is a great VM that allows you to play with the various Big Data technologies without the hassle of going through the whole install and configuration process.
If you are interested in this then here are the details of what it contains and where you can find more details.
The following components are included on Oracle Big Data Lite Virtual Machine v 2.5:
Oracle Enterprise Linux 6.4
Oracle Database 12c Release 1 Enterprise Edition (12.1.0.1)
Cloudera’s Distribution including Apache Hadoop (CDH4.6)
Cloudera Manager 4.8.2
Cloudera Enterprise Technology, including:
Cloudera RTQ (Impala 1.2.3)
Cloudera RTS (Search 1.2)
Oracle Big Data Connectors 2.5
Oracle SQL Connector for HDFS 2.3.0
Oracle Loader for Hadoop 2.3.1
Oracle Data Integrator 11g
Oracle R Advanced Analytics for Hadoop 2.3.1
Oracle XQuery for Hadoop 2.4.0
Oracle NoSQL Database Enterprise Edition 12cR1 (2.1.54)
Oracle JDeveloper 11g
Oracle SQL Developer 4.0
Oracle Data Integrator 12cR1
Oracle R Distribution 3.0.1
Go to the Oracle Big Data Lite Virtual Machine landing page on OTN to download the latest release.
Association Rules in ODM-Part 3
This is the third part of a four part blog post on building and using Association Rules in Oracle Data Miner. The following outlines the contents of each post in the series on Association Rules:
- The first part focused on how to build an Association Rule model
- The second post looked at examining the Association Rules produced by ODM
- The third post will focus on using the Association Rules on your data – This blog post
- The final post will look at how you can do some of the above steps using the ODM SQL and PL/SQL functions.
In my previous posts I showed how you can go about setting up for Association Rule analysis in Oracle Data Miner and how to examine the rules that are generated.
This post will focus on how we can extract and use these rules in Oracle Data Miner.
Step 1 – Model Details
Association Rules are an unsupervised method of data mining. In Oracle Data Miner we cannot use the Apply node to score new data. What we have to do is to generate the Model Details, which in turn can then be used.
The Model Details node is used when we do unsupervised learning to extract the rules that are generated.
To do this we need to click on the Model Details node in the Models section of the Component Palette and then click on our workspace, just to the right of the Association Rule node.
The Edit Model Selection window will open. Connect the Association Rule node to the Model Details node, then run the node. This will generate the Association Rules in a format that we can reuse.
When you get the small green tick on the Model Details node you can then view what was generated.
Right click on the Model Details node and click on View Details from the menu.
The output is similar to what we would have seen under the Association Rule node with the addition of a few more attributes that include the schema name and model name.
We can order the rules based on the Confidence level by double clicking on the Confidence column header. You might need to do this twice to get the rules appearing in descending order of confidence.
At this point we can now look at persisting the Association Rules. See Step 2 below.
We can also view the SQL that was used to generate the Association Rules that we see in the Model Details node. While still viewing the rules, click on the SQL tab.
Step 2 – Persisting the Association Rules
To make the rules persist and be usable outside of ODM we can persist the Association Rules in a table. The first step to do this is to create a new Table node. This can be found under the Data section of the Component Palette. Click on the Create Table or View node in the component palette and then click on the workspace, just to the right of the Model Details node.
Connect the Model Details node to the Output node by right clicking on the Model Details node, selecting Connect from the menu and then clicking on the Output node.
We can now edit the format of the Output i.e. specify what attributes are to be in our Output table. Double click on the Output node or right click and select Edit from the menu. We now get the Edit Create Table or View Node.
We can give the output a meaningful name e.g. AR_OUTPUT_RULES. We can also specify what rule properties we want to export as attributes in our table.
We will need to un-tick the Auto Input Columns Selection tick box before we can remove any of the output attributes. In my case I only want to have ANTECEDENT_ITEMS, CONSEQUENT_ITEMS, ID, LENGTH, CONFIDENCE and SUPPORT in my output. So I need to select and highlight all the other attributes (holding the Control key). After selecting all the attributes I do not want included in the final output table, I need to click on the red X icon.
When complete click on the OK button to go back to the workflow.
To generate the table right click on the AR_OUTPUT_RULES node and select Run from the menu. When you get the green tick mark on the AR_OUTPUT_RULES node the table has been created with records containing the details of each rule.
To view the contents of the AR_OUTPUT_RULES table we can right click on this node and select View Data from the menu.
We can now use these rules in our applications.
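Once the rules are persisted they can be queried like any other table. A minimal sketch of pulling back the strongest rules is shown below; it assumes the table name and the column names chosen above, and that the confidence value is stored on a 0-1 scale.
-- Strongest rules from the persisted Association Rules table
-- (column names as selected in the Edit Create Table or View node above)
SELECT id, antecedent_items, consequent_items, confidence, support
FROM   ar_output_rules
WHERE  confidence >= 0.7
ORDER  BY confidence DESC;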
Check out the next post in the series (Part 4) where we will look at the functionality available in the ODM SQL & PL/SQL functions to perform Association Rule analysis.
Association Rules in ODM–Part 2
This is the second part of a four part blog post on building and using Association Rules in Oracle Data Miner. The following outlines the contents of each post in the series on Association Rules:
- The first part focused on how to build an Association Rule model
- The second post will be on examining the Association Rules produced by ODM – This blog post
- The third post will focus on using the Association Rules on your data.
- The final post will look at how you can do some of the above steps using the ODM SQL and PL/SQL functions.
In the previous post I looked at the steps needed to setup a data source and to setup the Association Rule node. When everything was setup we ran the workflow.
Step 1 – Viewing the Model
When the workflow has finished running we will have the green tick marks on each node. This is where we left things at the end of the previous post (Part 1). To view the model details, right click on the Association Rule node and select View Models from the menu.
There are 3 main concepts that are important in relation to Association Rules:
- Support: is the proportion of transactions in the data set that contain the item set i.e. the number of times the rule occurs
- Confidence: is the proportion of the occurrences of the antecedent that result in the consequent e.g. how many times do we get C when we have A and B {A, B} => C
- Lift: indicates the strength of a rule over the random co-occurrence of the antecedent and the consequent
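As a simple worked example (the numbers here are hypothetical, not from the sample data): suppose we have 1,000 transactions, 50 of which contain the item set {A, B, C}, 40 of which contain {A, B, C} together with D, and 100 of which contain D. Then for the rule {A, B, C} => D we get Support = 40/1000 = 4%, Confidence = 40/50 = 80%, and Lift = 0.8 / (100/1000) = 8, i.e. the rule occurs 8 times more often than we would expect if {A, B, C} and D were independent.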
Support and Confidence are the primary measures that are used to assess the usefulness of an association rule.
In our example we can see that the antecedent and the consequent have numbers separated by the word AND. These numbers correspond to the product numbers.
Step 2 – Examining the Model Rules
To read the antecedent and the consequent for the first rule in our example we have:
Antecedent: 137 AND 143 AND 128
Consequent: 144
To read this association rule we would say that if a Customer bought product 137, product 143 and product 128, then they will also buy product 144, with a Confidence value of almost 71%. This is a strong association.
We can check the ordering of the rules by changing the Sort By criteria. As Confidence and Support are the main ways to evaluate the rules, we can change the Sort By criteria to be Confidence. Then click on the Query button to refresh the rules section.
Here we get a list of the strongest rules listed in descending order.
Below the section of the screen that has the Rules, we have the Rule Details section.
Here we can see that the rule gets formatted into an IF statement. The first rule in the list has a confidence of almost 97%. As it is a simple IF statement it can be easily implemented in our applications.
We can use the information that these rules provide in a number of ways. One such use is that we can look at improving the ordering and distribution of these products to ensure that we have sufficient numbers of each. Another is that we can enhance the front end selling mechanism to make sure that if a customer is buying products 114, 118 and 115 then we can remind the customer of product 119. We can also ensure that all these products are not located beside each other, so that the customer will have to walk past many other products in order to find them. That is why we never see milk and bread beside each other in a grocery store.
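To illustrate the front end point, here is a sketch of how one such rule could be applied in SQL. The BASKET_ITEMS table and its columns are hypothetical; the product numbers are simply those from the rule mentioned above.
-- Baskets that already contain products 114, 115 and 118,
-- for which product 119 could be recommended.
SELECT b.basket_id, 119 AS recommended_product
FROM   basket_items b
WHERE  b.product_id IN (114, 115, 118)
GROUP  BY b.basket_id
HAVING COUNT(DISTINCT b.product_id) = 3;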
Step 3 – Applying Filters to the Model Rules
In the previous step we were able to sort our rules based on some of the measures of our Association Rules and to see how these rules are structured.
Association Rule Analysis can generate many thousands of possible rules for a small data set. In some cases similar rules can appear, and we can have lots of rules that occur so infrequently that they are perhaps meaningless.
ODM provides us with a number of filters that we can apply to the rules, which enables us to look for the rules that are of most interest to us. We can access these filters by clicking on the More button, which is located just under the Query button.
We can refine our query on the rules based on the various measures and the number of items in the rule. In addition to this we can also filter based on the values of the items. This is particularly useful if we want to concentrate on specific items (in our example Products). To illustrate this, let us focus on the rules that involve Product 115. Click on the green + symbol on the right hand side of the window. Select 115 from the list provided. Next we need to decide if we want Product 115 involved in the Antecedent or the Consequent. In our example select the Consequent. This is located to the bottom right of the window. Then click the OK button and then click on the Query button to update the list of rules that correspond with the new filter.
We can see that we only have rules that have Product 115 in the Consequent column.
We can also see that we have 134 rules for this scenario out of a total of 20,988 (your results might differ slightly from mine and that's OK; it really depends on what version of the sample data you are using).
Check out the next post in the series (Part 3) where we will look at how you can use the Association Rules produced by ODM.
Oracle Advanced Analytics Option in Oracle 12c
At Oracle Open World a few weeks ago there were a large number of presentations on Big Data and Analytics. Most of these were marketing type presentations, with a couple of presentations on using R and how it can now be integrated into the Oracle Database 11.2.
In addition to these there was one presentation that focused on the Oracle Advanced Analytics (OAA) Option.
The Oracle Advanced Analytics Option covers the Oracle Data Mining features and the Oracle R Enterprise features in the Database.
The purpose of this blog post is to outline and summarise what was mentioned at these presentations, including what changes are/may be coming in the "Next Release" of the database i.e. Oracle 12c.
Health Warning: As with all the presentations at OOW that talked about what may be in the next release, there is no guarantee that the features will actually be in the release version of the database. Here is the slide that gives the Safe Harbor statement.
- 12c will come with R embedded into it. So there will be no need for any configurations.
- Oracle R client will come as part of the server install.
- Oracle R client will be able to use the Analytics functions that exist in the database.
- Will be able to run R code in the database.
- The database (12c) will be able to spawn multiple R engines.
- Will be able to emulate map-reduce style algorithms.
- There will be a new PREDICTION function, replacing the existing (11g) functionality. This will combine a number of the steps of building a model and applying it to the data to be scored into one function. But we will still need the functionality of the existing PREDICTION function that is in 11g, so it will be interesting to see how this functionality will be kept in addition to the new functionality being proposed in 12c.
- Although the Oracle Data Miner tool will still exist, it will have many new features. It was also referred to as the 'OAA Workflow'. So does this indicate a potential name change? We will have to wait and see.
- Oracle Data Miner will come with a new additional graphing feature. This will be in addition to the Explore Node and will allow us to produce more typical attribute related graphs. From what I could see these would be similar to the type of box plot, scatter, bar chart, etc. graphs that you can get from R.
- There will be a number of new algorithms too, including a useful One Class Support Vector Machine. This can be used when we have a data set with just one class value. This algorithm will work out what records/cases are more important than others.
- There will be a new SQL node. This will allow us to write our own data transformation code.
- There will be a new node to allow the calling of R code.
- The tool also comes with a slightly modified layout and colour scheme.
Again, the points that I have given above are just my observations. They may or may not appear in 12c, or maybe I misunderstood what was being said.
It certainly looks like we will have an integrated analytics environment in 12c with full integration of R and the ODM in-database features.
Extracting the rules from an ODM Decision Tree model
One of the most interesting and important aspects of a Decision Tree model is that we as users can get to see what rules the machine learning algorithm has generated for our data.
I've given a number of examples in various blog posts over the past few years on how to generate a number of classification models. An example of the workflow is below.
In the Class Build node we get four models being generated. These include a Generalised Linear Model, Support Vector Machine, Naive Bayes and a Decision Tree model.
We can explore the Decision Tree model by right clicking on the Class Build Node, selecting View Models and then the Decision Tree model, which will be labelled with a ‘DT’ in the name.
As we explore the nodes and branches of the Decision Tree we can see the rule that was generated for a node in the lower pane of the application. So by clicking on each node we get a different rule appearing in this pane.
Sometimes there is a need to extract these rules so that they can be presented to a number of different types of users, to explain to them what is going on.
How can we extract the Decision Tree rules?
To do this, you will need to complete the following steps:
- From the Models section of the Component Palette select the Model Details node.
- Click on the Workflow pane and the Model Details node will be created
- Connect the Class Build node to the Model Details node. To do this right click on the Class Build node and select Connect. Then move the mouse to the Model Details node and click. The two nodes should now be connected.
- Edit the Model Details node: uncheck the Auto Settings, set the Model Type to Decision Tree, the Output to Full Tree, and select all the columns.
- Run the Model Details node. Right click on the node and select Run. When complete you will have the little green box with a tick mark on the top right hand corner.
- To view the details produced, right click on the Model Details node and select View Data.
- The rules for each node will now be displayed. You will need to scroll to the right of this pane to get to the rules, and you will need to expand the columns for the rules to see the full details.
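If you prefer to pull the rules out in SQL rather than through the workflow, a minimal sketch using the PL/SQL API is shown below. The model name here is hypothetical; you would substitute the name of your Decision Tree model (the one labelled with 'DT' under the Class Build node).
-- Returns the full Decision Tree, including its rules, as an XML document
SELECT dbms_data_mining.get_model_details_xml('CLAS_DT_MODEL') AS dt_rules
FROM   dual;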
My Presentations on Oracle Advanced Analytics Option
I've recently compiled my list of presentations on the Oracle Analytics Option. All of these presentations are for a 45 minute slot.
I have two versions of the presentation 'How to do Data Mining in SQL & PL/SQL'; one is for 45 minutes and the second version is for 2 hours.
I have given most of these presentations at conferences or SIGs.
Let me know if you are interested in having one of these presentations at your SIG or conference.
- Oracle Analytics Option – 12c New Features – available 2013
- Real-time prediction in SQL & Oracle Analytics Option – Using the 12c PREDICTION function – available 2013
- How to do Data Mining in SQL & PL/SQL
- From BIG Data to Small Data and Everything in Between
- Oracle R Enterprise : How to get started
- Oracle Analytics Option : R vs Oracle Data Mining
- Building Predictive Analytics into your Forms Applications
- Getting Real Business Value from OBIEE and Oracle Data Mining (This is a cut down and merged version of the following two presentations)
- Getting Real Business Value from OBIEE and Oracle Data Mining – Part 1 : The Oracle Data Miner part
- Getting Real Business Value from OBIEE and Oracle Data Mining – Part 2 : The OBIEE part
- How to Deploy and Use your Oracle Data Miner Models in Production
- Oracle Analytics Option 101
- From SQL Programmer to Data Scientist: evolving roles of an Oracle programmer
- Using an Oracle Data Mining Model in SQL & PL/SQL
- Getting Started with Oracle Data Mining
- You don’t need a PhD to do Data Mining
Check out the ‘My Presentations’ page for updates on new presentations.
Big Data videos by Oracle
Here are the links to the 2 different sets of Big Data videos that Oracle have produced over the past 12 months.
Oracle Big Data Videos – Version 1
Episode 2 – Gold Mine or Just Stuff
Episode 4 – Everything You Always Wanted to Know
Oracle Big Data Videos – Version 2
Episode 1 – Overview for the Boss
Episode 3 – Acquiring Big Data
Episode 4 – Organising Big Data
Episode 5 – Analysing Big Data
Analytics Sessions at Oracle Open World 2012
The content catalog for Oracle Open World 2012 was made public during the week. OOW is on from 30th September to 4th October.
The following table gives a list of most of the Data Analytics type sessions that are currently scheduled.
Why did I pick these sessions? If I was able to go to OOW then these are the sessions I would like to attend. Yes there would be many more sessions I would like to attend on the core DB technology and Development streams.
| Session Title | Presenters |
| --- | --- |
| CON6640 – Database Data Mining: Practical Enterprise R and Oracle Advanced Analytics | Husnu Sensoy |
| CON8688 – Customer Perspectives: Oracle Data Integrator | Gurcan Orhan – Software Architect & Senior Developer, Turkcell Technology R&D; Julien Testut – Product Manager, Oracle |
| HOL10089 – Oracle Big Data Analytics and R | George Lumpkin – Vice President, Product Management, Oracle |
| CON8655 – Tackling Big Data Analytics with Oracle Data Integrator | Mala Narasimharajan – Senior Product Marketing Manager, Oracle; Michael Eisterer – Principal Product Manager, Oracle |
| CON8436 – Data Warehousing and Big Data with the Latest Generation of Database Technology | George Lumpkin – Vice President, Product Management, Oracle |
| CON8424 – Oracle’s Big Data Platform: Settling the Debate | Martin Gubar – Director, Oracle; Kuassi Mensah – Director Product Management, Oracle |
| CON8423 – Finding Gold in Your Data Warehouse: Oracle Advanced Analytics | Charles Berger – Senior Director, Product Management, Data Mining and Advanced Analytics, Oracle |
| CON8764 – Analytics for Oracle Fusion Applications: Overview and Strategy | Florian Schouten – Senior Director, Product Management/Strategy, Oracle |
| CON8330 – Implementing Big Data Solutions: From Theory to Practice | Josef Pugh – Oracle |
| CON8524 – Oracle TimesTen In-Memory Database for Oracle Exalytics: Overview | Tirthankar Lahiri – Senior Director, Oracle |
| CON9510 – Oracle BI Analytics and Reporting: Where to Start? | Mauricio Alvarado – Principal Product Manager, Oracle |
| CON8438 – Scalable Statistics and Advanced Analytics: Using R in the Enterprise | Marcos Arancibia Coddou – Product Manager, Oracle Advanced Analytics, Oracle |
| CON4951 – Southwestern Energy’s Creation of the Analytical Enterprise | Jim Vick – Southwestern Energy; Richard Solari – Specialist Leader, Deloitte Consulting LLP |
| CON8311 – Mining Big Data with Semantic Web Technology: Discovering What You Didn’t Know | Zhe Wu – Consultant Member of Tech Staff, Oracle; Xavier Lopez – Director, Product Management, Oracle |
| CON8428 – Analyze This! Analytical Power in SQL, More Than You Ever Dreamt Of | Hermann Baer – Director Product Management, Oracle; Andrew Witkowski – Architect, Oracle |
| CON6143 – Big Data in Financial Services: Technologies, Use Cases, and Implications | Omer Trajman – Cloudera; Ambreesh Khanna – Industry Vice President, Oracle; Sunil Mathew – Senior Director, Financial Services Industry Technology, Oracle |
| CON8425 – Big Data: The Big Story | Jean-Pierre Dijcks – Sr. Principal Product Manager, Oracle |
| CON10327 – Recommendations in R: Scaling from Small to Big Data | Mark Hornick – Senior Manager, Oracle |
R resources
Download R : http://www.r-project.org/
R installation instructions : http://star-www.st-andrews.ac.uk/cran/
R-Uni (A List of 85+ Free R Tutorials and Resources in Universities webpages)
R programming for those coming from other languages
Part 2 of the Leaning Tower of Pisa problem in ODM
In a previous post I gave the details of how you can use Regression in Oracle Data Miner to predict/forecast the lean of the tower in future years. This was based on building a regression model in ODM using the known lean/tilt of the tower for a range of years.
In this post I will show you how you can do the same tasks using the Oracle Data Miner functions in SQL and PL/SQL.
Step 1 – Create the table and data
The easiest way to do this is to make a copy of the PISA table we created in the previous blog post. If you haven’t completed this, then go to the blog post and complete step 1 and step 2.
create table PISA_2
as select * from PISA;
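The later steps reference the views pisa_2_build_v and pisa_2_apply_v but do not show how they were created. A minimal sketch, assuming the split is simply on whether the TILT value is known, would be:
-- Build data set: the years where the tilt is known
create or replace view pisa_2_build_v as
select * from pisa_2 where tilt is not null;
-- Apply data set: the future years where the tilt is to be predicted
create or replace view pisa_2_apply_v as
select * from pisa_2 where tilt is null;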
Step 2 – Create the ODM Settings table
We need to create a 'settings' table before we can use the ODM APIs in PL/SQL. The purpose of this table is to store all the configuration parameters needed for the algorithm to work. In our case we only need to set two parameters.
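The block below assumes the PISA_2_SETTINGS table already exists. A minimal sketch of creating it, using the standard two-column layout expected by DBMS_DATA_MINING, is:
-- Settings table for the mining model
create table pisa_2_settings (
  setting_name  varchar2(30),
  setting_value varchar2(4000)
);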
BEGIN
delete from pisa_2_settings;
INSERT INTO PISA_2_settings (setting_name, setting_value) VALUES
(dbms_data_mining.algo_name, dbms_data_mining.ALGO_GENERALIZED_LINEAR_MODEL);
INSERT INTO PISA_2_settings (setting_name, setting_value) VALUES
(dbms_data_mining.prep_auto,dbms_data_mining.prep_auto_off );
COMMIT;
END;
Step 3 – Build the Regression Model
To build the regression model we need to use the CREATE_MODEL procedure that is part of the DBMS_DATA_MINING package. When calling this procedure we need to pass in the name of the model, the mining function to use, the source data, the settings table and the target column we are interested in.
BEGIN
DBMS_DATA_MINING.CREATE_MODEL(
   model_name => 'PISA_REG_2',
   mining_function => dbms_data_mining.regression,
   data_table_name => 'pisa_2_build_v',
   case_id_column_name => null,
   target_column_name => 'tilt',
   settings_table_name => 'pisa_2_settings');
END;
After this we should have our regression model.
Step 4 – Query the Regression Model details
To find out what was produced in the previous step we can query the data dictionary.
SELECT model_name,
mining_function,
algorithm,
build_duration,
model_size
from USER_MINING_MODELS
where model_name like 'P%';
select setting_name,
setting_value,
setting_type
from all_mining_model_settings
where model_name like 'P%';
Step 5 – Apply the Regression Model to new data
Our final step is to apply the model to our new data, i.e. the years for which we want to know what the lean/tilt would be.
SELECT year_measured, prediction(pisa_reg_2 using *)
FROM pisa_2_apply_v;
Using ODM Regression for the Leaning Tower of Pisa tilt problem
This blog post will look at how you can use the Regression feature in Oracle Data Miner (ODM) to predict the lean/tilt of the Leaning Tower of Pisa in the future.
This is a well known regression exercise, and it typically comes with a set of known values and the year for these values. There are lots of websites that contain the details of the problem. A summary of it is:
The following table gives measurements for the years 1975-1985 of the "lean" of the Leaning Tower of Pisa. The variable "lean" represents the difference between where a point on the tower would be if the tower were straight and where it actually is. The data is often coded as tenths of a millimetre in excess of 2.9 meters, so that the 1975 lean, which was 2.9642 meters, would be coded as 642. In this example we will work with the values in meters.
Given the lean for the years 1975 to 1985, can you calculate the lean for a future date like 2000, 2009 or 2012?
Step 1 – Create the table
Connect to a schema that you have set up for use with Oracle Data Miner. Create a table (PISA) with 2 attributes, YEAR_MEASURED and TILT. Both of these attributes need to have the datatype NUMBER, as ODM will ignore any attributes that are a VARCHAR, or you might get an error.
CREATE TABLE PISA
(
YEAR_MEASURED NUMBER(4,0),
TILT NUMBER(9,4)
);
Step 2 – Insert the data
There are 2 sets of data that need to be inserted into this table. The first is the data from 1975 to 1985 with the known values of the lean/tilt of the tower. The second set of data is the future years where we do not know the lean/tilt and we want ODM to calculate the value based on the Regression model we want to create.
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1975,2.9642);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1976,2.9644);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1977,2.9656);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1978,2.9667);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1979,2.9673);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1980,2.9688);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1981,2.9696);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1982,2.9698);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1983,2.9713);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1984,2.9717);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1985,2.9725);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1986,2.9742);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1987,2.9757);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1988,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1989,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1990,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1995,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (2000,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (2005,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (2010,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (2009,null);
Step 3 – Start ODM and Prepare the data
Open SQL Developer and open the ODM Connections tab. Connect to the schema that you have created the PISA table in. Create a new Project or use an existing one and create a new Workflow for your PISA ODM work.
Create a Data Source node in the workspace and assign the PISA table to it. You can select all the attributes.
The table contains the data that we need to build our regression model (our training data set) and the data that we will use for predicting the future lean/tilt (our apply data set).
We need to apply a filter to the PISA data source to only look at the training data set. Select the Filter Rows node and drag it to the workspace. Connect the PISA data source to the Filter Rows node. Double click on the Filter Rows node and select the Expression Builder icon. Create the where clause to select only the rows where we know the lean/tilt.
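As a sketch (assuming the Expression Builder exposes the table columns directly), the filter expression for the training data would simply be:
TILT IS NOT NULL
The second Filter Rows node created in Step 5, for the apply data set, would use the opposite condition, TILT IS NULL.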
Step 4 – Create the Regression model
Select the Regression Node from the Models component palette and drop it onto your workspace. Connect the Filter Rows node to the Regression Build Node.
Double click on the Regression Build node and set the Target to the TILT variable. You can leave the Case ID blank. You can also select whether you want to build a GLM or SVM regression model, or both of them. Set the AUTO check box to unchecked. By doing this Oracle will not try to do any data processing or attribute elimination.
You are now ready to create your regression models.
To do this right click the Regression Build node and select Run. When everything is finished you will get a little green tick on the top right hand corner of each node.
Step 5 – Predict the Lean/Tilt for future years
The PISA table that we used above also contains our apply data set.
We need to create a new Filter Rows node on our workspace. This will be used to only look at the rows in PISA where TILT is null. Connect the PISA data source node to the new filter node and edit the expression builder.
Next we need to create the Apply Node. This allows us to run the Regression model(s) against our Apply data set. Connect the second Filter Rows node to the Apply Node and the Regression Build node to the Apply Node.
Double click on the Apply Node. Under the Apply Columns we can see that we will have 4 attributes created in the output. 3 of these attributes will be for the GLM model and 1 will be for the SVM model.
Click on the Data Columns tab and edit the data columns so that we get the YEAR_MEASURED attribute to appear in the final output.
Now run the Apply node by right clicking on it and selecting Run.
Step 6 – Viewing the results
When we get the little green tick on the Apply node we know that everything has run and completed successfully.
To view the predictions right click on the Apply Node and select View Data from the menu.
We can see that the GLM model gives the results we would expect but the SVM model does not.