oraclebigdata

ore.parallel

Posted on

In ORE there are a number ways to get you R scripts to run in parallel in the database. One way is to enable the Parallel option in ORE. This is what will be shown in this post. There are other methods of running various ORE commands/scripts in parallel. With these the scripts are divided out and several parallel R processes are started on the server.

But what if you want to use the database parallel feature on some of your ORE other commands?

Why would you want to do this?

Well the main answer is that you might want to use the parallel option of the database for the creation on objects (tables etc) and for selecting and manipulating the data in the database.

How can you enable your ORE connection to use the in-database parallel feature?

ORE 1.4 has a new option that enables the parallel option for your ORE connection in the database. This option is called ore.parallel.

When you enable or set the ore.parallel option, it seems to be the equivalent of running the following:

ALTER SESSION ENABLE PARALLEL DDL;

ALTER SESSION ENABLE PARALLEL DML;

ALTER SESSION ENABLE PARALLEL QUERY;

The exact details is a little unclear, but it seems to be above commands.

The following commands illustrates some options for using the ore.parallel option.

> #

> # Check to see if the ore.parallel is enabled for your ORE connection

> options(“ore.parallel”)

$ore.parallel

NULL

The NULL returned value tells us that your ORE connections does not have the Parallel option enabled. If the schema had Parallel enabled by default then we would have have a response of TRUE.

The following command turns on the Parallel option for your ORE connection / schema.

> options(“ore.parallel” = TRUE)

> options(“ore.parallel”)

$ore.parallel

[1] TRUE

When the Parallel option is enabled (TRUE above) the database will use the degree of parallel that is set as default for the schema or the degree of parallel that is defined for the table when it is being used in your ORE commands.

You can changed the degree of parallelism by passing the required degree as a value to the ore.parallel command. In the following, the degree of parallelism is set to 8. We then as ORE what the degree is set to and it tells us that it is 8. So it was set correctly.

> options(“ore.parallel” = 8)

> options(“ore.parallel”)

$ore.parallel

[1] 8

Oracle BigDataLite version 2.5.1 is now available

Posted on

Back at the end of January Oracle finally go round to releasing the updated version of the Oracle BigDataLite virtual machine. Check out my previous blog post of this.

Yesterday (27th March) I say on Facebook that a new updated versions of the BigDataLite VM was released. I must have missed the tweet and other publicity on this somewhere 😦

This is a great VM that allows you to play with the various Big Data technologies without the hassle of going through the who install and configuration thing.

If you are interested in this then here are the details of what it contains and where you can find more details.

The following components are included on Oracle Big Data Lite Virtual Machine v 2.5:

Oracle Enterprise Linux 6.4

Oracle Database 12c Release 1 Enterprise Edition (12.1.0.1)

Cloudera’s Distribution including Apache Hadoop (CDH4.6)

Cloudera Manager 4.8.2

Cloudera Enterprise Technology, including:

   Cloudera RTQ (Impala 1.2.3)

   Cloudera RTS (Search 1.2)

Oracle Big Data Connectors 2.5

   Oracle SQL Connector for HDFS 2.3.0

   Oracle Loader for Hadoop 2.3.1

   Oracle Data Integrator 11g

   Oracle R Advanced Analytics for Hadoop 2.3.1

   Oracle XQuery for Hadoop 2.4.0

Oracle NoSQL Database Enterprise Edition 12cR1 (2.1.54)

Oracle JDeveloper 11g

Oracle SQL Developer 4.0

Oracle Data Integrator 12cR1/

Oracle R Distribution 3.0.1

Go to the Oracle Big Data Lite Virtual Machine landing page on OTN to download the latest release.

Association Rules in ODM-Part 3

Posted on

This is a the third part of a four part blog post on building and using Association Rules in Oracle Data Miner. The following outlines the contents of each post in the series on Association Rules

  1. This first part will focus on how to building an Association Rule model
  2. The second post will be on examining the Association Rules produced by ODM – This blog post
  3. The third post will focus on using the Association Rules on your data.
  4. The final post will look at how you can do some of the above steps using the ODM SQL and PL/SQL functions.

In my previous posts I showed how you can go about setting up for Association Rule analysis in Oracle Data Miner and how to examine the rules that are generated.

This post will focus on how we can extract and use these rules in Oracle Data Miner.

Step 1 – Model Details

Association Rules are an unsupervised method of data mining. In Oracle Data Miner we cannot use the Apply node to to score new data. What we have to do is to generate the Model Details. These in turn can then be used.

The Model Details node is used when we do unsupervised learning to extract the rules that are generated.

To do this we need to click on the Model Details node in the Models section of the Component Palette and then click on our workspace, just to the right of the Association Rule node.

The Edit Model Selection window will open. Connect the Association Rule node to the Model Details node. Then Run the node. This will then generate the Association Rules in a format what we can reuse.

image

When you get the small green tick on the Model Details node you can then view what was generated.

Right click on the Model Details node and click on View Details from the menu.

image

The output is similar to what we would have seen under the Association Rule node with the addition of a few more attributes that include the schema name and model name.

We can order the rules based on the Confidence level by double clicking on the Confidence column header. You might need to do this twice to get the rule appearing based on a descending confidence value.

At this point we can no look at persisting the Association Rules. See step 2 below.

We can also view the SQL that was used to generate the Association Rules that we see in the Model Details node. While still viewing the rules, click on the SQL tab.

image

Step 2 – Persisting the Association Rules

To make the rules persist and be useable outside of ODM we can persist the Association Rules in a table. The first step to do this is to create a new Table Node. This can be found under the Data section of the Component Palette. Click this Create Table or View node in the component palette and then click on the workspace, just to the right of the Model Details node.

Connect the Model Details node to the Output node, by right clicking on the Model Details node, select Connect from the menu and then click on the Output Node.

We can now edit the format of the Output i.e. specify what attributes are to be in our Output table. Double click on the Output node or right click and select Edit from the menu. We now get the Edit Create Table or View Node.

SNAGHTML18801036

We can give the output a meaningful name e.g. AR_OUTPUT_RULES. We can also specify what rule properties we can to export to attributes in out table.

We will need to un-tick the Auto Input Columns Selection tick box before we can remove any of the output attributes. In my case I only want to have ANTECENDENT_ITEMS, CONSEQUENT_ITEMS, ID, LENGTH, CONFIDENCE and SUPPORT in my out put. So I need to select and highlight all the other attributes (holding the control button). After selecting all the attributes I do not want included in the final output table, I need to click on the red X icon.

SNAGHTML18859128

When complete click on the OK button to go back to the workflow.

To generate the table right click on the AR_OUTPUT_RULES node and select Run from the menu. When you get the green tick mark on the AR_OUTPUT_RULES node the table has been created with records containing the details of each rules.

image

To view the contents of the AR_OUTPUT_RULES table we can right click on this node and select view data from the menu.

image

We can now use these rules in our applications.

 

Check out the next post in the series (Part 4) where we will look at the functionality available in the ODM SQL & PL/SQL functions to perform Association Rule analysis.

Association Rules in ODM–Part 2

Posted on

This is a the second part of a four part blog post on building and using Association Rules in Oracle Data Miner.  The following outlines the contents of each post in the series on Association Rules

  1. This first part will focus on how to building an Association Rule model
  2. The second post will be on examining the Association Rules produced by ODM – This blog post
  3. The third post will focus on using the Association Rules on your data.
  4. The final post will look at how you can do some of the above steps using the ODM SQL and PL/SQL functions. 

In the previous post I looked at the steps needed to setup a data source and to setup the Association Rule node. When everything was setup we ran the workflow.

Step 1 – Viewing the Model

We the workflow has finished running we will have the green tick marks on each node. This is where we left thing at the end of the previous post (Part 1). To view the model details, right click on the Association Role Node and select View Models from the menu.

image

There are 3 main concepts that are important in relation to Association Rules:

  • Support: is the proportion of transactions in the data set that contain the item set i.e. the number of times the rule occurs
  • Confidence: is the proportion of the occurrences of the antecedent that result in the consequent e.g. how many times do we get C when we have A and B  {A, B} => C
  • Lift: indicates the strength of a rule over the random co-occurrence of the antecedent and the consequent

Support and Confidence are the primary measures that are used to access the usefulness of an association rule.

In our example we can see that the the antecedent and the consequent has numbers separated by the word AND. These numbers correspond to the product numbers.

Step 2 – Examining the Model Rules

To read the antecedent and the consequent for the first rule in our example we have:

Antecedent: 137 AND 143 AND 128

Consequent: 144

To read this association rule we would say that if a Customer bought product 137 and product 143 and product 128, then we have a Confidence value of almost 71%. This is a strong association.

We can check the ordering of the rules by changing the Sort By criteria. As Confidence and Support are the main ways to evaluate the rules, we can change the Sort By criteria to be Confidence. Then click on the Query button to refresh the rules section.

image

Here get a list of the strongest rules listed in descending order.

Below the section of the screen that has the Rules, we have the Rule Details section.

image 

Here we can see that the rule gets formatted into an IF statement. The first rule in the list has a confidence of almost 97%. As it is a simple IF statement it can be easily implemented in our applications.

We want use the information that these rules provides in a number of ways. One such consequence of these rules is that we can look at improving the ordering and distribution of these products to ensure that we have sufficient numbers of each. Another consequence is that we can enhance the front end selling mechanism to make sure that if a customer is buying product 114, 118 and 115 then we can remind the customer of product 119. We can also ensure that all these products are not located beside each other, so that the customer will have to walk past many other products in order to find them. That is why we never see milk and bread beside each other in a grocery store.

Step 3 – Applying Filters to the Model Rules

In the previous step we were able to sort our rules based on some of the measures of our Association Rules and to see how these rules are structured.

Association Rule Analysis can generate many thousands of possible rules for a small data set. In some cases the similar rules can appear and we can have lots of rules that occur so infrequently that they are perhaps meaningless.

ODM provides us with a number of filters that we can apply to the rules that enables use to look for the rules that are of must interest to use. We can access these filters by clicking on the More button, that is located just under the Query button.

We can refine our query on the rules based on the various measures and the number if items in the rule. In addition to this we can also filter based on the values of the items. This is particularly useful if we want to concentrate on specific items (in our example Products). To illustrate this use focus on the rules that involve Product 115. Click on the green + symbol on the right hand side of the window. Select 115 from the list provided. Next we need to decide if we want Product 115 involved in the Antecedent or the Consequent. In our example select the Consequent. This is located to the bottom right of the window. Then click the OK button and then click on the Query button to update the list of rules that correspond with the new filter.

image

We can see that we only have rules that have Product 115 in the Consequent column.

We can also see that we have 134 rules for this scenarios out of a total of 20,988 (your results might differ slightly to mine and that’s OK. It really depends on what version of the sample data you are using)

 

Check out the next post in the series (Part 3) where we will look at how you can use the Association Rules produced by ODM.

Oracle Advanced Analytics Option in Oracle 12c

Posted on

At Oracle Open World a few weeks ago there was a large number of presentations on Big Data and Analytics.  Most of these were marketing type presentations, with a couple of presentations on using R and how it can not be integrated into the Oracle Database 11.2.

In addition this these there was one presentation that focused on the Oracle Advanced Analytics (OAA) Option.

The Oracle Advanced Analytics Option covers the Oracle Data Mining features and the Oracle R Enterprise features in the Database.

The purpose of this blog post is to outline and summarise what was mentioned at these presentations, and will include what changes are/may be coming in the “Next Release” of the database i.e. Oracle 12c.

Health Warning: As with all the presentations at OOW that talked about what may be in or may be in the next release, there is not guarantee that the features will actually be in the release version of the database. Here is the slide that gives the Safe Harbor statement.

image

  • 12c will come with R embedded into it. So there will be no need for any configurations.
  • Oracle R client will come as part of the server install.
  • Oracle R client will be able to use the Analytics functions that exist in the database.
  • Will be able to run R code in the database.
  • The database (12c) will be able to spawn multiple R engines.
  • Will be able to emulate map-reduce style algorithms.
  • There will be new PREDICTION function, replacing the existing (11g) functionality. This will combine a number of steps of building a model and applying it to the data to be scored into one function.  But we will still need the functionality of the existing PREDICTION function that is in 11g. So it will be interesting to see how this functionality will be kept in addition to the new functionality being proposed in 12c.
  • Although the Oracle Data Miner tool will still exits and will have many new features. It was also referred to as the ‘OAA Workflow’.  So those this indicate a potential name change?  We will have to wait and see.
  • Oracle Data Miner will come with a new additional graphing feature. This will be in addition to the Explore Node and will allow us to produce more typical attribute related graphs. From what I could see these would be similar to the type of box plot, scatter, bar chart, etc. graphs that you can get from R.
  • There will be a number of new algorithms too, including a useful One Class Support Vector Machine. This can be used when we have a data set with just one class value. This algorithm will work out what records/cases are more important and others.
  • There will be a new SQL node. This will allow us to write our own data transformation code.
  • There will be a new node to allow the calling of R code.
  • The tool also comes with a slightly modified layout and colour scheme.

Again, the points that I have given above are just my observations. They may or may not appear in 12c, or maybe I misunderstood what was being said.

It certainly looks like we will have a integrate analytics environment in 12c with full integration of R and the ODM in-database features.

Extracting the rules from an ODM Decision Tree model

Posted on

One of the most interesting of important aspects of a Decision Model is that we as a user can get to see what rules the machine learning algorithm has generated for our data.

I’ve give a number of examples in various blog posts over the past few years on how to generate a number of classification models. An example of the workflow is below.

SNAGHTML207172c9

In the Class Build node we get four models being generated. These include a Generalised Linear Model, Support Vector Machine, Naive Bayes and a Decision Tree model.

We can explore the Decision Tree model by right clicking on the Class Build Node, selecting View Models and then the Decision Tree model, which will be labelled with a ‘DT’ in the name.

image

As we explore the nodes and branches of the Decision Tree we can see the rule that was generated for a node in the lower pane of the applications. So by clicking on each node we get a different rule appearing in this pane

image

Sometimes there is a need to extract this rules so that they can be presented to a number of different types of users, to explain to them what is going on.

How can we extract the Decision Tree rules?

To do this, you will need to complete the following steps:

  • From the Models section of the Component Palette select the Model Details node.
  • Click on the Workflow pane and the Model Details node will be created
  • Connect the Class Build node to the Model Details node. To do this right click on the Class Build node and select Connect. Then move the mouse to the Model Details node and click. The two nodes should now be connected.
  • Edit the Model Details node, uncheck the Auto Settings, select Model Type to be Decision Tree, Output to be Full Tree and all the columns.

SNAGHTML2093297b

  • Run the Model Details node. Right click on the node and select run. When complete you you will have the little green box with a tick mark, on the top right hand corner.
  • To view the details produced, right click on the Model Details node and select View Data
  • The rules for each node will now be displayed. You will need to scroll to the right of this pane to get to the rules and you will need to expand the columns for the rules to see the full details

image

My Presentations on Oracle Advanced Analytics Option

Posted on

I’ve recently compiled my list of presentation on the Oracle Analytics Option. All these presentations are for a 45 minute period.

I have two versions of the presentation ‘How to do Data Mining in SQL & PL/SQL’, one is for 45 minutes and the second version is for 2 hour.

I have given most of these presentations at conferences or SIGS.

Let me know if you are interesting in having one of these presentations at your SIG or conference.

  • Oracle Analytics Option – 12c New Features – available 2013
  • Real-time prediction in SQL & Oracle Analytics Option – Using the 12c PREDICTION function – available 2013
  • How to do Data Mining in SQL & PL/SQL
  • From BIG Data to Small Data and Everything in Between
  • Oracle R Enterprise : How to get started
  • Oracle Analytics Option : R vs Oracle Data Mining
  • Building Predictive Analysts into your Forms Applications
  • Getting Real Business Value from OBIEE and Oracle Data Mining  (This is a cut down and merged version of the follow two presentations)
  • Getting Real Business Value from OBIEE and Oracle Data Mining – Part 1 : The Oracle Data Miner part
  • Getting Real Business Value from OBIEE and Oracle Data Mining – Part 2 : The OBIEE part
  • How to Deploying and Using your Oracle Data Miner Models in Production
  • Oracle Analytics Option 101
  • From SQL Programmer to Data Scientist: evolving roles of an Oracle programmer
  • Using an Oracle Oracle Data Mining Model in SQL & PL/SQL
  • Getting Started with Oracle Data Mining
  • You don’t need a PhD to do Data Mining

Check out the ‘My Presentations’ page for updates on new presentations.