R « Ora-lytics

Pulling Large Database tables in R

Posted on January 16, 2015

As the volume of the data in your tables grows, particularly in the big data world, you may run into some memory issues or package restrictions with pulling down the tables to your R environment.

Some of the R packages and drivers have some recommended numbers or limits for the number of records that can be fetched.

Caveat: My laptop is a Mac and at this point in time the ROracle package is unavailable for a Mac. It is available for Windows, Solaris and AIX.

In the following example I’m looking at downloading a table with 300K records from an Oracle Database. I’ve already set up my DB connection using the Oracle JDBC driver. But when I run the following commands I get an error.

> res<-dbSendQuery(jdbcConnection, "select * from my_large_table")

> dbFetch(res)

Error in .jcall(rp, "I", "fetch", stride) :

java.lang.OutOfMemoryError: Java heap space

I also get a similar error if I run the following command.

> train_data <- dbReadTable(jdbcConnection, "MY_LARGE_TABLE")

How can you pull down a large table in R so that you are not restricted by memory limits or by limits on the number of records?

One way to do this is to loop through the data, pull the records down in chunks (a certain fetch size), put these into a list, and then merge them all together into a data frame. The following code illustrates how to do this.

> res<-dbSendQuery(jdbcConnection, "select * from my_large_table")

> result<-list()

> i=1

> result[[i]]<-dbFetch(res,n=1000)

> while(nrow(chunk <- dbFetch(res, n=1000)) > 0){

+ i<-i+1

+ result[[i]]<-chunk

+ }

> train_data<-do.call(rbind,result)

> dbClearResult(res)

The above code runs surprisingly quickly, generates no errors and I now have all the data I need in my R environment.

The fetch size in the above example is set to 1000. This is a bit small really and is only set to that for illustration purposes here. You will need to play with this size to find out what size works best for your environment.
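The chunked loop above can be wrapped into a small reusable helper. The following is only a sketch in base R: the names fetch_in_chunks and make_cursor are made up for illustration, and an in-memory data frame stands in for the database cursor so that the code runs without a connection. With a real connection you could pass something like function(n) dbFetch(res, n = n) as the fetch function.

```r
# Hypothetical helper: fetch_next(n) must return the next n rows as a data
# frame, and a zero-row data frame once the result set is exhausted.
fetch_in_chunks <- function(fetch_next, chunk_size = 1000) {
  chunks <- list()
  repeat {
    chunk <- fetch_next(chunk_size)
    if (nrow(chunk) == 0) break
    chunks[[length(chunks) + 1]] <- chunk
  }
  if (length(chunks) == 0) return(data.frame())
  do.call(rbind, chunks)  # merge all chunks into one data frame
}

# Stand-in for a database cursor (assumption: illustration only).
make_cursor <- function(df) {
  pos <- 0
  function(n) {
    if (pos >= nrow(df)) return(df[0, , drop = FALSE])
    idx <- seq(pos + 1, min(pos + n, nrow(df)))
    pos <<- max(idx)
    df[idx, , drop = FALSE]
  }
}

big_table <- data.frame(id = 1:2500, value = (1:2500) * 2)
train_data <- fetch_in_chunks(make_cursor(big_table), chunk_size = 1000)
print(nrow(train_data))
```

Collecting the chunks in a list and calling rbind once at the end avoids growing a data frame inside the loop, which gets slow as the number of chunks increases.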

As with all programming languages, there can be many different ways of doing the same thing in R.

This entry was posted in Oracle, R.

Creating a Word Cloud of Oracle’s OAA webpages in R

Posted on January 10, 2015

The following is not something new but something that I have put together this evening, and I mainly make it available as a note to myself of what I did. If you find it useful or interesting then you are more than welcome to use and share. You will also find lots of similar solutions on the web.

This evening I was playing around with the Text Mining (tm) package in R. So I decided to create a Word Cloud of the Advanced Analytics webpages on Oracle.com. These include the Overview webpage for Advanced Analytics, the Oracle Data Mining webpages and the Oracle R Enterprise webpages.

I’ve broken the R code into a number of sections.

1. Setup

The first thing that you need to do is to install the R packages “tm”, “wordcloud”, “RCurl” and “XML”, plus “SnowballC” if you want to use stemming. The first two of these packages are needed for the main part of the text processing and generating the word cloud. RCurl and XML are needed by the function “htmlToText”. You can download the htmlToText function on github.

install.packages (c ("tm", "wordcloud", "RCurl", "XML", "SnowballC")) # install the required packages

library (tm)

library (wordcloud)

library (SnowballC)

# load htmlToText

source("/Users/brendan.tierney/htmltotext.R")

2. Read in the Oracle Advanced Analytics webpages using the htmlToText function

data1 <- htmlToText("http://www.oracle.com/technetwork/database/options/advanced-analytics/overview/index.html")

data2 <- htmlToText("http://www.oracle.com/technetwork/database/options/advanced-analytics/odm/index.html")

data3 <- htmlToText("http://www.oracle.com/technetwork/database/database-technologies/r/r-technologies/overview/index.html")

data4 <- htmlToText("http://www.oracle.com/technetwork/database/database-technologies/r/r-enterprise/overview/index.html")

You will need to combine each of these webpages into one for processing in later steps.

data <- c(data1, data2)

data <- c(data, data3)

data <- c(data, data4)

3. Convert into a Corpus and perform Data Cleaning & Transformations

First we convert our web documents into a Corpus.

txt_corpus <- Corpus (VectorSource (data)) # create a corpus

We can use the summary function to get some of the details of the Corpus. We can see that we have 4 documents in the corpus.

> summary(txt_corpus)

A corpus with 4 text documents

The metadata consists of 2 tag-value pairs and a data frame

Available tags are:

create_date creator

Available variables in the data frame are:

MetaID

Remove the White Space in these documents

tm_map <- tm_map (txt_corpus, stripWhitespace) # remove white space

Remove the Punctuation from the documents

tm_map <- tm_map (tm_map, removePunctuation) # remove punctuation

Remove numbers from the documents

tm_map <- tm_map (tm_map, removeNumbers) # to remove numbers

Remove the typical list of Stop Words

tm_map <- tm_map (tm_map, removeWords, stopwords("english")) # to remove stop words (like 'as', 'the', etc.)

Apply stemming to the documents

If needed you can also apply stemming on your data. I decided not to perform this as it seemed to truncate some of the words in the word cloud.

# tm_map <- tm_map (tm_map, stemDocument)

If you do want to perform stemming then just remove the # symbol.

Remove any additional words (you could add other words to this list)

tm_map <- tm_map (tm_map, removeWords, c("work", "use", "java", "new", "support"))

If you want to have a look at the output of each of the above commands you can use the inspect function.

inspect(tm_map)
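To see roughly what these cleaning steps do to a piece of text, here is a minimal sketch in base R that does not use the tm package. The function clean_text and the sample sentences are made up for illustration, and the regular expressions only approximate what the tm transformations do.

```r
# Hypothetical helper that mimics the cleaning steps with base R regular
# expressions (an approximation, not the tm implementation).
clean_text <- function(x, drop_words = character(0)) {
  x <- gsub("[[:punct:]]+", "", x)   # remove punctuation
  x <- gsub("[[:digit:]]+", "", x)   # remove numbers
  if (length(drop_words) > 0) {
    # remove whole words only, ignoring case
    pattern <- paste0("\\b(", paste(drop_words, collapse = "|"), ")\\b")
    x <- gsub(pattern, "", x, ignore.case = TRUE, perl = TRUE)
  }
  x <- gsub("[[:space:]]+", " ", x)  # collapse runs of white space
  trimws(x)
}

docs <- c("Oracle  R Enterprise, 1.4 is new!",
          "Use the   Java support in 2014.")
cleaned <- clean_text(docs,
                      drop_words = c("the", "is", "use", "java", "new", "support"))
print(cleaned)
```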

4. Convert into a Term Document Matrix and Sort

Matrix <- TermDocumentMatrix(tm_map) # terms in rows

matrix_c <- as.matrix (Matrix)

freq <- sort (rowSums (matrix_c)) # frequency data

freq #to view the words and their frequencies
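If you want to see what the term document matrix and the frequency calculation are doing, the following sketch builds a tiny one by hand in base R. The three sample documents are made up, and splitting on single spaces is a simplification of the tokenising that tm performs.

```r
# Three made-up documents, already cleaned and lower-cased.
docs <- c("oracle data mining", "oracle r enterprise", "data mining in r")

tokens <- strsplit(docs, " ", fixed = TRUE)
terms  <- sort(unique(unlist(tokens)))

# One row per term, one column per document, cell = count of term in document.
tdm <- vapply(tokens,
              function(tok) as.integer(table(factor(tok, levels = terms))),
              integer(length(terms)))
rownames(tdm) <- terms

# Total frequency of each term across all documents, most frequent first.
freq <- sort(rowSums(tdm), decreasing = TRUE)
print(tdm)
print(freq)
```

Sorting with decreasing = TRUE puts the most frequent words first, which is handy when you just want to eyeball the top terms.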

5. Generate the Word Cloud

tmdata <- data.frame (words=names(freq), freq)

wordcloud (tmdata$words, tmdata$freq, max.words=100, min.freq=3, scale=c(7,.5), random.order=FALSE, colors=brewer.pal(8, "Dark2"))

and the Word Cloud will look something like the following. Every time you generate the Word Cloud you will get a slightly different layout of the words.

OAA Word Cloud

This entry was posted in R.

ore.parallel

Posted on June 2, 2014

In ORE there are a number of ways to get your R scripts to run in parallel in the database. One way is to enable the Parallel option in ORE. This is what will be shown in this post. There are other methods of running various ORE commands/scripts in parallel. With these the scripts are divided out and several parallel R processes are started on the server.

But what if you want to use the database parallel feature on some of your other ORE commands?

Why would you want to do this?

Well the main answer is that you might want to use the parallel option of the database for the creation of objects (tables etc) and for selecting and manipulating the data in the database.

How can you enable your ORE connection to use the in-database parallel feature?

ORE 1.4 has a new option that enables the parallel option for your ORE connection in the database. This option is called ore.parallel.

When you enable or set the ore.parallel option, it seems to be the equivalent of running the following:

ALTER SESSION ENABLE PARALLEL DDL;

ALTER SESSION ENABLE PARALLEL DML;

ALTER SESSION ENABLE PARALLEL QUERY;

The exact details are a little unclear, but it seems to be equivalent to the above commands.

The following commands illustrate some options for using the ore.parallel option.

> #

> # Check to see if the ore.parallel is enabled for your ORE connection

> options("ore.parallel")

$ore.parallel

NULL

The NULL returned value tells us that your ORE connection does not have the Parallel option enabled. If the schema had Parallel enabled by default then we would have a response of TRUE.

The following command turns on the Parallel option for your ORE connection / schema.

> options("ore.parallel" = TRUE)

> options("ore.parallel")

$ore.parallel

[1] TRUE

When the Parallel option is enabled (TRUE above) the database will use the degree of parallel that is set as default for the schema or the degree of parallel that is defined for the table when it is being used in your ORE commands.

You can change the degree of parallelism by passing the required degree as a value to the ore.parallel option. In the following, the degree of parallelism is set to 8. We then ask ORE what the degree is set to and it tells us that it is 8. So it was set correctly.

> options("ore.parallel" = 8)

> options("ore.parallel")

$ore.parallel

[1] 8
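Because ore.parallel is an ordinary R option, you can inspect it with base R even when ORE is not loaded. The following sketch is my own reading of how the different values behave (NULL, TRUE, FALSE or 1, and 2 or more). The function describe_ore_parallel is a made-up helper, not part of ORE, so treat the interpretation as an assumption.

```r
# Hypothetical helper: turn the current ore.parallel value into a description.
# The mapping is an assumption based on the behaviour described in this post.
describe_ore_parallel <- function(value = getOption("ore.parallel")) {
  if (is.null(value)) {
    "not set: database default applies"
  } else if (isTRUE(value)) {
    "enabled: default degree of the schema or table"
  } else if (is.numeric(value) && value >= 2) {
    sprintf("enabled: degree of parallelism %d", as.integer(value))
  } else {
    "disabled"   # FALSE or 1
  }
}

old <- options(ore.parallel = 8)   # remember the previous setting
print(describe_ore_parallel())
options(old)                       # restore the previous setting
```

Saving the value returned by options() and passing it back afterwards is a simple way to change the setting for one piece of work without affecting the rest of your session.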

This entry was posted in Oracle Analytics Option, Oracle R Enterprise, oraclebigdata, ORE, R.

Oracle R Enterprise (ORE) Tasks for the Oracle DBA

Posted on May 26, 2014

In previous posts I gave the steps required to install Oracle R Enterprise on your Database server and your client machine.

One of the steps that I gave was the initial set of Database privileges that the DBA needed to give to the RQUSER. The RQUSER is a little bit like the SCOTT/TIGER schema in the Oracle Database. Setting up the RQUSER as part of the installation process allows you to test that you can connect to the database using ORE and that you can issue some ORE commands.

After the initial testing of the ORE install you might consider locking this RQUSER schema or dropping it from the Database.

So when a new ORE user wants access to the database what steps does the DBA have to perform?

Create a new schema for the user
Grant the new schema the standard set of privileges to connect to the DB, create objects, etc.
Create any data sets in their schema
Create any views to data that exists in other schemas (and grant the necessary privileges, etc.)

Now we get onto the ORE specific privileges. The following are the minimum required for your user to be able to connect to their Oracle schema using ORE.

GRANT CREATE TABLE TO RQUSER;

GRANT CREATE PROCEDURE TO RQUSER;

GRANT CREATE VIEW TO RQUSER;

GRANT CREATE MINING MODEL TO RQUSER;

In most cases the first 3 privileges (TABLE, PROCEDURE and VIEW) will be standard for most schemas that you will set up. So in reality the only command or extra privilege that you will need to execute is:

GRANT CREATE MINING MODEL TO RQUSER;

This command will allow the user to connect to their Oracle schema using ORE, but what it will not allow them to do is to create any embedded R. These are R scripts that are stored in the database and can be called in their R/ORE scripts or by using the SQL API to R (I’ll have more blog posts on these soon). To allow the user to create and use embedded R the DBA will also have to grant the following privilege as SYS:

GRANT RQADMIN to RQUSER;

To summarise, the DBA will have to grant the following to each schema that wants to use the full power of ORE.

GRANT CREATE MINING MODEL TO RQUSER;

GRANT RQADMIN to RQUSER;
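If you have several schemas to set up, you could generate the GRANT statements rather than typing them out each time. Here is a small sketch in R. The function ore_grants and the schema name ORE_USER1 are made up for illustration, and the output is just text for the DBA to review and run.

```r
# Hypothetical helper: build the GRANT statements listed above for one schema.
ore_grants <- function(schema, embedded_r = FALSE) {
  privileges <- c("CREATE TABLE", "CREATE PROCEDURE",
                  "CREATE VIEW", "CREATE MINING MODEL")
  if (embedded_r) {
    # RQADMIN is only needed for schemas that will create embedded R scripts
    privileges <- c(privileges, "RQADMIN")
  }
  sprintf("GRANT %s TO %s;", privileges, toupper(schema))
}

statements <- ore_grants("ore_user1", embedded_r = TRUE)
writeLines(statements)
```

Keeping embedded_r as an explicit argument that defaults to FALSE fits with the warning below about only granting RQADMIN where it is really needed.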

A note of warning: Be careful which schemas you grant the RQADMIN privilege to. It is a powerful privilege and opens the database to the powerful features of R. So following the typical DBA best practice for granting privileges, the DBA should grant the RQADMIN privilege only to the people who require it.

This entry was posted in Oracle Analytics Option, oracle big data, Oracle R Enterprise, ORE, R.

Installing ORE – Part C – Issue installing ORE on Windows Server

Posted on April 29, 2014

In my previous two blog posts (Part-A and Part-B) I detailed 4 steps for how you can install ORE on your servers and on your client machines.

I also mentioned a possible issue you may encounter if you try to install ORE on a Windows server. This blog post will look at this issue and how you can work around it and get ORE installed.

The problem occurred when I went to install the ORE Supporting packages.

I was prompted to install these into a new library directory. If you get this prompt then something is wrong and you should not proceed with installing these packages. If you do proceed and install them in a new library directory then they will not be seen by ORE and the database (as they were not installed in $ORACLE_HOME/R/library). The package install reports warnings like the following, and you will get errors when you go to run ORE from within R.

package ‘Cairo’ successfully unpacked and MD5 sums checked

package ‘DBI’ successfully unpacked and MD5 sums checked

package ‘png’ successfully unpacked and MD5 sums checked

Warning: cannot remove prior installation of package ‘png’

package ‘ROracle’ successfully unpacked and MD5 sums checked

Warning: cannot remove prior installation of package ‘ROracle’

If I try the ore.connect I get the following errors.

ore.connect(user="RQUSER", sid="orcl", host="localhost", password="RQUSER", port=1521, all=TRUE)

Loading required package: ROracle

Error in .ore.oracleQuerySetup() :

ORACLE connection requires ROracle package

In addition: Warning message:

In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘ROracle’

To overcome this ORE install issue all you need to do is to close down your R Gui, then add the following lines to the Rprofile file. The Rprofile file is located in the R\etc directory, for example C:\Program Files\R\R-3.0.1\etc. Add the following lines:

# Add $ORACLE_HOME/R/library to .libPaths() for ORE packages

.libPaths("C:/app/oracle/product/11.2.0/dbhome_1/R/library")

The above line will tell R to include the R directory in the Oracle home as part of its search path. You may need to change the directory above to point to your Oracle home. When you log into the R Gui the path above will be included. Now you can install the packages and then import the packages. This time they will be installed in the $ORACLE_HOME/R/library.
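If you want to check the effect of the library path setting before touching your Oracle home, the following sketch shows the idea with a temporary directory. The function add_library_path is a made-up helper, and the temporary directory is only a stand-in for $ORACLE_HOME/R/library. Note that R silently ignores library directories that do not exist, which is why the helper checks first.

```r
# Hypothetical helper: put a library directory at the front of R's search path.
add_library_path <- function(path) {
  if (!dir.exists(path)) {
    warning("Library directory not found: ", path)
    return(invisible(.libPaths()))
  }
  .libPaths(c(path, .libPaths()))
  invisible(.libPaths())
}

# Demonstration with a temporary directory standing in for
# $ORACLE_HOME/R/library (replace with your own Oracle home path).
demo_lib <- file.path(tempdir(), "demo_library")
dir.create(demo_lib, showWarnings = FALSE)
add_library_path(demo_lib)
print(.libPaths())
```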

When you open the R Gui and run the command to load the ORE package and to connect to your ORE schema you should not receive any error messages.

> library(ORE)

> ore.connect(user="RQUSER", sid="orcl", host="localhost", password="RQUSER", port=1521, all=TRUE)

Now you should have ORE installed and working on your Windows server.

This entry was posted in Oracle Analytics Option, Oracle R Enterprise, ORE, R.

Installing ORE – Part B

Posted on April 24, 2014

This is the second part of a two part blog post on installing ORE.

In reality there are 3 blog posts on installing ORE. The third and next blog post will be on a particular issue you might encounter on a Windows server and how you can overcome it.

In the previous blog post I outlined the steps needed to install ORE on the database server and on the client machine. Click here to go to this post.

In this blog post I will show you how to setup a schema for ORE and how to get connected to the schema using ORE.

Step 3 : Setting up your Schema to use ORE / Tasks for your DBA

On the server, where you unzipped the ORE download, you will find a demo_user.bat script (or something similar, like demo_user.sh, on Linux).

After the script has performed some checks, you will be asked if you want to create a demo schema. Enter yes for this task to be completed and the RQUSER schema will be created in your database. Then enter the password for the RQUSER.

The RQUSER has a small set of system privileges that allow it to connect to and perform some functions on the database. These include:

GRANT CREATE TABLE TO RQUSER;

GRANT CREATE PROCEDURE TO RQUSER;

GRANT CREATE VIEW TO RQUSER;

GRANT CREATE MINING MODEL TO RQUSER;

NOTE: If you cannot connect to the database using the RQUSER and the password you set, then you might need to also grant CONNECT and RESOURCE to it too.

For every schema that you want to access using ORE you will need to grant the above to them.

In addition to these grants, if you want a schema to be able to create and drop R scripts in the database then you will need to grant it the additional role of RQADMIN.

sqlplus / AS SYSDBA

GRANT RQADMIN to RQUSER;

NB: You will need to grant RQADMIN to any schema where you want to use the embedded ORE in the database.

Step 4 : Connecting to the Database

If you have completed all of the above steps you are now ready to use ORE to connect to your database. The following is an example of the ore.connect command that you can use. It assumes that the RQUSER has the password RQUSER, and that the host is the local machine (localhost). Replace localhost with the host name of your database server and also change the SID to that of your database.

ore.connect(user="rquser", sid="orcl", host="localhost", password="rquser", port=1521, all=TRUE);

If you get no errors and you get the R prompt back then you are connected to the RQUSER schema in your database.

To test that the connection was made you can run the following ORE command and then list the tables in the schema.

> ore.is.connected()

[1] TRUE

> ore.ls()

character(0)

The output of the last line above tells us that we do not have any tables in our RQUSER schema. I will have more blog posts on how you can use ORE and perform various ORE analytics in future posts.

There are a series of demonstrations that come with ORE. To access these type in the following command which will list the available ORE demos.

> demo(package="ORE")

The following command illustrates how you can run the ORE demo called basic.

> demo(basic, package="ORE")

Also check out the Part C blog post on how to resolve a potential install issue on a Windows server.

This entry was posted in data mining, Oracle Analytics Option, Oracle R Enterprise, ORE, R.

The ORE Packages

Posted on April 6, 2014

If you are interested in using ORE, or just want to get an idea of what ORE gives you that does not already exist in one of the other R packages, then the table below lists the packages that come as part of ORE.

Before you can use them you will need to load these into your workspace. To do this you can issue the following command from the R prompt or from the prompt in RStudio.

> library(ORE)

RStudio is my preferred R interface and is widely used around the world.

ORE Installed Packages	Description
ORE	Oracle R Enterprise
OREbase	ORE – base
OREdm	The ORE functions that use the in-database Oracle Data Miner algorithms
OREeda	The ORE functions used for exploratory data analysis
OREgraphics	The ORE functions used for graphics
OREpredict	The ORE functions used for model predictions
OREstats	The ORE stats functions
ORExml	The ORE functions that convert R objects to XML
DBI	R Database Interface
ROracle	OCI based Oracle database interface for R
XML	Tools for parsing and generating XML within R and S-Plus.
bitops	Functions for Bitwise operations
png	Read and write PNG images

In addition to these core ORE packages, ORE also uses some R packages as part of the core ORE packages listed above. The following table lists the R packages that are used in the ORE packages. So make sure you have these packages installed. They should have come with your installation of R, but if something has happened then you can download them again.


R Packages used by ORE	Description
base	The R Base Package
boot	Bootstrap Functions (originally by Angelo Canty for S)
class	Functions for Classification
cluster	Cluster Analysis Extended Rousseeuw et al
codetools	Code Analysis Tools for R
compiler	The R Compiler Package
datasets	The R Datasets Package
foreign	Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, dBase, ..
graphics	The R Graphics Package
grDevices	The R Graphics Devices and Support for Colours and Fonts
grid	The Grid Graphics Package
KernSmooth	Functions for kernel smoothing for Wand & Jones (1995)
lattice	Lattice Graphics
MASS	Support Functions and Datasets for Venables and Ripley’s MASS
Matrix	Sparse and Dense Matrix Classes and Methods
methods	Formal Methods and Classes
mgcv	GAMs with GCV/AIC/REML smoothness estimation and GAMMs by PQL
nlme	Linear and Nonlinear Mixed Effects Models
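A quick way to check whether any of these packages are missing from your R installation is to compare the list against what is installed. The following is a small sketch using base R only; the variable names are my own.

```r
# The R packages used by ORE, as listed in the table above.
required <- c("base", "boot", "class", "cluster", "codetools", "compiler",
              "datasets", "foreign", "graphics", "grDevices", "grid",
              "KernSmooth", "lattice", "MASS", "Matrix", "methods",
              "mgcv", "nlme")

# Compare against the packages found in the current library paths.
installed    <- rownames(installed.packages())
missing_pkgs <- setdiff(required, installed)

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

Anything reported as missing can then be installed again with install.packages.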

I’ve been using R a lot over the past few years and I’ve had a number of projects involving R, particularly over the past 12 months. I just found out that I will now have another short duration R project in May and June.

So watch out for lots more blog posts on R and ORE. Plus the usual blog posts on using Oracle Data Mining. ORE and Oracle Data Mining are very closely linked.

This entry was posted in Data Science, Oracle Analytics Option, ORE, R.

Predicting using ORE package

Posted on March 26, 2014

In a previous post I gave an overview of the various in-database data mining algorithms that you can use in your Oracle R Enterprise scripts.

To create data mining models based on those algorithms you need to use the ore.odm functions.

After you have developed and tested your models you will select one of these to score your new data.

How can you do this using ORE? There is a suite of ORE functions called ore.predict that you can use to apply your data mining model to score or label new data.

The following table lists the ore.predict functions:


ORE Predict Function	Description
ore.predict-glm	Generalized linear model
ore.predict-kmeans	k-Means clustering model
ore.predict-lm	Linear regression model
ore.predict-matrix	A matrix with no more than 1000 rows
ore.predict-multinom	Multinomial log-linear model
ore.predict-nnet	Neural network models
ore.predict-ore.model	An Oracle R Enterprise model
ore.predict-prcomp	Principal components analysis on a matrix
ore.predict-princomp	Principal components analysis on a numeric matrix
ore.predict-rpart	Recursive partitioning and regression tree model

As you will see from the above table there are more ore.predict functions than there are ore.odm functions. The reason for this is that ORE comes with some additional data mining algorithms. These are in addition to the sub-set of Oracle Data Mining algorithms that it uses. These include the ore.glm, ore.lm, ore.neural and ore.stepwise.

You also need to watch out for the data mining algorithms that are not used in prediction. These include the Minimum Description Length, Apriori and Non-Negative Matrix Factorization.

Remember that these ore.predict functions are run inside the Oracle Database. No data is extracted to the data analyst's laptop or desktop. All the data stays in the database. The ORE functions are run in the database on the data in the database.
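The pattern is the same one you may know from the predict function in base R: build a model, then pass the model and the new data to the prediction function. As an illustration only, here is a local, in-memory version using the built-in mtcars data set and an lm model. This sketch runs entirely in your R session and does not use ORE; with ORE the scoring step would be done with ore.predict against data held in the database.

```r
# Split the built-in mtcars data into "training" and "new" records.
train    <- mtcars[1:24, ]
new_data <- mtcars[25:32, ]

# Build a linear regression model locally (not in the database).
model <- lm(mpg ~ wt + hp, data = train)

# Score the new records and attach the predictions to them.
scores <- predict(model, newdata = new_data)
scored <- cbind(new_data[, c("wt", "hp")], predicted_mpg = scores)
print(head(scored))
```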

This entry was posted in data mining, Oracle Analytics Option, Oracle R Enterprise, ORE, R.

Using the in-database ODM algorithms in ORE

Posted on March 23, 2014

Oracle R Enterprise is the version of R that Oracle has that runs in the database instead of on your laptop or desktop.

Oracle already has a significant number of data mining algorithms in the database. With ORE they have exposed these so that they can be easily called from your R (ORE) scripts.

To access these in-database data mining algorithms you will need to use the ore.odm functions, which are in the OREdm package.

ORE is continually being developed with new functionality being added all the time. Over the past 2 years Oracle has released an updated version of ORE about every 6 months. ORE is generally not certified with the latest version of R, but it is only a point release or two behind the current release. For example the current version of ORE, 1.4 (released only last week), is certified for R version 3.0.1, but the current release of R is 3.0.3.

Will ORE work with the latest version of R? The simple answer is maybe or in theory it should, but is not certified.

Let’s get back to ore.odm. The following table maps the ore.odm functions to the in-database Oracle Data Mining functions.

ORE Function	Oracle Data Mining Algorithm	What Algorithm can be used for
ore.odmAI	Minimum Description Length	Attribute Importance
ore.odmAssocRules	Apriori	Association Rules
ore.odmDT	Decision Tree	Classification
ore.odmGLM	Generalized Linear Model	Classification and Regression
ore.odmKMeans	k-Means	Clustering
ore.odmNB	Naïve Bayes	Classification
ore.odmNMF	Non-Negative Matrix Factorization	Feature Extraction
ore.odmOC	O-Cluster	Clustering
ore.odmSVM	Support Vector Machines	Classification and Regression


As you can see we only have a subset of the in-database Oracle Data Miner algorithms. This is a pity really, but I’m sure as we get newer releases of ORE these will be added.

This entry was posted in data mining, Data Science, Oracle R Enterprise, ORE, R.

ORE 1.4 New Parallel feature

Posted on March 16, 2014

Oracle R Enterprise (ORE) 1.4 has just been released and can be downloaded from here. Remember there is a client and server side install required and ORE 1.4 is certified against R 3.0.1 and the Oracle R Distribution.

One of the interesting new features is the PARALLEL option. You can set this to significantly improve the performance of your R server side code by using the PARALLEL database option. You can set the degree of PARALLEL at a global level in your code by using the ore.parallel setting.

The default setting for this ore.parallel setting is FALSE or 1. Otherwise it must be set to a minimum of 2 or more to enable the Parallel database option.

Alternatively you can set the ore.parallel setting to TRUE to use the default degree of parallelism that is set for the database object, or set it to NULL to use the default database setting.

You will also be able to set the degree of parallel (DOP) using the parallel enabled functions ore.groupApply, ore.rowApply and ore.indexApply.

They have also made available or, as they say, exposed some more of the in-database Oracle Data Mining algorithms. These include the ODM algorithms for Association rules (ore.odmAssocRules), the feature extraction algorithm called Non-Negative Matrix Factorization (NMF) (ore.odmNMF) and the ODM Clustering algorithm O-Cluster (ore.odmOC).

Watch out for some blog posts on these over the coming weeks.

Check out the OTN page for the R Technologies from Oracle

This entry was posted in Oracle Analytics Option, Oracle R Enterprise, ORE, R.

BIWA Summit–9th & 10th January, 2013

Posted on December 18, 2012

The BIWA Summit will be on the 9th and 10th January, 2013. It is being held in the Sofitel Hotel beside the Oracle HQ at Redwood Shores, just outside of San Francisco.

The BIWA Summit looks to be the leading event in 2013 focused on Analytics, Data Warehousing, Big Data and BI. If you are a data architect or a data scientist this is certainly one event that you should consider attending in 2013.

All the big names (in the Oracle world) will be there: Tom Kyte, Mark Rittman, Maria Colgan, Balaji Yelmanchili, Vaishnavi Sashikanth, Charlie Berger, Mark Hornick, Karl Rexter, Tim and Dan Vlamis.

Oh and then there is me. I’ll be giving a presentation on the Oracle Data Scientist. This will be on the first day of the event (9th) at 11:20am.

For anyone interested in the Oracle Data Scientist world there are lots of presentations to help you get started and up to speed in this area. Here is a list of presentations and hands on labs that I can recommend.

As is typical with all good conferences there are many presentations on at the same time that I would like to attend. If only I could time travel.

This is a great event to start off the new year and for everyone who is thinking of moving into or commencing a project in the area. So get asking your manager to see if there is any training budget left for 2012 or get first dibs on the training budget for 2013.

Registration is open and at the moment the early bird discount still seems to be available. You can also book a room in the hotel using the registration page.

To view the full agenda – click here

This entry was posted in BIWA, Brendan Tierney, Conference, data mining, database, Oracle, Oracle Advanced Analytics, Oracle Data Miner, Oracle R Enterprise, oug_ire, R.

Oracle Advanced Analytics Option in Oracle 12c

Posted on October 20, 2012

At Oracle Open World a few weeks ago there was a large number of presentations on Big Data and Analytics. Most of these were marketing type presentations, with a couple of presentations on using R and how it can now be integrated into the Oracle Database 11.2.

In addition to these there was one presentation that focused on the Oracle Advanced Analytics (OAA) Option.

The Oracle Advanced Analytics Option covers the Oracle Data Mining features and the Oracle R Enterprise features in the Database.

The purpose of this blog post is to outline and summarise what was mentioned at these presentations, and will include what changes are/may be coming in the “Next Release” of the database i.e. Oracle 12c.

Health Warning: As with all the presentations at OOW that talked about what may or may not be in the next release, there is no guarantee that the features will actually be in the release version of the database. Here is the slide that gives the Safe Harbor statement.

12c will come with R embedded into it. So there will be no need for any configurations.
Oracle R client will come as part of the server install.
Oracle R client will be able to use the Analytics functions that exist in the database.
Will be able to run R code in the database.
The database (12c) will be able to spawn multiple R engines.
Will be able to emulate map-reduce style algorithms.
There will be a new PREDICTION function, replacing the existing (11g) functionality. This will combine a number of steps of building a model and applying it to the data to be scored into one function. But we will still need the functionality of the existing PREDICTION function that is in 11g. So it will be interesting to see how this functionality will be kept in addition to the new functionality being proposed in 12c.
The Oracle Data Miner tool will still exist and will have many new features. It was also referred to as the ‘OAA Workflow’. So does this indicate a potential name change? We will have to wait and see.
Oracle Data Miner will come with a new additional graphing feature. This will be in addition to the Explore Node and will allow us to produce more typical attribute related graphs. From what I could see these would be similar to the type of box plot, scatter, bar chart, etc. graphs that you can get from R.
There will be a number of new algorithms too, including a useful One Class Support Vector Machine. This can be used when we have a data set with just one class value. This algorithm will work out which records/cases are more important than others.
There will be a new SQL node. This will allow us to write our own data transformation code.
There will be a new node to allow the calling of R code.
The tool also comes with a slightly modified layout and colour scheme.

Again, the points that I have given above are just my observations. They may or may not appear in 12c, or maybe I misunderstood what was being said.

It certainly looks like we will have an integrated analytics environment in 12c with full integration of R and the ODM in-database features.

This entry was posted in Brendan Tierney, data mining, data mining blog, ODM 11g R2, Oracle, Oracle Advanced Analytics, Oracle Analytics Option, oracle big data, Oracle Data Miner, oracle data mining, Oracle Data Mining 11g R2, Oracle R Enterprise, oraclebigdata, ORE, oug_ire, R.
