Year: 2016
ORE video : Demo Code Part 5
The following is the fifth and final set of demo code from my video on using R in the Oracle Database. Check out the video before using the following code. The blog post for the video will be updated to contain links to all blog posts that have the various demo code.
The following examples illustrate how you can use the Oracle R Enterprise capabilities within SQL and PL/SQL. They build a GLM model using the glm algorithm that comes with the R language, and then use this model to score or label new data that is stored in a table. The last part of the example illustrates how you can perform What-If analysis using this ORE model.
-- Build & save the R script, called Demo_GLM in the DB
-- This builds a GLM DM model in the DB
--
Begin
sys.rqScriptDrop('Demo_GLM');
sys.rqScriptCreate('Demo_GLM',
'function(dat,datastore_name) {
mod <- glm(AFFINITY_CARD ~ CUST_GENDER + AGE + CUST_MARITAL_STATUS + COUNTRY_NAME + CUST_INCOME_LEVEL + EDUCATION + HOUSEHOLD_SIZE + YRS_RESIDENCE, dat, family = binomial())
ore.save(mod, name=datastore_name, overwrite=TRUE) }');
end;
/
--
-- After creating the script you need to run it to create the GLM model
--
select *
from table(rqTableEval(
cursor(select CUST_GENDER,
AGE,
CUST_MARITAL_STATUS,
COUNTRY_NAME,
CUST_INCOME_LEVEL,
EDUCATION,
HOUSEHOLD_SIZE,
YRS_RESIDENCE,
AFFINITY_CARD
from mining_data_build_v),
cursor(select 1 as "ore.connect", 'myDatastore' as "datastore_name" from dual),
'XML', 'Demo_GLM' ));
--
-- There are 2 ways to use the GLM model : in Batch and in Real-Time mode
--
-- First Step : Build the in-database R script to score your new data
--
Begin
sys.rqScriptDrop('Demo_GLM_Batch');
sys.rqScriptCreate('Demo_GLM_Batch',
'function(dat, datastore_name) {
ore.load(datastore_name)
prd <- predict(mod, newdata=dat)
prd[as.integer(rownames(prd))] <- prd
res <- cbind(dat, PRED = prd)
res}');
end;
/
--
-- Now you can run the script to score the new data in Batch mode
-- The data is located in the view MINING_DATA_APPLY_V
--
select * from table(rqTableEval(
cursor(select CUST_GENDER, AGE, CUST_MARITAL_STATUS, COUNTRY_NAME, CUST_INCOME_LEVEL, EDUCATION, HOUSEHOLD_SIZE, YRS_RESIDENCE
from MINING_DATA_APPLY_V
where rownum <= 10),
cursor(select 1 as "ore.connect", 'myDatastore' as "datastore_name" from dual),
'select CUST_GENDER, AGE, CUST_MARITAL_STATUS, COUNTRY_NAME, CUST_INCOME_LEVEL, EDUCATION, HOUSEHOLD_SIZE, YRS_RESIDENCE, 1 PRED from MINING_DATA_APPLY_V','Demo_GLM_Batch'))
order by 1, 2, 3;
--
-- Now let us use the Demo_GLM_Batch script to score data in Real-Time
-- The data values are passed to the GLM model
--
select * from table(rqTableEval(
cursor(select 'M' CUST_GENDER,
23 AGE,
'Married' CUST_MARITAL_STATUS,
'United States of America' COUNTRY_NAME,
'B: 30,000 - 49,999' CUST_INCOME_LEVEL,
'Assoc-A' EDUCATION,
'3' HOUSEHOLD_SIZE,
5 YRS_RESIDENCE
from dual),
cursor(select 'myDatastore' "datastore_name", 1 "ore.connect" from dual),
'select CUST_GENDER, AGE, CUST_MARITAL_STATUS, COUNTRY_NAME, CUST_INCOME_LEVEL, EDUCATION, HOUSEHOLD_SIZE, YRS_RESIDENCE, 1 PRED from MINING_DATA_APPLY',
'Demo_GLM_Batch')) order by 1, 2, 3;
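The Demo_GLM_Batch script calls predict(mod, newdata=dat) without a type argument. For a glm object this returns values on the link scale (log-odds for a binomial model) rather than probabilities. The following is a minimal local sketch, not the in-database code: it assumes R's built-in mtcars data as a stand-in for the mining views, and the column names LOG_ODDS and PROB are illustrative. It shows how changing one input while holding the others fixed gives a simple What-If comparison.

```r
# Fit a logistic regression on the built-in mtcars data.
# This is an illustrative stand-in, not the MINING_DATA_BUILD_V view from the post.
mod <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# Two what-if scenarios that differ only in weight (wt); hp is held fixed.
scenarios <- data.frame(wt = c(2.5, 3.5), hp = c(110, 110))

# predict() on a glm returns the linear predictor (log-odds) by default.
log_odds <- predict(mod, newdata = scenarios)

# type = "response" applies the inverse link and returns probabilities.
prob <- predict(mod, newdata = scenarios, type = "response")

# Combine the inputs with both kinds of score, as the batch script does with PRED.
what_if <- cbind(scenarios, LOG_ODDS = log_odds, PROB = prob)
print(what_if)
```

Adding type = "response" to the predict call inside the stored script would be one way to get probabilities in the PRED column. That is a suggestion only; the original demo does not do this.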
BIWA Summit 2016
The annual BIWA Summit 2016 will be held next week, from 26-28 January, and it is back at Oracle HQ in Redwood Shores. If you are into the Oracle Database, Business Intelligence, Big Data, Advanced Analytics, etc., then this is the conference for you.
Over the 3 days there is an action-packed agenda of 5 parallel tracks, plus a full 3 days of Hands-on Labs. The agenda is filled with the who’s who of the Oracle BI and Analytics world, so if this is your area then BIWA Summit is the conference for you and your training budget. (I’m sure it is not too late to book your place.)
I’ve been lucky this year in that I will have 2 Hands-on Labs and 1 presentation to give. Yes, that is 5 hours of presenting/hosting to do. The presentation I will be giving is ‘Is Oracle SQL the best language for Statistics?’ (on Tuesday 26th). This presentation is listed for the BIWA Summit and also for the NoCOUG Yes SQL conference that is running at the same time in the same venue (i.e. co-located). I’ve also written a brand new 2-hour Hands-on Lab titled ‘Predictive Analytics using SQL and PL/SQL’. The first outing for this will be on Wednesday 27th. I will also be co-hosting, with Charlie Berger, the ‘Learn Predictive Analytics in 2hours with Oracle Data Miner’ Hands-on Lab on Tuesday 26th.
Come to my Hands-on Labs to be in with a chance to win a copy of my book on Oracle Data Mining.
Hopefully I’ll see you there!
Check out the full agenda by clicking on the image below.
ORE video : Demo Code Part 4
The following is the fourth set of demo code from my video on using R in the Oracle Database. Check out the video before using the following code. The blog post for the video will be updated to contain links to all blog posts that have the various demo code.
The following code example illustrates how you can build a Data Mining model using the in-database data mining algorithms. In this example a Decision Tree model is created. This model is then applied to new data, scoring this data with the predicted values.
> #
> # Build an in-database ODM Decision Tree
> #
> dtData
> # Create an ODM DT model in the DB : Only a temporary model. It is deleted when you log out
> dtModel
> # View the details of the ODM model
> #summary(dtModel)
> names(dtModel)
[1] "name" "settings" "attributes" "costs" "distributions"
[6] "nodes" "formula" "extRef" "call"
> dtModel$name
[1] "ORE$208_210"
> dtModel$settings
value
prep.auto on
impurity.metric impurity.gini
term.max.depth 7
term.minpct.node 0.05
term.minpct.split 0.1
term.minrec.node 10
term.minrec.split 20
> dtModel$attributes
name type data.type data.length precision scale is.target
1 AFFINITY_CARD categorical number 22 0 0 TRUE
2 AGE numerical number 22 NA NA FALSE
3 CUST_MARITAL_STATUS categorical varchar2 20 NA NA FALSE
4 EDUCATION categorical varchar2 21 NA NA FALSE
5 HOUSEHOLD_SIZE categorical varchar2 21 NA NA FALSE
6 OCCUPATION categorical varchar2 21 NA NA FALSE
7 YRS_RESIDENCE numerical number 22 NA NA FALSE
>
> ## Compute the Confusion Matrix
> dtResults
> with(dtResults, table(AFFINITY_CARD, PREDICTION))
PREDICTION
AFFINITY_CARD 0 1
0 1056 64
1 201 179
> ## How do you persist the model in the DB
> ## Rename and save the model in the database
> dtModel$name
[1] "ORE$208_210"
> ## Save the ODM model in the in-database R datastore
> ore.save(dtModel, name = "ORE_MODELS", overwrite=TRUE)
> ore.load(name = "ORE_MODELS")
[1] "dtModel"
> ## Score new data using the DM Model
> ore.sync(table = c("MINING_DATA_APPLY"))
> ore.ls()
[1] "DEMO_R_APPLY_RESULT" "DEMO_R_TABLE" "DEMO_SUBSET_TABLE"
[4] "INSUR_CUST_LTV_SAMPLE" "MINING_DATA_APPLY" "MINING_DATA_BUILD_V"
[7] "MINING_DATA_TEST_V"
> dtApply
> dim(dtApply)
[1] 1500 18
> class(dtApply)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
> DTAPPLY
> dtApplyResult
> dtApplyResult
> head(dtApplyResult)
'0' '1' PREDICTION
100001 0.9521912 0.04780876 0
100002 0.9521912 0.04780876 0
100003 0.9521912 0.04780876 0
100004 0.9521912 0.04780876 0
100005 0.2633745 0.73662551 1
100006 0.9521912 0.04780876 0
> dim(dtApplyResult)
[1] 1500 3
> dim(dtApply)
[1] 1500 18
> dtResults
> dim(dtResults)
[1] 1500 21
> ore.drop(table = "DEMO_R_APPLY_RESULT")
> ore.create(dtApplyResult, table="DEMO_R_APPLY_RESULT")
> ## Run the following the first time you rename a model
> # ore.exec(paste("BEGIN
> # DBMS_DATA_MINING.RENAME_MODEL(model_name => '", dtModel$name, "',
> # new_model_name => 'DEMO_R_DT_MODEL'); END;",sep=""))
> ## Run the following to refresh an existing model
> ore.exec(paste("BEGIN
+ DBMS_DATA_MINING.DROP_MODEL('DEMO_R_DT_MODEL');
+ DBMS_DATA_MINING.RENAME_MODEL(model_name => '", dtModel$name,"',
+ new_model_name => 'DEMO_R_DT_MODEL');
+ END;",sep=""))
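The confusion matrix shown earlier in this transcript can be summarised into a few standard measures. The following is a minimal sketch in plain R, with the four counts typed in by hand from that output; the variable names are illustrative and no database connection is needed.

```r
# Counts copied from the confusion matrix output:
# rows are the actual AFFINITY_CARD values, columns are the PREDICTION values.
# matrix() fills column by column, so the first two numbers are the PREDICTION = 0 column.
cm <- matrix(c(1056, 201, 64, 179), nrow = 2,
             dimnames = list(AFFINITY_CARD = c("0", "1"),
                             PREDICTION = c("0", "1")))

# Overall accuracy: correctly classified records divided by all records.
accuracy <- sum(diag(cm)) / sum(cm)

# Precision for class 1: of the records predicted as 1, how many really are 1.
precision_1 <- cm["1", "1"] / sum(cm[, "1"])

# Recall for class 1: of the records that really are 1, how many were predicted as 1.
recall_1 <- cm["1", "1"] / sum(cm["1", ])

print(round(c(accuracy = accuracy, precision = precision_1, recall = recall_1), 4))
```

The low recall relative to accuracy reflects that most records with AFFINITY_CARD = 1 (201 of 380) are predicted as 0.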
Running R in the Oracle Database video
Earlier this year I was asked by the Business Analytics & Big Data SIG (of the UKOUG) to give a presentation on Oracle R Enterprise. Unfortunately I had already committed to giving the same presentation at the OUG Norway conference on the same day.
But then they asked me if I could record a video of the presentation and they would show it at the SIG. The following video is what I recorded.
At the UKOUG annual (2015) conference I was supposed to give a 2 hour presentation during their Super Sunday event. Unfortunately, due to a storm passing over Ireland on the Saturday, all flights going to the UK were cancelled. This meant that I would miss my 2 hour presentation.
Instead of trying to find an alternative speaker for my presentation slot at such short notice, the committee suggested that they would show the video.
Based on the feedback and the people who thanked me in person during the rest of the conference, I’ve decided to make it available to everyone. Hopefully you will find it useful.
The following are the links to the demo code that is shown or referred to in the video.
- Demo Code 1
- Demo Code 2
- Demo Code 3
- Demo Code 4
- Demo Code 5
People have been asking me if the demo scripts I used in the video are available. You will probably find some of these in various blog posts. So to make it easier for everyone I will post the demo scripts in one or more blog posts over the coming weeks. When these are available I will update this blog post with the links.
I have a few new presentations on Oracle R Enterprise in 2016 so watch out for these at an Oracle User Group conference.
ORE video : Demo Code Part 3
The following is the third set of demo code from my video on using R in the Oracle Database. Check out the video before using the following code. The blog post for the video will be updated to contain links to all blog posts that have the various demo code.
The following code illustrates some simple examples of using Oracle R Enterprise. In these examples you will see how to connect to the Oracle Database, how to query and process some of the tables and views in the Oracle Database, how to check that you are working with objects in the database, and how to move data to the database and query it.
> library(ORE)
> # ore.connect(user="rquser", sid="orcl", host="localhost", password="rquser", port=1521, all=TRUE);
> ore.connect(user="dmuser", sid="orcl", host="localhost", password="dmuser", port=1521, all=FALSE);
> # Test the connection
> ore.is.connected()
[1] TRUE
> # List all the tables and views
> ore.ls()
character(0)
> # Use ore.sync to only include the tables and views listed
> ore.sync()
> ore.ls()
[1] "DEMO_R_APPLY_RESULT" "DEMO_R_TABLE" "INSUR_CUST_LTV_SAMPLE" "MINING_DATA_APPLY"
[5] "MINING_DATA_APPLY_V" "MINING_DATA_BUILD_V" "MINING_DATA_TEST_V" "MINING_DATA_TEXT_APPLY_V"
[9] "MINING_DATA_TEXT_BUILD_V" "MINING_DATA_TEXT_TEST_V"
> # Disconnect and reconnect with no meta-data sync
> ore.disconnect()
> ore.connect(user="dmuser", sid="orcl", host="localhost", password="dmuser", port=1521, all=FALSE);
> ore.sync(table = c("MINING_DATA_BUILD_V", "MINING_DATA_TEST_V", "INSUR_CUST_LTV_SAMPLE"))
> ore.ls()
[1] "INSUR_CUST_LTV_SAMPLE" "MINING_DATA_BUILD_V" "MINING_DATA_TEST_V"
> # Check for the existence of a table or view
> ore.exists("MINING_DATA_BUILD_V")
[1] TRUE
> # list the objects in the DMUSER schema
> ore.ls("DMUSER")
[1] "INSUR_CUST_LTV_SAMPLE" "MINING_DATA_BUILD_V" "MINING_DATA_TEST_V"
> #
> # Load data from a file into a new table
> ore.exists("DEMO_R_TABLE")
[1] TRUE
> ore.drop(table='DEMO_R_TABLE')
> ore.ls()
[1] "INSUR_CUST_LTV_SAMPLE" "MINING_DATA_BUILD_V" "MINING_DATA_TEST_V"
> titanic
> ore.create(titanic, table="DEMO_R_TABLE")
> tData
> head(tData)
NAME PCLASS AGE SEX SURVIVED
1 Fynney, Mr Joseph J 2nd 35 male 0
2 Gale, Mr Harry 2nd 35 male 0
3 Gale, Mr Shadrach 2nd 38 male 0
4 Garside, Miss Ethel 2nd 24 female 1
5 Gaskell, Mr Alfred 2nd 16 male 0
6 Gavey, Mr Lawrence 2nd 26 male 0
> # Use ORE to pull data from the Database to local R
> # ore.pull -- United States of America
> mdbv
> mdbv_data
> head(mdbv_data,3)
CUST_ID CUST_GENDER AGE CUST_MARITAL_STATUS COUNTRY_NAME CUST_INCOME_LEVEL EDUCATION OCCUPATION
1 101501 F 41 NeverM United States of America J: 190,000 - 249,999 Masters Prof.
2 101502 M 27 NeverM United States of America I: 170,000 - 189,999 Bach. Sales
3 101503 F 20 NeverM United States of America H: 150,000 - 169,999 HS-grad Cleric.
HOUSEHOLD_SIZE YRS_RESIDENCE AFFINITY_CARD BULK_PACK_DISKETTES FLAT_PANEL_MONITOR HOME_THEATER_PACKAGE
1 2 4 0 1 1 1
2 2 3 0 1 1 0
3 2 2 0 1 0 0
BOOKKEEPING_APPLICATION PRINTER_SUPPLIES Y_BOX_GAMES OS_DOC_SET_KANJI
1 1 1 0 0
2 1 1 1 0
3 1 1 1 0
> class(mdbv_data)
[1] "data.frame"
> summary(mdbv_data)
ORE video : Demo Code Part 2
The following is the second set of demo code from my video on using R in the Oracle Database. Check out the video before using the following code. The blog post for the video will be updated to contain links to all blog posts that have the various demo code.
The following code gives a very quick demonstration of using the ROracle R package to access the data in your Oracle schema. ROracle has a number of advantages over using RJDBC, and most of the advantages are about performance. Typically when using ROracle you will see a many-fold improvement when selecting data and moving it to your R client, processing data in the database, and writing data back to the Oracle Database. In some tests you can see a 7 times improvement in performance over RJDBC. Now that is a big difference.
But the problem with ROracle is that it is only available on certain platforms/OS. For example, it is not officially available for the Mac. But if you google this issue carefully you will find unofficial ways of overcoming this problem.
ROracle is dependent on Oracle Client. So you will need to have Oracle Client installed on your machine and have it available on the search path.
When you have Oracle Client installed and the ROracle R package installed you are ready to start using it.
So here is the demo code from the video.
> library(ROracle)
> drv
> # Create the connection string
> host
> port
> sid
> connect.string "(CONNECT_DATA=(SID=", sid, ")))", sep = "")
> con
> rs
> # fetch records from the resultSet into a data.frame
> data
> # extract all rows
> dim(data)
[1] 6 1
> data
VIEW_NAME
1 MINING_DATA_APPLY_V
2 MINING_DATA_BUILD_V
3 MINING_DATA_TEST_V
4 MINING_DATA_TEXT_APPLY_V
5 MINING_DATA_TEXT_BUILD_V
6 MINING_DATA_TEXT_TEST_V
> dbCommit(con)
> dbClearResult(rs)
> dbDisconnect(con)
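The assignments in the transcript above did not survive intact, so here is a hedged sketch of the usual ROracle pattern it follows: build a connect descriptor, connect, send a query, fetch the rows, and clean up. Everything specific in it is an assumption. The host, port and SID values simply mirror the ore.connect call in Demo Code Part 3, and make_connect_string and run_query are hypothetical helper names, not part of ROracle.

```r
# Build an Oracle connect descriptor from its parts.
# The values passed in below are placeholders, not confirmed values from the video.
make_connect_string <- function(host, port, sid) {
  paste(
    "(DESCRIPTION=",
    "(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",
    "(CONNECT_DATA=(SID=", sid, ")))",
    sep = ""
  )
}

# Hypothetical helper: run one query and return its rows as a data.frame.
# It needs the ROracle package, Oracle Client and a reachable database,
# so it is only defined here and not called.
run_query <- function(username, password, connect.string, sql) {
  library(ROracle)
  drv <- dbDriver("Oracle")
  con <- dbConnect(drv, username = username, password = password,
                   dbname = connect.string)
  on.exit(dbDisconnect(con), add = TRUE)  # always close the connection
  rs <- dbSendQuery(con, sql)
  data <- fetch(rs)                       # fetch all rows from the result set
  dbClearResult(rs)
  data
}

connect.string <- make_connect_string("localhost", 1521, "orcl")
print(connect.string)
```

With a database available, a call such as run_query("dmuser", "dmuser", connect.string, "select view_name from user_views") would return a data.frame with a VIEW_NAME column. The query text is a guess based on the output shown in the transcript.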


