Storing and processing Unicode characters in Oracle
Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world’s writing systems (Wikipedia). The standard is maintained by the Unicode Consortium and contains 137,994 characters (137,766 graphic characters, 163 format characters and 65 control characters).
NVARCHAR2 is a Unicode data type that can store Unicode characters in an Oracle Database. The character set of NVARCHAR2 columns is the national character set, which is specified at database creation time. Use the following query to determine the national character set for your database.
SELECT * FROM nls_database_parameters WHERE PARAMETER = 'NLS_NCHAR_CHARACTERSET';
For my tests I’m using an Oracle Autonomous Database. This query returns the character set AL16UTF16, which encodes Unicode data in UTF-16 and uses 2 bytes to store each character.
When creating an attribute with this data type, the size value (max 4000) determines the number of characters allowed, but the actual storage size in bytes will be double that.
Let’s set up some data to test this data type.
CREATE TABLE demo_nvarchar2 (
   attribute_name NVARCHAR2(100));

INSERT INTO demo_nvarchar2
VALUES ('This is a test for nvarchar2');
The string is 28 characters long. We can use the DUMP function to see the details of what is actually stored.
SELECT attribute_name, DUMP(attribute_name,1016) FROM demo_nvarchar2;
The DUMP function returns a VARCHAR2 value that contains the datatype code, the length in bytes, and the internal representation of a value.
You can see the difference in storage size between NVARCHAR2 and VARCHAR2 attributes: the byte length reported by DUMP is double the character length of the string.
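To see this side by side, you could add a VARCHAR2 column to the demo table and dump both. The following is a minimal sketch; the attribute_v column name is just for illustration, and the VARCHAR2 byte length shown in the comment assumes this ASCII-only string is stored in a single-byte or UTF-8 database character set.

ALTER TABLE demo_nvarchar2 ADD (attribute_v VARCHAR2(100));

UPDATE demo_nvarchar2
SET    attribute_v = 'This is a test for nvarchar2';

SELECT DUMP(attribute_name, 1016) AS nvarchar2_dump,  -- Len=56 : 28 characters x 2 bytes (AL16UTF16)
       DUMP(attribute_v, 1016)    AS varchar2_dump    -- Len=28 : 1 byte per character for this string
FROM   demo_nvarchar2;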
Valid values for the return_format (the second parameter of DUMP) are 8, 10, 16, 17, 1008, 1010, 1016 and 1017. These values have the following meanings:
8 – octal notation
10 – decimal notation
16 – hexadecimal notation
17 – single characters
1008 – octal notation with the character set name
1010 – decimal notation with the character set name
1016 – hexadecimal notation with the character set name
1017 – single characters with the character set name
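For example, a quick sketch using the demo table created above shows the same value dumped once as single characters and once as hexadecimal codes with the character set name:

SELECT DUMP(attribute_name, 17)   AS single_characters,
       DUMP(attribute_name, 1016) AS hex_with_charset
FROM   demo_nvarchar2;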
The returned value from the DUMP function gives the internal data type representation. The following table lists the various codes and their description.
Code | Data Type |
---|---|
1 | VARCHAR2(size [BYTE | CHAR]) |
1 | NVARCHAR2(size) |
2 | NUMBER [(precision [, scale])] |
8 | LONG |
12 | DATE |
21 | BINARY_FLOAT |
22 | BINARY_DOUBLE |
23 | RAW(size) |
24 | LONG RAW |
69 | ROWID |
96 | CHAR [(size [BYTE | CHAR])] |
96 | NCHAR[(size)] |
112 | CLOB |
112 | NCLOB |
113 | BLOB |
114 | BFILE |
180 | TIMESTAMP [(fractional_seconds)] |
181 | TIMESTAMP [(fractional_seconds)] WITH TIME ZONE |
182 | INTERVAL YEAR [(year_precision)] TO MONTH |
183 | INTERVAL DAY [(day_precision)] TO SECOND[(fractional_seconds)] |
208 | UROWID [(size)] |
231 | TIMESTAMP [(fractional_seconds)] WITH LOCAL TIME ZONE |
Reading Data from Oracle Table into Python Pandas – How long & Different arraysize
Here are some results from a little testing I recently did on extracting data from an Oracle database, looking at what effect the arraysize has and which method might be the quickest.
The arraysize determines how many records will be retrieved in each batch. When a query is issued to the database, the results are returned to the calling programme in batches of this size. The nature of the application and the number of records being retrieved will determine the appropriate arraysize value. The value can have a dramatic effect on your query and application response times. Sometimes a small value works very well, but sometimes you might need a larger value.
My test used an Oracle Database Cloud instance, Python, and the following values for the arraysize.
arraysize = (5, 50, 500, 1000, 2000, 3000, 4000, 5000)
The first test was to see what effect these arraysizes have on retrieving all the data from a table. The table in question has 73,668 records, so not a large table. The test loops through this list of values and fetches all the data, using the fetchall function (part of cx_Oracle), and then displays the time taken to retrieve the results.
# import the Oracle Python library
import cx_Oracle
import datetime
import pandas as pd
import numpy as np

# setting display width for outputs in PyCharm
desired_width = 280
pd.set_option('display.width', desired_width)
np.set_printoptions(linewidth=desired_width)
pd.set_option('display.max_columns', 30)

# define the login details
p_username = "************"
p_password = "************"
p_host = "************"
p_service = "************"
p_port = "1521"

print('--------------------------------------------------------------------------')
print(' Testing the time to extract data from an Oracle Database.')
print(' using different approaches.')
print('---')

# create the connection
con = cx_Oracle.connect(user=p_username, password=p_password, dsn=p_host+"/"+p_service+":"+p_port)

print('')
print(' Test 1: Extracting data using Cursor for different Array sizes')
print(' Array Size = 5, 50, 500, 1000, 2000, 3000, 4000, 5000')
print('')
print(' Starting test at : ', datetime.datetime.now())

beginTime = datetime.datetime.now()
cur_array_size = (5, 50, 500, 1000, 2000, 3000, 4000, 5000)
sql = 'select * from banking_marketing_data_balance_v'

for size in cur_array_size:
    startTime = datetime.datetime.now()
    cur = con.cursor()
    cur.arraysize = size
    results = cur.execute(sql).fetchall()
    print(' Time taken : array size = ', size, ' = ', datetime.datetime.now()-startTime, ' seconds, num of records = ', len(results))
    cur.close()

print('')
print(' Test 1: Time take = ', datetime.datetime.now()-beginTime)
print('')
And here are the results from this first test.
Starting test at :  2018-11-14 15:51:15.530002
 Time taken : array size =  5  =  0:36:31.855690  seconds, num of records =  73668
 Time taken : array size =  50  =  0:05:32.444967  seconds, num of records =  73668
 Time taken : array size =  500  =  0:00:40.757931  seconds, num of records =  73668
 Time taken : array size =  1000  =  0:00:14.306910  seconds, num of records =  73668
 Time taken : array size =  2000  =  0:00:10.182356  seconds, num of records =  73668
 Time taken : array size =  3000  =  0:00:20.894687  seconds, num of records =  73668
 Time taken : array size =  4000  =  0:00:07.843796  seconds, num of records =  73668
 Time taken : array size =  5000  =  0:00:06.242697  seconds, num of records =  73668
As you can see, there is quite a variation in the results.
You may get different performance results based on your location, network connectivity and proximity to the database. I was at home (Ireland) using wifi and my database was located somewhere in the USA. I ran the test a number of times and the timings varied by +/- 15%, which is a lot!
When the data is retrieved in this manner you can process the returned result set directly. But more typically you will want to work with the data as a pandas DataFrame. The next two tests look at a couple of methods of querying the data and storing the result sets in a DataFrame.

For these two tests I’ll set the arraysize = 3000. Let’s see what happens.

For the second test I’ll again use the fetchall() function to retrieve the data set. From that I extract the names of the columns and then create a pandas DataFrame combining the results data set and the column names.
# set the arraysize for this test (as described above)
cur_array_size = 3000

startTime = datetime.datetime.now()
print(' Starting test at : ', startTime)

cur = con.cursor()
cur.arraysize = cur_array_size
results = cur.execute(sql).fetchall()
print(' Fetched ', len(results), ' in ', datetime.datetime.now()-startTime, ' seconds at ', datetime.datetime.now())

startTime2 = datetime.datetime.now()
col_names = []
for i in range(0, len(cur.description)):
    col_names.append(cur.description[i][0])
print(' Fetched data & Created the list of Column names in ', datetime.datetime.now()-startTime, ' seconds at ', datetime.datetime.now())

# create the pandas DataFrame from the fetched records and the column names
df = pd.DataFrame(results, columns=col_names)
print(' Finished creating Dataframe in ', datetime.datetime.now()-startTime, ' seconds at ', datetime.datetime.now())

print('')
print(' Test 2: Total Time take = ', datetime.datetime.now()-startTime)
print('')
The results from this are.
Fetched  73668  in  0:00:07.778850  seconds at  2018-11-14 16:35:07.840910
 Fetched data & Created the list of Column names in  0:00:07.779043  seconds at  2018-11-14 16:35:07.841093
 Finished creating Dataframe in  0:00:07.975074  seconds at  2018-11-14 16:35:08.037134

 Test 2: Total Time take =  0:00:07.975614
Now that was quick. Fetching the data set took just under 7.8 seconds, creating the list of column names took fractions of a millisecond, and the final creation of the pandas DataFrame took approximately 0.2 seconds.
For the third test I used the pandas library function called read_sql(). This function takes two inputs: the first is the query to be processed and the second is the database connection.
print(' Test 3: Test timing for read_sql into a dataframe')
cur_array_size = 3000
print(' will use arraysize = ', cur_array_size)
print('')

startTime = datetime.datetime.now()
print(' Starting test at : ', startTime)

df2 = pd.read_sql(sql, con)
print(' Finished creating Dataframe in ', datetime.datetime.now()-startTime, ' seconds at ', datetime.datetime.now())

# close the connection at end of experiments
con.close()
and the results from this are.
Test 3: Test timing for read_sql into a dataframe
 will use arraysize =  3000

 Starting test at :  2018-11-14 16:35:08.095189
 Finished creating Dataframe in  0:02:03.200411  seconds at  2018-11-14 16:37:11.295611
You can see that it took just over 2 minutes to create the pandas DataFrame using the read_sql() function, compared to just under 8 seconds using the previous method.
It is important to test the various options for processing your data and find the one that works best in your environment. As with most languages there can be many ways to do the same thing. The challenge is to work out which one you should use.
OUG Ireland 2017 Presentation
Here are the slides from my presentation at OUG Ireland 2017. All about running R using SQL.
Formatting results from ORE script in a SELECT statement
This blog post looks at how to format the output, or the returned results, from an Oracle R Enterprise (ORE) user defined R function that is run using a SELECT statement in SQL.
Sometimes this can be a bit of a challenge to work out, but it can be relatively easy once you have figured out how to do it. The following examples work through some scenarios of different result sets from a user defined R function that is stored in the Oracle Database.
To run that user defined R function using a SELECT statement I can use one of the following ORE SQL functions.
- rqEval
- rqTableEval
- rqGroupEval
- rqRowEval
For simplicity we will just use the first of these ORE SQL functions to illustrate the problem and how to go about solving it. The rqEval ORE SQL function is a general purpose function for calling a user defined R script stored in the database. The function does not require any input data set, but it will return some data. You could use this to generate some dummy/test data or to find some information in the database. Here is a noddy example that returns my name.
BEGIN
   --sys.rqScriptDrop('GET_NAME');
   sys.rqScriptCreate('GET_NAME',
                      'function() {
                         res<-data.frame("Brendan")
                         res
                       }');
END;
To call this user defined R function I can use the following SQL.
select * from table(rqEval(null, 'select cast(''a'' as varchar2(50)) from dual', 'GET_NAME') );
For text strings being returned, you need to cast the returned value, giving it a size.
If a numeric value is being returned, we don’t have to use the cast and can instead use ‘1’, as shown in the following example. This second example extends our user defined R function to return my name and a number.
BEGIN
   sys.rqScriptDrop('GET_NAME');
   sys.rqScriptCreate('GET_NAME',
                      'function() {
                         res<-data.frame(NAME="Brendan", YEAR=2017)
                         res
                       }');
END;
To call the updated GET_NAME function we now have to process two returned columns. The first is the character string and the second is a numeric.
select * from table(rqEval(null, 'select cast(''a'' as varchar2(50)) as "NAME", 1 AS YEAR from dual', 'GET_NAME') );
These examples illustrate how you can process character strings and numerics being returned by the user defined R script.
The key to setting up the format of the returned values is knowing the structure of the data frame being returned by the user defined R script. Once you know that the rest is (in theory) easy.
Explicit Semantic Analysis setup using SQL and PL/SQL
In my previous blog post I introduced the new Explicit Semantic Analysis (ESA) algorithm and gave an example of how you can build an ESA model and use it. Check out this link for that blog post.
In this blog post I will show you how you can manually create an ESA model. The reason I’m showing you this is that the workflow approach (in ODMr and its scheduler) may not be for everyone. You may want to automate the creation or recreation of the ESA model from time to time, based on certain business requirements.
In my previous blog post I showed how you can set up a training data set. This comes with ODMr 4.2, but you may need to expand this data set or use an alternative data set that is more in keeping with your domain.
Setup the ODM Settings table
As with all ODM algorithms we need to create a settings table. This settings table allows us to store the various parameters and their values that will be used by the algorithm.
-- Create the settings table
CREATE TABLE ESA_settings (
    setting_name  VARCHAR2(30),
    setting_value VARCHAR2(30));

-- Populate the settings table
-- Specify ESA. By default, Naive Bayes is used for classification.
-- Specify ADP. By default, ADP is not used. Need to turn this on.
BEGIN
   INSERT INTO ESA_settings (setting_name, setting_value)
   VALUES (dbms_data_mining.algo_name, dbms_data_mining.algo_explicit_semantic_analys);

   INSERT INTO ESA_settings (setting_name, setting_value)
   VALUES (dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);

   INSERT INTO ESA_settings (setting_name, setting_value)
   VALUES (dbms_data_mining.odms_sampling, dbms_data_mining.odms_sampling_disable);

   commit;
END;
These are the minimum parameter settings needed to run the ESA algorithm. Additional ESA-specific settings are also available, for example ESAS_MIN_ITEMS, ESAS_TOPN_FEATURES and ESAS_VALUE_THRESHOLD.
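If you want to override some of these defaults, extra rows can be added to the settings table. The following is a sketch using the documented ESA setting names; the values shown are purely illustrative, so check the DBMS_DATA_MINING documentation for the valid ranges and defaults.

BEGIN
   -- maximum number of features (topics) returned for a document (illustrative value)
   INSERT INTO ESA_settings (setting_name, setting_value)
   VALUES ('ESAS_TOPN_FEATURES', '500');

   -- ignore very weak feature associations below this threshold (illustrative value)
   INSERT INTO ESA_settings (setting_name, setting_value)
   VALUES ('ESAS_VALUE_THRESHOLD', '0.00001');

   commit;
END;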
Setup the Oracle Text Policy
You also need to set up an Oracle Text Policy and a lexer for the Stopwords.
DECLARE
   v_policy_name   varchar2(30);
   v_lexer_name    varchar2(30);
   v_stoplist_name varchar2(50);
BEGIN
   v_policy_name := 'ESA_TEXT_POLICY';
   v_lexer_name  := 'ESA_LEXER';
   ctx_ddl.create_preference(v_lexer_name, 'BASIC_LEXER');

   v_stoplist_name := 'CTXSYS.DEFAULT_STOPLIST';  -- default stop list

   ctx_ddl.create_policy(policy_name => v_policy_name,
                         lexer       => v_lexer_name,
                         stoplist    => v_stoplist_name);
END;
Create the ESA model
Once we have the settings table created with the parameter values set for the algorithm and the Oracle Text policy created, we can now create the model.
To ensure that the Oracle Text Policy is applied to the text we want to analyse we need to create a transformation list and add the Text Policy to it.
We can then pass the text transformation list as a parameter to the CREATE_MODEL procedure.
DECLARE
   v_xlst        dbms_data_mining_transform.TRANSFORM_LIST;
   v_policy_name VARCHAR2(130) := 'ESA_TEXT_POLICY';
   v_model_name  varchar2(50)  := 'ESA_MODEL_DEMO_2';
BEGIN
   v_xlst := dbms_data_mining_transform.TRANSFORM_LIST();
   DBMS_DATA_MINING_TRANSFORM.SET_TRANSFORM(v_xlst,
       '"TEXT"', NULL, '"TEXT"', '"TEXT"',
       'TEXT(POLICY_NAME:'||v_policy_name||')(MAX_FEATURES:3000)(MIN_DOCUMENTS:1)(TOKEN_TYPE:NORMAL)');

   DBMS_DATA_MINING.DROP_MODEL(v_model_name, TRUE);

   DBMS_DATA_MINING.CREATE_MODEL(
       model_name          => v_model_name,
       mining_function     => DBMS_DATA_MINING.FEATURE_EXTRACTION,
       data_table_name     => 'WIKISAMPLE',
       case_id_column_name => 'TITLE',
       target_column_name  => NULL,
       settings_table_name => 'ESA_SETTINGS',
       xform_list          => v_xlst);
END;
NOTE: Yes we could have merged all of the above code into one PL/SQL block.
Use the ESA model
We can now use the FEATURE_COMPARE function to use the model we just created, just like I did in my previous blog post.
SELECT FEATURE_COMPARE(ESA_MODEL_DEMO_2
          USING 'Oracle Database is the best available for managing your data' text
          AND USING 'The SQL language is the one language that all databases have in common' text) similarity
FROM DUAL;
Go give the ESA algorithm a go and see where you could apply it within your applications.
Auditing Oracle Data Mining model usage
In a previous blog post I talked about how you can rename and comment your Oracle Data Mining models. This is to allow you to easily see and understand the intended use of the data mining model.
Another feature available to you is auditing the usage of the data mining models. As your data mining environment grows to many 10s, or more typically 100s, of models, you will need some way of tracking their usage. This can allow you to discover which models are frequently used and which ones are used infrequently. You can then use this information to investigate if there are any issues. Or, in some companies I’ve seen, an internal charging scheme is put in place for each time the models are used.
The following outlines the steps required to setup the auditing of your models and how to inspect the usage.
Note: You will need the AUDIT_ADMIN role to audit the models.
First create an audit policy for the data mining model in a particular schema.
CREATE AUDIT POLICY oaa_odm_audit_usage
ACTIONS ALL
ON MINING MODEL dmuser.high_value_churn_clas_svm;
This creates a policy that monitors all activity on the data mining model HIGH_VALUE_CHURN_CLAS_SVM in the DMUSER schema.
Now we need to enable the policy and allow it to track all activity on the model.
AUDIT POLICY oaa_odm_audit_usage BY oaa_model_user;
This will track all usage of the data mining model by the schema called OAA_MODEL_USER. We can then use the following query to search for the audit records for the OAA_MODEL_USER schema.
SELECT dbusername,
       action_name,
       system_privilege_used,
       return_code,
       object_schema,
       object_name,
       sql_text
FROM   unified_audit_trail
WHERE  object_name = 'HIGH_VALUE_CHURN_CLAS_SVM';
But there is a little problem with what I’ve just shown you above: it will track all activity on the data mining model. Perhaps this isn’t what we really want, and we only want to track certain activity on the model. Instead of creating the policy using ‘ACTIONS ALL’, we can list the actions or operations we want to track. For example, suppose we want to track when the model is used in a SELECT. The following shows how you can set this up for just SELECT.
CREATE AUDIT POLICY oaa_odm_audit_select
ACTIONS SELECT
ON MINING MODEL dmuser.high_value_churn_clas_svm;

AUDIT POLICY oaa_odm_audit_select BY oaa_model_user;
The list of individual audit events you can use include:
- AUDIT
- COMMENT
- GRANT
- RENAME
- SELECT
A policy can be set up to track one or more of these events. For example, if we wanted a policy to track SELECT and GRANT, we would list each action separated by a comma.
CREATE AUDIT POLICY oaa_odm_audit_select_grant
ACTIONS SELECT ON MINING MODEL dmuser.high_value_churn_clas_svm,
        GRANT  ON MINING MODEL dmuser.high_value_churn_clas_svm;

AUDIT POLICY oaa_odm_audit_select_grant BY oaa_model_user;
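To double check which of these policies are actually enabled, and for which users, you can query the unified auditing data dictionary. The following is a quick sanity check; the exact columns in this view vary slightly between database releases.

SELECT *
FROM   audit_unified_enabled_policies
WHERE  policy_name LIKE 'OAA_ODM%';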
Renaming & Commenting Oracle Data Mining Models
As your company evolves with its data mining projects, the number of models produced and in use in production will increase dramatically.
Care needs to be taken when it comes to managing these. This includes using meaningful names, adding descriptions of what the model is about or for, and being able to track their usage, etc.
I will look at tracking the usage of the models in another blog post, but the following gives examples of how to rename Oracle Data Mining models and how to add comments or descriptions to them. This is particularly useful because data analytics teams have a constant turnover, or it may have been many months since you last worked on a model and you want a quick idea of what the purpose of the model was.
If you have been using the Oracle Data Mining tool (part of SQL Developer) you will see your models being created with some sort of sequencing numbers. For example, for a Support Vector Machine (SVM) classification model you might see it labelled:
CLAS_SVM_5_22
While you are working on the project you will know and understand what it was about and why it is being used. But afterwards you may forget, as you will be dealing with many hundreds of models. Yes, you could check your documentation for the purpose of this model, but that can take some time.
What if you could run a SQL query to find out?
But first we need to rename the model.
BEGIN
   DBMS_DATA_MINING.RENAME_MODEL('CLAS_SVM_5_22', 'HIGH_VALUE_CHURN_CLAS_SVM');
END;
Next we will want to add a longer description of what the model is about. We can do this by adding a comment to the model.
COMMENT ON MINING MODEL high_value_churn_clas_svm IS 'Classification Model to Predict High Value Customers most likely to Churn';
We can now see these updated details when we query the Oracle Data Mining models in a user schema.
SELECT model_name, mining_function, algorithm, comments FROM user_mining_models;
These are two very useful commands.
Evaluating Cluster Dispersion in Oracle Data Mining
When working with the Clustering algorithms, and particularly k-Means, in the Oracle Data Miner tool there is no way of seeing how compact or dispersed the data is within a cluster.
There are a number of measures typically used in various tools and algorithms, but with Oracle Data Miner we are not presented with any of this information.
But if we flip from using the Oracle Data Miner tool to using SQL we can get to see some more details of the clusters produced by the k-Means algorithm along with some additional and useful information.
As I said, there are a number of different measures used to evaluate clusters. The one that Oracle uses is called Dispersion. There are a few different definitions of what this could be, and I haven’t been able to locate Oracle’s own definition of it anywhere in the documentation.
We can use the Dispersion value as a measure of how compact or how spread out the data is within a cluster. The Dispersion value is a number greater than 0. The lower the value, the more compact the cluster is, i.e. the data points are close to the centroid of the cluster. The larger the value, the more dispersed or spread out the data points are.
The DBMS_DATA_MINING PL/SQL package comes with a function called GET_MODEL_DETAILS_KM. This function returns a set of records of type DM_CLUSTER (collectively DM_CLUSTERS), with the following structure:
(id              NUMBER,
 cluster_id      VARCHAR2(4000),
 record_count    NUMBER,
 parent          NUMBER,
 tree_level      NUMBER,
 dispersion      NUMBER,
 split_predicate DM_PREDICATES,
 child           DM_CHILDREN,
 centroid        DM_CENTROIDS,
 histogram       DM_HISTOGRAMS,
 rule            DM_RULE)
We can now use the following query to get the Dispersion value for each of the clusters from an ODM cluster model.
SELECT cluster_id, record_count, parent, tree_level, dispersion FROM table(dbms_data_mining.get_model_details_km('CLUS_KM_3_2'));
Using the Identity column for Oracle Data Miner
If you are a user of the Oracle Data Miner tool (the workflow data mining tool that is part of SQL Developer), then you will have noticed that for many of the algorithms you can specify a Case Id attribute along with, say, the target attribute.
The idea is that you have one attribute that is a unique identifier for each case record. This may or may not be the case in your data model and you may have a multiple attribute primary key or case record identifier.
But what is the Case Id field used for in Oracle Data Miner?
Based on the documentation, this field does not need to have a value. But it is recommended that you do identify an attribute for the Case Id, as this will allow for reproducible results. What this means is that if we run our workflow today and again in a few days time, on the exact same data, we should get the same results. The Case Id allows this to happen. But how? Well, it looks like the attribute specified for the Case Id is used as part of the hashing algorithm that partitions the data into train and test data sets for classification problems.
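To illustrate why a stable Case Id gives reproducible splits (this is only an illustration, not necessarily the exact mechanism ODMr uses internally), a hash of the Case Id can be used to deterministically partition the data, so the same records always land in the same partition. Here is a sketch using ORA_HASH, with illustrative table and column names.

-- roughly 60% of the records for training, the remainder for testing
SELECT * FROM case_table WHERE ORA_HASH(id_column, 99) < 60;   -- training set
SELECT * FROM case_table WHERE ORA_HASH(id_column, 99) >= 60;  -- test set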
So if you don’t have a single attribute case identifier in your data set, then you need to create one. There are a few options open to you to do this.
- Create one: write some code that will generate a unique identifier for each of your case records based on some defined rule.
- Use a sequence: and update the records to use this sequence.
- Use ROWID: use the unique row identifier value. You can write some code to populate this value into an attribute, or create a view on the table containing the case records and add a new attribute that uses the ROWID (see the sketch just after this list). But if you move the data, then the next time you use the view you will be getting different ROWIDs, which in turn means we may have different case records going into our test and training data sets. So our workflows will generate different results. Not what we want.
- Use ROWNUM: this is similar to using the ROWID. Again we can have a view that selects ROWNUM for each record. We may have the same issues, but if we have our data ordered in a way that ensures we get the records returned in the same order, then this approach is OK to use.
- Use an Identity Column: in Oracle 12c we have a new feature called Identity Columns. This acts like a sequence, but we can define an attribute in a table to be an Identity Column, and as records are inserted into the table (in our scenario, the case table) this column will automatically generate a unique number for each record. If we need to repopulate the case table, we will need to drop and recreate the table to get the Identity Column to reset, otherwise the newly inserted records will start with the next number of the Identity Column.
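Going back to the ROWID and ROWNUM options above, a view along the following lines is one way to surface a case identifier without changing the underlying table. This is just a sketch; the table, view and column names are illustrative.

CREATE OR REPLACE VIEW case_table_v AS
SELECT t.ROWID AS case_id,   -- or ROWNUM, if the ordering of the records can be guaranteed
       t.*
FROM   case_table t;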
Here is an example of using the Identity Column in a case table.
CREATE TABLE case_table (
    id_column     NUMBER GENERATED ALWAYS AS IDENTITY,
    affinity_card NUMBER,
    age           NUMBER,
    cust_gender   VARCHAR2(5),
    country_name  VARCHAR2(20)
    ...
);
You can now use this Identity Column as the Case Id in your Oracle Data Miner workflows.
My 2nd Book is now available: Real World SQL and PL/SQL
It has been a busy 12 months. In addition to the day jobs, I’ve also been busy writing. (More news on this in a couple of weeks!)
Today is a major milestone as my second book is officially released and available in print and ebook formats.
The title of the book is ‘Real World SQL and PL/SQL: Advice from the Experts’. Check it out on Amazon.
Now that sounds like a very fancy title, but it isn’t meant to be. This book is written by 5 people (including me), who are all Oracle ACE Directors, who each have 20+ years of experience working with the Oracle Database, and who all love sharing their knowledge. My co-authors are Arup Nanda, Heli Helskyaho, Martin Widlake and Alex Nuijten. It was a pleasure working with you.
I haven’t seen a physical copy of the book yet !!! Yes the book is released and I haven’t held it in my hands. Although I have seen pictures of it that other people have taken. There was a delay in sending out the author copies of the book, but as of this morning my books are sitting in Stansted Airport and should be making their way to Ireland today. So fingers crossed I’ll have them tomorrow. I’ll update this blog post with a picture when I have them. UPDATE: They finally arrived at 13:25 on the 22nd August.
In addition to the 5 authors we also had Chet Justice (Oraclenerd), an Oracle ACE Director, as the technical editor, and Tim Hall, Oracle ACE Director, wrote the foreword for us.
To give you some background to the book and why we wrote it, here is an extract from the start of the book, where I describe how the idea for this book came about and the aim of the book.
“While attempting to give you an idea of our original thinking behind the need for this book and why we wanted to write it, the words of Rod Stewart’s song ‘Sailing’ keep popping into my mind. These are ‘We are sailing, we are sailing, home again ‘cross the sea’. This is because the idea for this book was born on a boat. Some call it a ship. Some call it a cruise ship. Whatever you want to call it, this book was born at the OUG Norway conference in March 2015. What makes the OUG Norway conference special is that it is held on a cruise ship that goes between Oslo in Norway and Kiel in Germany and back again. This means as a speaker and conference attendee you are ‘trapped’ on the cruise ship for 2 days filled with presentations, workshops, discussions and idea sharing for the Oracle community.
It was during this conference that Heli and Brendan got talking about their books. Heli had just published her Oracle SQL Developer Data Modeler book and Brendan had published his book on Oracle Data Miner the previous year. Whilst they were discussing their experiences of writing and sharing their knowledge and how much they enjoyed this, they both recognized that there are a lot of books for the people starting out in their Oracle career and then there are lots of books on specialized topics. What was missing were books that covered the middle group. A question they kept on asking but struggled to answer was, ‘after reading the introductory books, what book would they read next before getting onto the specialized books?’ This was particularly true of SQL and PL/SQL.
They also felt that something that was missing from many books, especially introductory ones, was the “Why and How” of doing things in certain ways that comes from experience. It is all well and good knowing the syntax of commands and the options, but what takes people from understanding a language to being productive in using it is that real-world derived knowledge that comes from using it for real tasks. It would be great to share some of that experience.
Then over breakfast on the final day of the OUG Norway conference, as the cruise ship was sailing through the fjord and around the islands that lead back to Oslo, Heli and Brendan finally agreed that this book should happen. They then listed the type of content they thought would be in such a book and who are the recognized experts (or super heroes) for these topics. This list of experts was very easy to come up with and the writing team of Oracle ACE Directors was formed, consisting of Arup Nanda, Martin Widlake and Alex Nuijten, along with Heli Helskyaho and Brendan Tierney. The author team then got to work defining the chapters and their contents. Using their combined 120+ years of SQL and PL/SQL experience they finally came up with the scope and content for the book at Oracle Open World.
…”
As you can see, this book was 17 months in the making. This consisted of 4 months of proposal writing, research and refinement, 8 months of writing, 3 months of editing and 2 months for production of the book.
Yes it takes a lot of time and commitment. We all finished our last tasks and final edits on the book back in early June. Since then the book has been sent for printing, converted into an ebook, books shipped to Oracle Press warehouse, then shipped to Amazon and other book sellers. Today it is finally available officially.
(when I say officially, it seems that Amazon has shipped some pre-ordered books a week ago)
If you are at Oracle Open World (OOW) in September make sure to check out the book in the Oracle Book Store, and if you buy a copy try to track us down to get us to sign it. The best way to do this is to contact us on Twitter, leave a message at the Oracle Press stand, or you will find us hanging out at the OTN Lounge.
Change the size of ORE PNG graphics using in-database R functions
In a previous blog post I showed you how to create and display a ggplot2 R graphic using SQL. Make sure to check it out before reading the rest of this blog post.
In my previous blog post, I showed and mentioned that the PNG graphic returned by the embedded R execution SQL statement was not the same as what was produced if you created the graphic in an R session.
Here is the same ggplot2 graphic. The first one is what is produced in an R session and the second is what is produced by the SQL query and the embedded R execution in Oracle.
As you can see the second image (produced using the embedded R execution) gives a very square image.
The reason for this is that Oracle R Enterprise (ORE) creates the graphic image in PNG format. The default size for this is 480 x 480 pixels. You will find this information when you go digging in the R documentation, not in the Oracle documentation.
So, how can I get my ORE produced graphic to appear like what is produced in R?
What you need to do is to change the height and width of the PNG image produced by ORE. You can do this by passing parameters in the SQL statement used to call the user defined R function, that in turn produces the ggplot2 image.
In my previous post, I gave the SQL statement to call and produce the graphic (shown above). One of the parameters to the rqTableEval function was set to null. This was because we didn’t have any parameters to pass, apart from the data set.
We can replace this null with any parameters we want to pass to the user defined R function (demo_ggpplot). To pass the parameters we need to define them using a SELECT statement.
cursor(select 500 as "ore.png.height", 850 as "ore.png.width" from dual),
The full SELECT statement now becomes
select *
from table(rqTableEval(
               cursor(select * from claims),
               cursor(select 500 as "ore.png.height", 850 as "ore.png.width" from dual),
               'PNG',
               'demo_ggpplot'));
When you view the graphic in SQL Developer, you will get something that looks a bit more like what you would expect or want to see.
For each graphic image you want to produce using ORE you will need to figure out what the best PNG height and width settings are. It also depends on what tool or application you are going to use to display the images (e.g. APEX, etc.).
Checking out the Oracle Reserved Words using V$RESERVED_WORDS
When working with SQL or PL/SQL we all know there are some words we cannot use in our code or to label various parts of it. These languages have a number of reserved words that form the language.
Sometimes it can be a challenge to know what is or isn’t a reserved word. Yes, we can check the Oracle documentation for the SQL reserved words and the PL/SQL reserved words. There are other references and lists in the Oracle documentation listing the reserved and key words.
But we also have the concept of Key Words (as opposed to reserved words). In the SQL documentation these are not listed. In the PL/SQL documentation most are listed.
What is a Key Word in Oracle?
Oracle SQL keywords are not reserved. BUT Oracle uses them internally in specific ways. If you use these words as names for objects and object parts, then your SQL statements may be more difficult to read and may lead to unpredictable results.
But if we didn’t have access to the documentation (or Google), how can we find out what the key words are? You can use the data dictionary view called V$RESERVED_WORDS.
But this view isn’t available to everyone. So if you want to get your hands on it you will need the SYS user. Alternatively, if you are a DBA, you could share it with all your developers.
When we query this view we get 2,175 entries (for 12.1.0.2 Oracle Database).
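For example, a query along the following lines lets you check whether a particular word is a key word and whether it is also reserved. The view has a KEYWORD column plus a set of Y/N flag columns, of which RESERVED flags the true reserved words; the words listed in the IN clause are just examples.

SELECT keyword, reserved
FROM   v$reserved_words
WHERE  keyword IN ('SELECT', 'TYPE', 'COMMENT')
ORDER BY keyword;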