12.2
Auto enabling APPROX_* function in the Oracle Database
With the releases of 12.1 and 12.2 of Oracle Database we have seen some new functions that perform approximate calculations. These include:
- APPROX_COUNT_DISTINCT
- APPROX_COUNT_DISTINCT_DETAIL
- APPROX_COUNT_DISTINCT_AGG
- APPROX_MEDIAN
- APPROX_PERCENTILE
- APPROX_PERCENTILE_DETAIL
- APPROX_PERCENTILE_AGG
These functions can be used when approximate answers can be used instead of the exact answer. Yes can have many scenarios for these and particularly as we move into the big data world, the ability to process our data quickly is slightly more important and exact numbers. For example, is there really a difference between 40% of our customers being of type X versus 41%. The real answer to this is, ‘It Depends!’, but for a lot of analytical and advanced analytical methods this difference doesn’t really make a difference.
There are various reports of performance improvement of anything from 6x to 50x with the response times of the queries that are using these functions, instead of using the more traditional functions.
If you are a BI or big data analyst and you have build lots of code and queries using the more traditional functions. But what if you now want to use the newer functions. Does this mean you have go and modify all the code you have written over the years? you can imagine getting approval to do this!
The simple answer to this question is ‘No’. No you don’t have to change any code, but with some parameter changes for the DB or your session you can tell the database to automatically switch from using the traditional functions (count, etc) to the newer more optimised and significantly faster APPROX_* functions.
So how can you do this magic?
First let us see what the current settings values are:
SELECT name, value FROM v$ses_optimizer_env WHERE sid = sys_context('USERENV','SID') AND name like '%approx%';
Now let us run a query to test what happens using the default settings (on a table I have with 10,500 records).
set auto trace on select count(distinct cust_id) from test_inmemory; COUNT(DISTINCTCUST_ID) ---------------------- 1500 Execution Plan ---------------------------------------------------------- Plan hash value: 2131129625 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 13 | 70 (2)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 13 | | | | 2 | VIEW | VW_DAG_0 | 1500 | 19500 | 70 (2)| 00:00:01 | | 3 | HASH GROUP BY | | 1500 | 7500 | 70 (2)| 00:00:01 | | 4 | TABLE ACCESS FULL| TEST_INMEMORY | 10500 | 52500 | 69 (0)| 00:00:01 | --------------------------------------------------------------------------------------
Let us now set the automatic usage of the APPROX_* function.
alter session set approx_for_aggregation = TRUE; SQL> select count(distinct cust_id) from test_inmemory; COUNT(DISTINCTCUST_ID) ---------------------- 1495 Execution Plan ---------------------------------------------------------- Plan hash value: 1029766195 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 5 | 69 (0)| 00:00:01 | | 1 | SORT AGGREGATE APPROX| | 1 | 5 | | | | 2 | TABLE ACCESS FULL | TEST_INMEMORY | 10500 | 52500 | 69 (0)| 00:00:01 | ---------------------------------------------------------------------------------------
We can see above that the APPROX_* equivalent function was used, and slightly less work. But we only used this on a very small table.
The full list of session level settings is:
alter session set approx_for_aggregation = TRUE; alter session set approx_for_aggregation = FALSE; alter session set approx_for_count_distinct = TRUE; alter session set approx_for_count_distinct = FALSE; alter session set approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC'; alter session set approx_for_percentile = PERCENTILE_DISC; alter session set approx_for_percentile = NONE;
Or at a system wide level:
alter system set approx_for_aggregation = TRUE; alter system set approx_for_aggregation = FALSE; alter system set approx_for_count_distinct = TRUE; alter system set approx_for_count_distinct = FALSE; alter system set approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC'; alter system set approx_for_percentile = PERCENTILE_DISC; alter system set approx_for_percentile = NONE;
And to reset back to the default settings:
alter system reset approx_for_aggregation; alter system reset approx_for_count_distinct; alter system reset approx_for_percentile;
Blog posts on Oracle Advanced Analytics features in 12.2
A couple of days ago Oracle finally provided us with an on-premises download for Oracle 12.2 Database.
Go and download load it from here
or
Download the Database App Development VM with 12.2 (This is what I did)
Over the past couple of months I’ve been using the DBaaS of 12.2, trying out some of the new Advanced Analytics option new features, and other new features. Here are the links to the blog posts on these new 12.2 new features. There will be more coming over the next few months.
New OAA features in Oracle 12.2 Database
Explicit Semantic Analysis in Oracle 12.2c Database
Explicit Semantic Analysis setup using SQL and PL/SQL
and slightly related is the new SQL Developer 4.2
Explicit Semantic Analysis setup using SQL and PL/SQL
In my previous blog post I introduced the new Explicit Semantic Analysis (ESA) algorithm and gave an example of how you can build an ESA model and use it. Check out this link for that blog post.
In this blog post I will show you how you can manually create an ESA model. The reason that I’m showing you this way is that the workflow (in ODMr and it’s scheduler) may not be for everyone. You may want to automate the creation or recreation of the ESA model from time to time based on certain business requirements.
In my previous blog post I showed how you can setup a training data set. This comes with ODMr 4.2 but you may need to expand this data set or to use an alternative data set that is more in keeping with your domain.
Setup the ODM Settings table
As with all ODM algorithms we need to create a settings table. This settings table allows us to store the various parameters and their values, that will be used by the algorithm.
-- Create the settings table CREATE TABLE ESA_settings ( setting_name VARCHAR2(30), setting_value VARCHAR2(30)); -- Populate the settings table -- Specify ESA. By default, Naive Bayes is used for classification. -- Specify ADP. By default, ADP is not used. Need to turn this on. BEGIN INSERT INTO ESA_settings (setting_name, setting_value) VALUES (dbms_data_mining.algo_name, dbms_data_mining.algo_explicit_semantic_analys); INSERT INTO ESA_settings (setting_name, setting_value) VALUES (dbms_data_mining.prep_auto,dbms_data_mining.prep_auto_on); INSERT INTO ESA_settings (setting_name, setting_value) VALUES (odms_sampling,odms_sampling_disable); commit; END;
These are the minimum number of parameter setting needed to run the ESA algorithm. The other ESA algorithm setting include:
Setup the Oracle Text Policy
You also need to setup an Oracle Text Policy and a lexer for the Stopwords.
DECLARE v_policy_name varchar2(30); v_lexer_name varchar2(3) BEGIN v_policy_name := 'ESA_TEXT_POLICY'; v_lexer_name := 'ESA_LEXER'; ctx_ddl.create_preference(v_lexer_name, 'BASIC_LEXER'); v_stoplist_name := 'CTXSYS.DEFAULT_STOPLIST'; -- default stop list ctx_ddl.create_policy(policy_name => v_policy_name, lexer => v_lexer_name, stoplist => v_stoplist_name); END;
Create the ESA model
Once we have the settings table created with the parameter values set for the algorithm and the Oracle Text policy created, we can now create the model.
To ensure that the Oracle Text Policy is applied to the text we want to analyse we need to create a transformation list and add the Text Policy to it.
We can then pass the text transformation list as a parameter to the CREATE_MODEL, procedure.
DECLARE v_xlst dbms_data_mining_transform.TRANSFORM_LIST; v_policy_name VARCHAR2(130) := 'ESA_TEXT_POLICY'; v_model_name varchar2(50) := 'ESA_MODEL_DEMO_2'; BEGIN v_xlst := dbms_data_mining_transform.TRANSFORM_LIST(); DBMS_DATA_MINING_TRANSFORM.SET_TRANSFORM(v_xlst, '"TEXT"', NULL, '"TEXT"', '"TEXT"', 'TEXT(POLICY_NAME:'||v_policy_name||')(MAX_FEATURES:3000)(MIN_DOCUMENTS:1)(TOKEN_TYPE:NORMAL)'); DBMS_DATA_MINING.DROP_MODEL(v_model_name, TRUE); DBMS_DATA_MINING.CREATE_MODEL( model_name => v_model_name, mining_function => DBMS_DATA_MINING.FEATURE_EXTRACTION, data_table_name => 'WIKISAMPLE', case_id_column_name => 'TITLE', target_column_name => NULL, settings_table_name => 'ESA_SETTINGS', xform_list => v_xlst); END;
NOTE: Yes we could have merged all of the above code into one PL/SQL block.
Use the ESA model
We can now use the FEATURE_COMPARE function to use the model we just created, just like I did in my previous blog post.
SELECT FEATURE_COMPARE(ESA_MODEL_DEMO_2 USING 'Oracle Database is the best available for managing your data' text AND USING 'The SQL language is the one language that all databases have in common' text) similarity FROM DUAL;
Go give the ESA algorithm a go and see where you could apply it within your applications.
Oracle Data Miner 4.2 New Features
Oracle Data Miner 4.2 (part of SQL Developer 4.2) got released as an Early Adopter versions (EA) a few weeks ago.
With the new/updated Oracle Data Miner (ODMr) there are a number of new features. These can be categories as 1) features all ODMr users can use now, 2) New features that are only usable when using Oracle 12.2 Database, and 3) Updates to existing algorithms that have been exposed via the ODMr tool.
The following is a round up of the main new features you can enjoy as part of ODMr 4.2 (mainly covering points 1 and 2 above)
- You can now schedule workflows to run based on a defined schedule
- Support for additional data types (RAW, ROWID, UROWID, URITYPE)
- Better support for processing JSON data in the JSON Query node
- Additional insights are displayed as part of the Model Details View
- Additional alert monitoring and reporting
- Better support for processing in-memory data
- A new R Model node that allows you to include in-database ORE user defined R function to support model build, model testing and applying of new model.
- New Explicit Semantic Analysis node (Explicit Feature Extraction)
- New Feature Compare and Test nodes
- New workflow status profiling perfoance improvements
- Refresh the input data definition in nodes
- Attribute Filter node now allows for unsupervised attribute importance ranking
- The ability to build Partitioned data mining models
Look out for the blog posts on most of these new features over the coming months.
WARNING: Most of these new features requires an Oracle 12.2 Database.
12.2 DBaaS (Extreme Edition) possible bug/issue with the DB install/setup
A few weeks ago the 12.2 Oracle Database was released on the cloud. I immediately set an account and got my 12.2 DBaaS setup. This was a relatively painless process and quick.
For me I wanted to test out all the new Oracle Advanced Analytics new features and the new features in SQL Developer 4.2 that only become visible when you are using the 12.2 Database.
When you are go to use the Oracle Data Miner (GUI tool) in SQL Developer 4.2, it will check to see if the ODMr repository is installed in the database. If it isn’t then you will be promoted for the SYS password.
This is the problem. In previous version of the DBaaS (12.1, etc) this was not an issue.
When you go to create your DBaaS you are asked for a password that will be used for the admin accounts of the database.
But when I entered the password for SYS, I got an error saying invalid password.
After using ssh to create a terminal connection to the DBaaS I was able to to connect to the container using
sqlplus / as sysdba
and also using
sqlplus sys/ as sysdba
Those worked fine. But when I tried to connect to the PDB1 I got the invalid username and/or password error.
sqlplus sys/@pdb1 as sysdba
I reconnected as follows
sqlplus / as sysdba
and then changed the password for SYS with containers=all
This command completed without errors but when I tried using the new password to connect the the PDB1 I got the same error.
After 2 weeks working with Oracle Support they eventually pointed me to the issue of the password file for the PDB1 was missing. They claim this is due to an error when I was creating/installing the database.
But this was a DBaaS and I didn’t install the database. This is a problem with how Oracle have configured the installation.
The answer was to create a password file for the PDB1 using the following