oracle big data
The Oracle 11.2 database contains 3 PL/SQL packages that allow you to perform all (well almost all) of your data mining functions.
So instead of using the Oracle Data Miner tool, you can write some PL/SQL code that will allow you to do the same things.
Before you can start using these PL/SQL packages you need to ensure that the schema you are going to use has been set up as follows:
- Create a schema or use an existing one
- Grant the schema all the data mining privileges: see my earlier post and YouTube video on how to set up an Oracle schema for data mining
- Grant all necessary privileges to the data that you will be using for data mining
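As a rough sketch of the setup steps above, the grants might look like the following. The schema name DMUSER and the SH.CUSTOMERS table are assumptions for illustration only, and the exact list of privileges your schema needs may differ; the earlier post on schema setup is the reference.

```sql
-- Run as a DBA user. DMUSER is a hypothetical schema name.
GRANT CREATE SESSION      TO dmuser;
GRANT CREATE TABLE        TO dmuser;
GRANT CREATE VIEW         TO dmuser;
GRANT CREATE MINING MODEL TO dmuser;

-- Give the schema access to the data it will mine.
-- SH.CUSTOMERS is only an example; substitute your own tables.
GRANT SELECT ON sh.customers TO dmuser;
```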
The first PL/SQL package that you will use is DBMS_DATA_MINING_TRANSFORM. This package allows you to transform the data to make it suitable for data mining. It contains a number of functions for transforming the data, but depending on the data you may need to write your own code to perform the transformations. When you apply your model to the test or apply data sets, ODM will automatically take the transformations defined using this package and apply them to the new data sets.
The second PL/SQL package is DBMS_DATA_MINING. This is the main data mining PL/SQL package. It contains functions that allow you to:
- Create a Model
- Describe the Model
- Export and import Models
- Compute costs and test metrics for classification Models
- Apply the Model to new data
- Administer Models (dropping, renaming, etc.)
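To give a feel for what creating a model looks like, here is a minimal sketch of a classification model build using DBMS_DATA_MINING.CREATE_MODEL. The settings table DEMO_CLASS_SETTINGS and the model name DEMO_CLASS_MODEL are made-up names, and the view MINING_DATA_BUILD_V with its CUST_ID and AFFINITY_CARD columns is assumed to exist in your schema (it comes with the Oracle sample data); swap in your own objects.

```sql
-- Settings table: one row per model setting (hypothetical table name).
CREATE TABLE demo_class_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

BEGIN
  -- Choose the algorithm; Decision Tree is used here only as an example.
  INSERT INTO demo_class_settings (setting_name, setting_value)
  VALUES (DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_DECISION_TREE);

  -- Turn on automatic data preparation.
  INSERT INTO demo_class_settings (setting_name, setting_value)
  VALUES (DBMS_DATA_MINING.PREP_AUTO, DBMS_DATA_MINING.PREP_AUTO_ON);
  COMMIT;

  -- Build the model. The data source and column names are assumptions.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'DEMO_CLASS_MODEL',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'MINING_DATA_BUILD_V',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'AFFINITY_CARD',
    settings_table_name => 'DEMO_CLASS_SETTINGS');
END;
/
```

The settings table is how you tell CREATE_MODEL which algorithm and options to use; if you leave it out, the default algorithm for the mining function is used.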
The next (and last) PL/SQL package is DBMS_PREDICTIVE_ANALYTICS. The routines included in this package allow you to prepare data, build a model, score a model and return the results of model scoring. The routines are EXPLAIN, which ranks attributes in order of influence in explaining a target column; PREDICT, which predicts the value of a target attribute based on values in the input; and PROFILE, which generates rules that describe the cases from the input data.
Over the coming weeks I will have separate blog posts on each of these PL/SQL packages. These will cover the functions that are part of each package and will include some examples of using the packages and functions.
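A small sketch of calling EXPLAIN and PREDICT might look like this. The result table names DEMO_EXPLAIN_RESULT and DEMO_PREDICT_RESULT are hypothetical (the procedures create them, so they must not already exist), and the MINING_DATA_BUILD_V view with its CUST_ID and AFFINITY_CARD columns is assumed from the Oracle sample data.

```sql
SET SERVEROUTPUT ON

DECLARE
  v_accuracy NUMBER;  -- predictive confidence returned by PREDICT
BEGIN
  -- Rank the attributes by how well they explain the target column.
  DBMS_PREDICTIVE_ANALYTICS.EXPLAIN(
    data_table_name     => 'MINING_DATA_BUILD_V',
    explain_column_name => 'AFFINITY_CARD',
    result_table_name   => 'DEMO_EXPLAIN_RESULT');

  -- Predict the target column for the cases in the input data.
  DBMS_PREDICTIVE_ANALYTICS.PREDICT(
    accuracy            => v_accuracy,
    data_table_name     => 'MINING_DATA_BUILD_V',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'AFFINITY_CARD',
    result_table_name   => 'DEMO_PREDICT_RESULT');

  DBMS_OUTPUT.PUT_LINE('Predictive confidence: ' || ROUND(v_accuracy, 4));
END;
/
```

You can then query the two result tables to see the attribute rankings and the predictions.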
The PL/SQL API for Oracle Data Miner has gained a number of new features in the 11.2 release. These are listed below, followed by the new API features that were added in the 11.1 release.
- Support for Native Transactional Data with Association Rules: you can build association rule models without first transforming the transactional data.
- SVM class weights specified with CLAS_WEIGHTS_TABLE_NAME: the same setting is also used for GLM class weights
- FORCE argument to DROP_MODEL: you can now force a drop model operation even if a serious system error has interrupted the model build process
- GET_MODEL_DETAILS_SVM has a new REVERSE_COEF parameter: you can obtain the transformed attribute coefficients used internally by an SVM model by setting the new REVERSE_COEF parameter to 1
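As an illustration of the FORCE argument mentioned in the list above, a forced drop could be written as follows. DEMO_CLASS_MODEL is a hypothetical model name.

```sql
BEGIN
  -- force => TRUE asks for the model to be dropped even if the build
  -- was interrupted and left the model in an unusable state.
  DBMS_DATA_MINING.DROP_MODEL(
    model_name => 'DEMO_CLASS_MODEL',
    force      => TRUE);
END;
/
```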
11.1g API New Features
- Mining Model schema objects: in previous releases, DM models were implemented as a collection of tables and metadata within the DMSYS schema. In 11.1, models are implemented as data dictionary objects in the SYS schema. A new set of DD views presents DM models and their properties
- Automatic and Embedded Data Preparation: previously data preparation was the responsibility of the user. Now it can be automated
- Scoping of Nested Data: supports nested data types for both categorical and numerical data. Most algorithms require multi-record case data to be presented as columns of nested rows, each containing an attribute name/value pair. ODM processes each nested row as a separate attribute.
- Standardised Handling of Sparse Data & Missing Values: standardised across all algorithms.
- Generalised Linear Models: a new algorithm that supports classification (logistic regression) and regression (linear regression)
- New SQL Data Mining Function: PREDICTION_BOUNDS has been introduced for Generalised Linear Models. This returns the confidence bounds on predicted values (regression models) or predicted probabilities (classification)
- Enhanced Support for Cost-Sensitive Decision Making: a cost matrix can be added to or removed from a model using DBMS_DATA_MINING.ADD_COST_MATRIX and DBMS_DATA_MINING.REMOVE_COST_MATRIX.
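To show the cost matrix feature from the list above, here is a minimal sketch of attaching a cost matrix to a classification model and removing it again. The table DEMO_COST_MATRIX, the model DEMO_CLASS_MODEL, the 0/1 target values and the cost figures are all invented for illustration.

```sql
-- Cost matrix table (hypothetical name). The target value columns must
-- match the data type of the model's target; NUMBER is assumed here.
CREATE TABLE demo_cost_matrix (
  actual_target_value    NUMBER,
  predicted_target_value NUMBER,
  cost                   NUMBER
);

-- Example costs only: missing a real 1 is treated as more costly
-- than wrongly predicting a 1.
INSERT INTO demo_cost_matrix VALUES (0, 0, 0);
INSERT INTO demo_cost_matrix VALUES (0, 1, 1);
INSERT INTO demo_cost_matrix VALUES (1, 0, 5);
INSERT INTO demo_cost_matrix VALUES (1, 1, 0);
COMMIT;

BEGIN
  -- Attach the cost matrix to an existing classification model.
  DBMS_DATA_MINING.ADD_COST_MATRIX('DEMO_CLASS_MODEL', 'DEMO_COST_MATRIX');
END;
/

BEGIN
  -- Remove it again when it is no longer needed.
  DBMS_DATA_MINING.REMOVE_COST_MATRIX('DEMO_CLASS_MODEL');
END;
/
```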
The Predictive Analytics World conference is finishing up today in New York. Over the past few days some of the leading people in analytics have presented at the conference.
Twitter, as usual, has been busy and there have been some very interesting and important quotes.
The list of tweets (#pawcon) below are the ones I found most interesting:
Manu Sharma from LinkedIn: “Guru” job title is down, “Ninja” is up.
Despite the “data science” buzz, the biggest skill among #pawcon attendees is #DataMining
Andrea Medinaceli: Visualization is very powerful for making analytics results accessible to upper management (and for buy-in)
Social Network Analytics (SNA) with Zynga, 20M daily active users, 90M monthly active users; 10K nodes, 45K edges (big!)
Vertica: Zynga is an analytics company in the disguise of a gaming company; graph analytics find users/influencers
Colin Shearer: Find me something interesting in my data is a question from hell (analysis should be guided by business goals)
John Elder advocates ensemble methods – usually improve analytics results
Tom Davenport: to get real value, #analytics need to move from one-time craft to industrialized activity
10 years from now all Fortune 500 companies will have a Chief Analytics Officer at the level of COO or CFO
Must be a sign of the economy, so much of the focus on the value of predictive is on retaining customers. #PAWCON.
Tom Davenport: #Analytics is not about math, it is about relationships (with your business client) – says Intel Chief Mathematician
Karl Rexer: companies with higher analytic capabilities are doing better than their peers
If you have been using Oracle Data Miner to develop your data mining workflows and models, at some point you will want to move away from the tool and start using the ODM APIs.
Oracle Data Mining provides a PL/SQL API and a Java API for creating supervised and unsupervised data mining models. The two APIs are fully interoperable, so that a model can be created with one API and then modified or applied using the other API.
I will cover the Java APIs in a later post, so watch out for that.
To help you get started with using the APIs there are a number of demo PL/SQL programs available. These were available as part of the pre-11.2g version of the tool, but they don’t seem to be packaged up with the 11.2 (SQL Developer 3) application.
The following table gives a list of the PL/SQL demo programs that are available. Although these were part of the pre-11.2g tool, they still seem to work on your 11.2g database.
You can download a zip of these files from here.
The sample PL/SQL programs illustrate each of the algorithms supported by Oracle Data Mining. They include examples of data transformations appropriate for each algorithm.
I will be exploring the main APIs, how to set them up, the parameters, etc., over the next few weeks, so check back for these posts.
Today I received two boxes containing 48 copies of The Performance Management Revolution by Howard Dresner.
These books have been kindly donated by Duncan Fitter, UK Business Development Director at Oracle.
I will be distributing these books to my MSc Data Mining students over the next week.
Thanks Duncan and Oracle
The new/updated SQL Developer 3.1 Early Adopter has just been released.
For the Data Miner there are no major changes; it appears that there have been some bug fixes and some minor enhancements to some parts.
The main ODM features, apart from bug fixes, in this release include:
- Globalization support, including translated error messages and GUI for all languages supported by SQL Developer
- Improved accessibility features including the addition of a Structure navigator that lists all the nodes and links displayed in a workflow
Bug / Feature
After unzipping the download I opened SQL Developer. With each new release you will have to upgrade the existing ODM repository. The easiest way of doing this is to open the ODM connections pane and double click on one of your ODM schemas. SQL Developer will then run the necessary scripts to upgrade the repository.
I discovered a bug/feature with the SQL Developer 3.1 EA1 upgrade script. The repository upgrade does not complete and an error is reported.
I logged this error on the ODM forum on OTN. Mark Kelly, who is the Development Manager for ODM and monitors the ODM forum, and his team were quickly onto investigating the error. Mark has posted an update on the ODM forum and given a script that needs to be run before you upgrade your existing repository.
You can download the pre-upgrade script from here.
If you don’t have an existing repository then you don’t have to run the script.
Check out the message on the ODM forum.
How to Upgrade SQL Developer & ODM
You will have to download the new SQL Developer 3.1 EA install files.
- Unzip this into your SQL Developer directory
- Create a shortcut for sqldeveloper.exe on your desktop and relabel it SQL Developer 3.1 EA
- Double-click this shortcut
- You should be presented with the above window. Select the Yes button to migrate your previous install settings
- SQL Developer should now open and contain all your previous connections
If you have an existing ODM repository, you need to run the pre-upgrade script (see above) at this point
- You will now have to upgrade the ODM repository in the database. The simplest way of doing this is to allow SQL Developer to run the necessary scripts.
- From the View Menu, select Oracle Data Miner –> Connections
- In the ODM Connections pane double click one of your ODM schemas. Enter the username and password and click OK
- You will then be prompted to migrate/update the ODM repository to the new version. Click Yes.
- Enter the SYS username and Password
- Click Start button, to start the migrate/upgrade scripts
- On my laptop this migrate/upgrade step took less than 1 minute
- The upgrade is now finished and you can start using ODM.
ODM – SQL Developer 3.1 EA – Release Notes
The ODM release notes can be found at
Over the coming months (Q4 2011) there are a number of Oracle related events being run in Ireland. The schedule for these is below with the relevant links to the agenda webpages or to where you can book your place.
For the OUG BI SIG meetings, you can book your place with the UKOUG.
Venue Address – Dublin:
Oracle Block H, East Point Business Park, Dublin 3
Venue Address – Belfast:
The Mount Conference Center, 2 Woodstock Link, Belfast BT6 8DD
For questions about logistics please contact the marketing team on firstname.lastname@example.org
If you have any questions about the content please contact: email@example.com
If you know of any other events that are not listed, let me know and I’ll update the list