Ora-lytics

Domain Knowledge + Data Skills = Data Miner

Posted on May 11, 2012

Over the past few weeks I have been talking to a lot of people who are looking at how data mining can be used in their organisations and projects, and to people who have been doing data mining for a long time.

What comes across from talking to the experienced people, and these people are not tied to a particular product, is that you need to concentrate on the business problem. Once you have this well defined then you can drill down to the deeper levels of the project. Some of these levels will include what data is needed (not what data you have), tools, algorithms, etc.

Statistics is only a very small part of a data mining project. Some people who have PhDs in statistics and work in data mining say that they do not use, or only very rarely use, their statistics skills.

Some quotes that I like are:

“Focus hard on Business Question and the relevant target variable that captures the essence of the question.” Dean Abbott PAW Conf April 2012

“Find me something interesting in my data is a question from hell. Analysis should be guided by business goals.” Colin Shearer PAW Conf Oct 2011

There have been a lot of blog posts and articles on the key skills for a Data Miner and the more popular Data Scientist. What is very clear from all of these is that you will spend most of your time looking at, examining, integrating, manipulating, preparing, standardising and formatting the data. It has been quoted that these tasks can take up 70% to 85% of a Data Miner's/Data Scientist's time. All of these tasks are commonly performed by database developers, and in particular by the developers and architects involved in Data Warehousing projects. The rest of the time is for running the data mining algorithms, examining the results, and yes, some stats too.

Very little time is spent developing algorithms!!! Why is this? Could it be that the algorithms are already developed (they have been around for a long time now and are well tuned) and available in all the data mining tools? We can almost treat these algorithms as a black box. So one of the key abilities of a data miner/data scientist is to know what the algorithms can do, what kinds of problems they can be used for, what kinds of outputs they produce, etc.

Domain knowledge is important, no matter how little it is, in preparing for and being involved in a data mining project. As we define our business problem, the domain expert can bring their knowledge to the problem, which allows us to separate the domain related problems from the data related problems. So the domain expertise is critical at the start of a project, but it is also critical when we have the outputs from the data mining algorithms. We can use the domain knowledge to tie the outputs from the data mining algorithms back to the original problem, bringing real meaning to the business problem we are working on.

So what is the formula of skill sets for a data miner or data scientist? Well, it is a little like the title of this blog post:

Domain Knowledge + Data Skills + Data Mining Skills + a little bit of Machine Learning + a little bit of Stats = a Data Miner / Data Scientist

This entry was posted in data mining, data mining blog, oracle big data, oracle data mining, oraclebigdata, oug_ire.

Oracle Magazine May/June 2012 Collector Editions

Posted on May 3, 2012

The good people at Oracle Magazine have produced a number of collector’s editions (six) of the current edition (May/June 2012).

I received my copy of the magazine in the post yesterday and the one that I received is the following

I’ve been collecting Oracle Magazine for over 20 years and I have almost the entire collection.

I would like to add all 6 special editions to my collection.

If you would like to donate your Oracle Magazine and help me complete the collection, add a comment to the blog or email me directly. I will be able to let you know which special editions I’m still missing.

This entry was posted in Oracle, Oracle Magazine, Oracle Technology Network.

Oracle Magazine-May/June 1995

Posted on April 25, 2012

The headline articles for the May/June edition of Oracle Magazine included one of the first articles on Data Centers, using the prebuilt packages in PL/SQL, and how to use object-oriented programming techniques in Oracle Forms 4.5.

How big was your Oracle Database in 1993 & 1994

Posted on April 25, 2012

I’m in the middle of writing my summary of the May/June 1995 edition of Oracle Magazine (that blog post is coming soon). There was one article about a survey that Oracle conducted of its customers on how big their databases were and the number of users of their databases.

The following diagrams give the summary results of these surveys.

We can see that there was a bit of a jump in the size of the databases, but the number of users increased significantly.

So most customers had databases in the 2GB to 10GB range. How things have changed. If the survey was conducted for 2012, what results would we get?

Does anyone know if Oracle has published similar survey results in the last few years?

This entry was posted in database, Oracle, Oracle Magazine, OTN, oug_ire.

2 Day Oracle Data Miner course material

Posted on April 24, 2012

Last week I managed to get my hands on the training material for the 2 Day Oracle Data Miner course. This course is run by Oracle University.

Many thanks to Michael O’Callaghan, who is a BI Sales person here in Ireland, and to Oracle University for arranging this.

The 2 days are pretty packed with a mixture of lecture type material, lots of hands on exercises and some time for open discussions. In particular, day 2 will be a very busy day.

Check out the course outline and published schedule – click here

You can have this course on site at your organisation. If this is something that interests you then contact your Oracle University account manager. There is also the traditional face-to-face delivery and the newer online delivery, where people from around the world come together for the online class.

This entry was posted in data mining, ODM 11g R2, Oracle, Oracle Advanced Analytics, Oracle Data Miner, oracle data mining, Oracle Data Mining 11g R2, oraclebigdata.

Oracle Analytics Sessions at COLLABORATE12

Posted on April 23, 2012

There are a number of sessions on Oracle Advanced Analytics and related topics taking place this week at COLLABORATE12 in Las Vegas (http://collaborate12.com).

Date	Time	Presentation	Presenter
Sun 22nd	9:00-15:00	Oracle Business Intelligence Application Journey	
Mon 23rd	9:45-10:45	Managing Unstructured Data using Hadoop, Oracle 11g and Oracle Exadata Database Machine	Jim Steiner
Mon 23rd	9:45-10:45	Environmental Data Management and Analytics-a Real World Perspective	Angela Miller
Mon 23rd	11:00-12:00	Public Safety and Environmental Real-Time Analytics using Oracle Business Intelligence	Raghav Venkat, Therese Arguelles
Mon 23rd	11:00-12:00	BI is more than slice and dice	Peter Scott
Mon 23rd	14:30-15:30	In-Database Analytics: Predictive Analytics, Data Mining, Exadata & Business Intelligence	Jacek Myczkowski
Mon 23rd	15:45-16:45	Big Data Analytics, R you ready	Mark Hornick, Shyam Nath
Tue 24th	10:45-11:45	BI Analytics and Oracle NoSQL. The Future of Now	Manish Khera
Wed 25th	8:15-9:15	Oracle Data Mining – A Component of the Oracle Advanced Analytics Option-Hands-on Lab	Charlie Berger
Wed 25th	9:30-10:30	Oracle R Enterprise – A Component of the Oracle Advanced Analytics Option-Hands-on Lab	Mark Hornick

Here are the abstracts from the two main Oracle Advanced Analytics presentations by Charlie Berger and Mark Hornick:

Oracle Data Mining – A Component of the Oracle Advanced Analytics Option

This Hands-on Lab provides an introduction to Oracle Data Mining and the Oracle Data Miner GUI.

Oracle Data Mining (ODM), now part of Oracle Advanced Analytics, provides an extensive set of in-database data mining algorithms that solve a wide range of business problems. It can predict customer behavior, detect fraud, analyze market baskets, segment customers, and mine text to extract sentiments. ODM provides powerful data mining algorithms that run as native SQL functions for in-database model building and model deployment. There is no need for the time delays and security risks of data movement.

The free Oracle Data Miner GUI is an extension to Oracle SQL Developer 3.1 that enables data analysts to work directly with data inside the database, explore the data graphically, build and evaluate multiple data mining models, apply ODM models to new data, and deploy ODM’s predictions and insights throughout the enterprise. Oracle Data Miner work flows capture and document the user’s analytical methodology and can be saved and shared with others to automate advanced analytical methodologies.

Oracle R Enterprise – A Component of the Oracle Advanced Analytics Option

This Hands-on Lab provides an introduction to Oracle R Enterprise.

Oracle R Enterprise, a part of the Oracle Advanced Analytics Option, makes the open source R statistical programming language and environment ready for the enterprise by integrating R with Oracle Database. R users can interactively and transparently execute R scripts for statistical and graphical analyses on data stored in Oracle Database. R scripts can be executed in Oracle Database using potentially multiple database-managed R engines – resulting in data parallel execution. ORE also provides a rich set of statistical functions and advanced analytics techniques.

In this lab, attendees will be introduced to Oracle’s strategy for R, including the Oracle R Distribution, Oracle R Enterprise (ORE), and Oracle R Connector for Hadoop (ORCH). We will focus on Oracle R Enterprise with hands-on exercises exploring the transparency layer, embedded R execution, and statistics engine.

This entry was posted in Oracle, Oracle Advanced Analytics, oracle big data, Oracle Data Miner, oracle data mining, Oracle Data Mining 11g R2, Oracle R Enterprise, oraclebigdata.

Book Case

Posted on April 22, 2012

I finally got round to finishing a 5 foot high, by 4 foot wide, book case. It is made out of Sycamore wood with Mahogany wedges.

Sycamore is a very hard wood and can splinter easily.

One of the things that I like about this wood is that when you apply Danish oil you get variations in the colouration of the wood. I’ve ended up with some darker patches and some lighter patches, so you don’t get a consistently coloured finish.

This is one of the joys of working with natural solid wood rather than manufactured wood or veneered wood.

This is why I like working with natural wood. It has lots of character. Plus the colouring will vary over the coming months.

When I moved the book case into its new home in the dining room I discovered that the floor is not level. The book case was leaning away from the wall. This was easily fixed with a very small wedge placed under each end piece. This is not ideal.

This entry was posted in Uncategorized.

Oracle Magazine–March 1995

Posted on April 12, 2012

In 1995 we have a change to the frequency of publication of Oracle Magazine. It is now published every two months, with 6 editions each year, as is still the case.

The headline articles in the March/April 1995 edition of Oracle Magazine included Integrating Unstructured Information, Minimizing Client/Server Network Traffic with Oracle Forms 4.0, Relational Objects and how the Canadian Postal Service was using Oracle Technology to deliver mail on time.

Data Visualization Videos & Resources

Posted on April 11, 2012

Here is a selection of videos and websites on Data Visualisations.

Hans Rosling videos of his TED talks

Oracle Advanced Analytics Video by Charlie Berger

Posted on April 10, 2012

Charlie Berger (Sr. Director Product Management, Data Mining & Advanced Analytics) has produced a video based on a recent presentation called ‘Oracle Advanced Analytics: Oracle R Enterprise & Oracle Data Mining’.

This is a 1 hour video, including some demos, of product background, product features, recent developments and new additions, examples of how Oracle is including Oracle Data Mining into their fusion applications, etc.

Oracle has 2 data mining products: the main in-database Oracle Data Mining, and the more recent extensions to R that give us Oracle R Enterprise.

Check out the video – Click here.

Check out Charlie’s blog at https://blogs.oracle.com/datamining/

Oracle University : 2 Day Oracle Data Mining training course

This entry was posted in data mining, data mining blog, Oracle, Oracle Advanced Analytics, oracle big data, Oracle Data Miner, oracle data mining, Oracle Data Mining 11g R2, Oracle R Enterprise, oraclebigdata, oug_ire.

OTN Workshop Days in Dublin 17-

Posted on April 5, 2012

Oracle in Ireland have arranged a number of FREE Oracle Technology Network Hands on Workshops.

17th April : Database Firewall

18th April : Oracle Real Application Testing

19th April : Database 11g R2 New Features

20th April : Business Integration using Oracle SOA Suite 11g

All the workshops are in the Oracle offices in East Point, in Dublin.

To register for these events:

http://www.oracle.com/us/dm/34862-splashpage-1438215.html

This entry was posted in database, Ireland, Oracle, Oracle Technology Network, OTN, oug_ire.

Tom Kyte is in Belfast 16th April

Posted on April 5, 2012

The Oracle User Group has organised for Tom Kyte, the famous Oracle evangelist, to come to Belfast to give a one day seminar.

The seminar will be in the Hilton in Belfast.

Some of the topics to be covered on the day include:

5 things you probably didn’t know about SQL
5 things you probably didn’t know about PL/SQL
All about metadata: why telling the database about your schema matters
What is New and Improved and Coming in Oracle Application Development
All about Oracle Database Security.

All of this will be followed by a 1 hour Ask Tom session, where you will have your chance to ask the man himself anything about the Oracle database.
