cloud

Oracle ADW how to load new OML notebooks

Posted on Updated on

Oracle Autonomous Database (ADW) has been out a while now and have had several, behind the scenes, improvements and new/additional features added.

If you have used the Oracle Machine Learning (OML) component of ADW you will have seen the various sample OML Notebooks that come pre-loaded. These are easy to open, use and to try out the various OML features.

Screenshot 2019-07-29 13.07.01

The above image shows the top part of the login screen for OML. To see the available sample notebooks click on the Examples icon. When you do, you will get the following sample OML Notebooks.

Screenshot 2019-07-29 13.08.44

But what if you have a notebook you have used elsewhere. These can be exported in json format and loaded as a new notebook in OML.

To load a new notebook into OML, select the icon (three horizontal line) on the top left hand corner of the screen. Then select Notebooks from the menu.

Screenshot 2019-07-29 13.11.41           Screenshot 2019-07-29 13.21.07

Screenshot 2019-07-29 13.21.49

Then select the Import button located at the top of the Notebooks screen. This will open a File window, where you can select the json file from your file system.

Screenshot 2019-07-29 13.24.58

A couple of seconds later the notebook will be available and listed along side any other notebooks you may have created.

Screenshot 2019-07-29 13.26.13

All done!

You have now imported a new notebook into OML and can now use it to process your data and perform machine learning using the in-database features.

 

 

 

 

 

 

Advertisements

Machine Learning Tools and Workbenches

Posted on Updated on

The following is a list of the most commonly used tools and workbenches for machine learning. These are specific to machine learning only. This list does not include any library or frameworks. These are tools and workbenches only. Most offering machine learning tools will include the following features:

  • Easy drag and drop capabilities
  • Data collection
  • Data preparation and cleaning
  • Model building
  • Data Visualization
  • Model Deployment
  • Integration with other tools and languages

As more and more organizations implement machine learning, there are two core aims they want to achieve.

  1. Employee Productivity: Who wants to spend days or weeks writing mundane code to load data, clean data, etc etc etc. No one wants to do this and especially employers don’t want their staff wasting time on this. Instead they are happy to invest in tools and workbenches where a lot or most or all of these mundane tasks are automated for you. You can not concentrate on the important tasks of adding value to your organisation. This saves money, improves employee productivity and employee value.
  2. Integration with Technical Architecture: Many of these tools and workbenches allow for easy integration with the technical architecture and thereby allowing easy and quick integration of machine learning withe the day to day activities of the organization. This saves money, improves employee productivity and employee value.

SAS

SAS software has been around for every and is the great grand-daddy of analytics and machine learning. They have built a large number of machine learning tools and solutions built upon these for various industries. Their core machine learning tools include SAS Enterprise Miner and SAS Visual Data Mining and Machine Learning.

Microsoft

Microsoft have been improving their Machine Learning offering over the years and most of this is based on the Azure cloud platform with Microsoft Azure Machine Learning Studio and Azure Databricks.

SAP

SAP Leonardo is a cloud based platform for machine learning and supports tight integration with other SAP software.

Oracle

Oracle have a number of machine learning tools and supports for the main machine learning languages. They have built a large number of applications (both cloud and on-premises) with in-built machine learning. Their main tools for machine learning include Oracle Data Miner, Oracle Machine Learning and Oracle Analytics (OAC or DVD versions)

Cloudera

If you work with hadoop and big data then you are probably using Cloudera in some way. Cloudera have hired Hilary Mason as their GM of ML. By taking an “AI factory” approach to turning data into decisions, you can make the process of building, scaling, and deploying enterprise ML and AI solutions automated, repeatable, and predictable—boring even. Cloudera Data Science Workbench is their solution.

Screenshot 2019-04-17 13.10.46

IBM

IBM have a number of machine learning tools, one of them being a long standing member of the machine learning community, SPSS Modeler. Other machine learning tools include Watson Studio, IBM Machine Learning for z/OS, and IBM Watson Explorer.

Google

Google have a large number of machine learning solutions including everything from traditional machine learning, into NLP, in Image processing, Video processing, etc. It’s a long list. Many of these come with various APIs to access these features. Most of these revolve around their Google AI Cloud offering. But sticking with the tools and workbenches we have AI Platform Notebooks, Kubeflow, and BigQuery ML.

TensorBoard

TensorBoard is a suite of tools for graphical representation of different aspects and stages of machine learning in TensorFlow.

Amazon

A bit like Goolge, Amazon has a large number of solutions for machine learning and AI, and most of these are available via an API or some cloud service. Amazon SageMaker is their main service.

Looker

Looker connects directly with Google BQML reduces additional complexity for data scientists by eliminating the need to move outputs of predictive models back into the database for use, while also increases the time-to-value for business users, allowing them to operationalize the outputs of predictive metrics to make better decisions every day.

Weka

Weka has been around for a long time and still popular in some research groups. Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization.

RapidMiner

RapidMiner Studio has been around for a long time and is one of the few more visual workflow tools (that everyone else should be doing).

Databricks

From the people who created Spark, we have another notebook solution for your machine learning projects called Databricks Workbench.

KNIME

KNIME Analytics Platform is the open source software for creating data science applications and services.

Dataiku

Dataiki Data Science (DSS) is a collaborative data science software workflow platform enabling data exploration, prototyping and delivery of analytical and machine learning solutions.

 

I’ve not included the tools like R Studio and Notebooks in this list as they don’t really address the aims listed above. But you will notice a lot of the above solutions are really Jupyter Notebooks. Most of these vendors have a long way to go to make the tasks of machine learning boring.

This list does not cover all available tools and workbenches, but it does list the most common one you will come across.

Time Series Forecasting in Oracle – Part 2

Posted on Updated on

This is the second part about time-series data modeling using Oracle. Check out the first part here.

In this post I will take a time-series data set and using the in-database time-series functions model the data, that in turn can be used for predicting future values and trends.

The data set used in these examples is the Rossmann Store Sales data set. It is available on Kaggle and was used in one of their competitions.

Let’s start by aggregating the data to monthly level. We get.

Screenshot 2019-04-16 12.37.59

Data Set-up

Although not strictly necessary, but it can be useful to create a subset of your time-series data to only contain the time related attribute and the attribute containing the data to model. When working with time-series data, the exponential smoothing function expects the time attribute to be of DATE data type. In most cases it does. When it is a DATE, the function will know how to process this and all you need to do is to tell the function the interval.

A view is created to contain the monthly aggregated data.

-- Create input time series
create or replace view demo_ts_data as 
select to_date(to_char(sales_date, 'MON-RRRR'),'MON-RRRR') sales_date,
sum(sales_amt) sales_amt
from demo_time_series
group by to_char(sales_date, 'MON-RRRR')
order by 1 asc;

Next a table is needed to contain the various settings for the exponential smoothing function.

CREATE TABLE demo_ts_settings(setting_name VARCHAR2(30), 
                              setting_value VARCHAR2(128));

Some care is needed with selecting the parameters and their settings as not all combinations can be used.

Example 1 – Holt-Winters

The first example is to create a Holt-Winters time-series model for hour data set. For this we need to set the parameter to include defining the algorithm name, the specific time-series model to use (exsm_holt), the type/size of interval (monthly) and the number of predictions to make into the future, pass the last data point.

BEGIN
   -- delete previous setttings
   delete from demo_ts_settings;

   -- set ESM as the algorithm
   insert into demo_ts_settings 
      values (dbms_data_mining.algo_name,
              dbms_data_mining.algo_exponential_smoothing);

   -- set ESM model to be Holt-Winters
   insert into demo_ts_settings 
      values (dbms_data_mining.exsm_model,
              dbms_data_mining.exsm_holt);

   -- set interval to be month
   insert into demo_ts_settings 
      values (dbms_data_mining.exsm_interval,
              dbms_data_mining.exsm_interval_month);

   -- set prediction to 4 steps ahead
   insert into demo_ts_settings 
      values (dbms_data_mining.exsm_prediction_step,
              '4');

   commit; 
END;

Now we can call the function, generate the model and produce the predicted values.

BEGIN
   -- delete the previous model with the same name
   BEGIN 
      dbms_data_mining.drop_model('DEMO_TS_MODEL');
   EXCEPTION 
      WHEN others THEN null; 
   END;

   dbms_data_mining.create_model(model_name => 'DEMO_TS_MODEL',
                                 mining_function => 'TIME_SERIES',
                                 data_table_name => 'DEMO_TS_DATA',
                                 case_id_column_name => 'SALES_DATE',
                                 target_column_name => 'SALES_AMT',
                                 settings_table_name => 'DEMO_TS_SETTINGS');
END;

When the model is create a number of data dictionary views are populated with model details and some addition views are created specific to the model. One such view commences with DM$VP. Views commencing with this contain the predicted values for our time-series model. You need to append the name of the model create, in our example DEMO_TS_MODEL.

-- get predictions
select case_id, value, prediction, lower, upper 
from   DM$VPDEMO_TS_MODEL
order by case_id;

Screenshot 2019-04-16 16.01.14

When we plot this data we get.

Screenshot 2019-04-16 16.02.57

The blue line contains the original data values and the red line contains the predicted values. The predictions are very similar to those produced using Holt-Winters in Python.

Screenshot 2019-04-16 16.04.45

Example 2 – Holt-Winters including Seasonality

The previous example didn’t really include seasonality int the model and predictions. In this example we introduce seasonality to allow the model to pick up any trends in the data based on a defined period.

For this example we will change the model name to HW_ADDSEA, and the season size to 5 units. A data set with a longer time period would illustrate the different seasons better but this gives you an idea.

BEGIN
   -- delete previous setttings
   delete from demo_ts_settings;

   -- select ESM as the algorithm
   insert into demo_ts_settings 
   values (dbms_data_mining.algo_name,
           dbms_data_mining.algo_exponential_smoothing);

   -- set ESM model to be Holt-Winters Seasonal Adjusted
   insert into demo_ts_settings 
   values (dbms_data_mining.exsm_model,
           dbms_data_mining.exsm_HW_ADDSEA);

   -- set interval to be month
   insert into demo_ts_settings 
   values (dbms_data_mining.exsm_interval,
   dbms_data_mining.exsm_interval_month);

  -- set prediction to 4 steps ahead
  insert into demo_ts_settings 
  values (dbms_data_mining.exsm_prediction_step,
          '4');

   -- set seasonal cycle to be 5 quarters
   insert into demo_ts_settings 
   values (dbms_data_mining.exsm_seasonality,
           '5');

commit; 
END;

We need to re-run the creation of the model and produce the predicted values. This code is unchanged from the previous example.

BEGIN
   -- delete the previous model with the same name
   BEGIN 
      dbms_data_mining.drop_model('DEMO_TS_MODEL');
   EXCEPTION 
      WHEN others THEN null; 
   END;

   dbms_data_mining.create_model(model_name => 'DEMO_TS_MODEL',
                                 mining_function => 'TIME_SERIES',
                                 data_table_name => 'DEMO_TS_DATA',
                                 case_id_column_name => 'SALES_DATE',
                                 target_column_name => 'SALES_AMT',
                                 settings_table_name => 'DEMO_TS_SETTINGS');
END;

When we re-query the DM$VPDEMO_TS_MODEL we get the new values. When plotted we get.

Screenshot 2019-04-16 16.17.30

The blue line contains the original data values and the red line contains the predicted values.

Comparing this chart to the chart from the first example we can see there are some important differences between them. These differences are particularly evident in the second half of the chart, on the right hand side. We get to see there is a clearer dip in the predicted data. This mirrors the real data values better. We also see better predictions as the time line moves to the end.

When performing time-series analysis you really need to spend some time exploring the data, to understand what is happening, visualizing the data, seeing if you can identifying any patterns, before moving onto using the different models. Similarly you will need to explore the various time-series models available and the parameters, to see what works for your data and follow the patterns in your data. There is not magic solution in this case.