Oracle OCI AI Services

Oracle Cloud has been introducing new AI Services over the past few months, and we will see a few more appearing over the coming months. When you look at the list you might be a little surprised that these are newly available cloud services from Oracle, for two main reasons. Firstly, AWS and Google have had similar cloud services available for some time (many years) now, and secondly, Oracle started talking about having these cloud services many years ago. It has taken some time for them to become publicly available, although some of these have been included in other Oracle applications and offerings, so they were probably busy with those before making them available as stand-alone services.

These can be located in your Oracle Cloud account from the hamburger menu, as shown below.

As you can see, most of these AI Services are listed, except for OCI Forecasting, which is due to be added “soon”. We can also expect an OCI Translation service and possibly some additional ones.

  • OCI Language: This service can work with over 75 languages and allows you to detect and perform knowledge extraction from text, including entity identification and labelling, classification of text into more than 600 categories, sentiment analysis and key phrase extraction. This can be used to automate knowledge extraction from customer feedback, product reviews, discussion forums, legal documents, etc. (a hedged code sketch follows this list).
  • OCI Speech: Performs speech-to-text on live speech streams, audio and video recordings, etc., creating a transcription. It works across English, Spanish and Portuguese, with other languages to be added. A nice little feature is profanity filtering, allowing you to tag, remove or mask certain words.
  • OCI Vision: This has two parts. The first is for processing documents, and is slightly different to the OCI Language service in that it processes text documents in JPEG, PDF, PNG and TIFF formats. Text information extraction is performed, identifying key terms, tables, metadata, etc. The second part of OCI Vision deals with image analysis, extracting key information from the image such as objects, people, text, image classification, scene detection, etc. You can use either the pretrained models or include your own models.
  • OCI Anomaly Detection: Although anomaly detection is available via algorithms in the Database and OCI Data Science offerings, this new service allows someone with little programming experience to utilise an ensemble of models, including the MSET algorithm, to provide greater accuracy in identifying unusual patterns in the data.
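As a quick taste of how these services can be called programmatically, below is a minimal sketch of requesting sentiment analysis from OCI Language using the OCI Python SDK. This is not from the original post: the client, model and method names are my assumptions based on the SDK’s naming conventions, so check them against the current oci package documentation (and your tenancy configuration) before relying on them.

# Hedged sketch (assumption): calling OCI Language sentiment analysis with the OCI Python SDK
import oci

config = oci.config.from_file()  # reads ~/.oci/config by default
client = oci.ai_language.AIServiceLanguageClient(config)

documents = [
    oci.ai_language.models.TextDocument(
        key="review-1",
        text="The delivery was late but the support team were excellent.",
        language_code="en",
    )
]

details = oci.ai_language.models.BatchDetectLanguageSentimentsDetails(documents=documents)
response = client.batch_detect_language_sentiments(details)

# Each returned document carries the sentiment details for that piece of text
for doc in response.data.documents:
    print(doc)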

Note: I’ve excluded some services from the above list as these have been available for some time now or have limited AI features included in them. These include OCI Data Labelling and OCI Digital Assistant.

Some of these AI Services, based on the initial release, have limited functionality and resources, but this will change over time.

Python Data Profiling libraries

One of the most common, and sometimes boring, tasks when working with datasets is writing code to profile the data. Most data scientists will have built a set of tools/scripts to help them with this regular and slightly boring task. As with most IT tasks, we should be trying to automate what we can, to allow us to spend more time on more important tasks, such as deriving insights and delivering value to the business, instead of repeatedly writing code to produce various statistics about the data and drawing pretty pictures.

I’ve written previously about automating and using some data profiling libraries to help us with this task. There are lots of packages available on PyPI.org and on GitHub. Below I give examples of 5 Python data profiling libraries, with links to their GitHub repositories.

  1. pandas_profiling

This is probably one of the better and more popular Python libraries for exploring data. The aim is to make it as simple as possible using one line of code.

import pandas_profiling as pp

# df is an existing pandas DataFrame; importing pandas_profiling adds .profile_report() to DataFrames
df.profile_report()

2. skimpy

Following the one line of code approach, skimpy is a lightweight tool that provides summary statistics about the variables in a data frame. They like to think of skimpy as a super-charged version of df.describe(). Skimpy also has some automated data cleaning functions.

from skimpy import skim

skim(df)

3. dataprep

Dataprep has multiple features, with the two main ones being EDA (Exploratory Data Analysis) and Data Cleaning. The EDA functionality is built to scale to larger data sets and provides some interactive charts.

from dataprep.datasets import load_dataset
from dataprep.eda import plot, plot_correlation, plot_missing, plot_diff, create_report

df = load_dataset("titanic")
plot(df)

plot_missing(df)
plot_missing(df, "Age")

4. SweetViz

Sweetviz creates high-density visualizations to help kickstart EDA with just two lines of code. Output is a fully self-contained HTML application.

import pandas as pd
import sweetviz as sv

df = pd.read_csv('../input/titanic/train.csv')
report = sv.analyze(df, "Survived")   # "Survived" is the target feature
report.show_html('sweetviz_report.html')

5. AutoViz

AutoViz works on visualising relationships in the data; it can find the most impactful features and plot creative visualisations.

from autoviz.AutoViz_Class import AutoViz_Class
AV = AutoViz_Class()

df = AV.AutoViz('titanic_train.csv')

Always try to automate the boring tasks, and using one of these packages is a step towards doing so for any Data Analyst, Data Scientist, Data Engineer, Machine Learning Engineer, AI Engineer, etc.

NATO AI Strategy

Over the past 18 months there has been a widespread push by many countries and geographic regions to examine how the creation and use of Artificial Intelligence (AI) can be regulated. I’ve written many blog posts about these. But it isn’t just governments or political alliances that are doing this; other types of organisations are too.

NATO, the political and (mainly) military alliance, has also joined the club. They have released a summary version of their AI Strategy. It might seem a little strange for this type of organisation to do something like this, but if you look a little closer NATO also says they work together in other areas such as Standardisation Agreements, Crisis Management, Disarmament, Energy Security, Climate/Environment Change, Gender and Human Security, and Science and Technology.

In October/November 2021, NATO formally adopted their Artificial Intelligence (AI) Strategy (for defence). Their AI Strategy outlines how AI can be applied to defence and security in a protected and ethical way (interesting wording). Their aim is to position NATO as a leader in AI adoption, and it provides a common policy basis to support the adoption of AI systems in order to achieve the Alliance’s three core tasks of Collective Defence, Crisis Management and Cooperative Security. An important element of the AI Strategy is to ensure inter-operability and standardisation. This is a little bit more interesting and perhaps has a lesser focus on ethical use.

NATO’s AI Strategy contains the following principles of Responsible use of AI (in defence):

  • Lawfulness: AI applications will be developed and used in accordance with national and international law, including international humanitarian law and human rights law, as applicable.
  • Responsibility and Accountability: AI applications will be developed and used with appropriate levels of judgment and care; clear human responsibility shall apply in order to ensure accountability.
  • Explainability and Traceability: AI applications will be appropriately understandable and transparent, including through the use of review methodologies, sources, and procedures. This includes verification, assessment and validation mechanisms at either a NATO and/or national level.
  • Reliability: AI applications will have explicit, well-defined use cases. The safety, security, and robustness of such capabilities will be subject to testing and assurance within those use cases across their entire life cycle, including through established NATO and/or national certification procedures.
  • Governability: AI applications will be developed and used according to their intended functions and will allow for: appropriate human-machine interaction; the ability to detect and avoid unintended consequences; and the ability to take steps, such as disengagement or deactivation of systems, when such systems demonstrate unintended behaviour.
  • Bias Mitigation: Proactive steps will be taken to minimise any unintended bias in the development and use of AI applications and in data sets.

By acting collectively members of NATO will ensure a continued focus on interoperability and the development of common standards.

Some points of interest:

  • Bias Mitigation efforts will be adopted with the aim of minimising discrimination against traits such as gender, ethnicity or personal attributes. However, the strategy does not say how bias will be tackled – which requires structural changes which go well beyond the use of appropriate training data.
  • The strategy also recognises that in due course AI technologies are likely to become widely available, and may be put to malicious uses by both state and non-state actors. NATO’s strategy states that the alliance will aim to identify and safeguard against the threats from malicious use of AI, although again no detail is given on how this will be done.
  • Running through the strategy is the idea of interoperability – the desire for different systems to be able to work with each other across NATO’s different forces and nations without any restrictions.
  • What about autonomous weapon systems? Some members do not support a ban on this technology.
  • Has similar wording to the principles adopted by the US Department of Defense for the ethical use of AI.
  • Wants to make defence and security more attractive to private sector and academic AI developers/researchers.
  • NATO principles have no coherent means of implementation or enforcement.

AI Sandboxes – EU AI Regulations

The EU AI Regulations provide a framework for placing on the market and putting into service AI systems in the EU. One of the biggest challenges most organisations will face is how they can innovate and develop new AI systems while at the same time ensuring they are compliant with the regulations. But at what point do you know these new AI systems are compliant? This can be challenging and could limit or slow down the development and deployment of such systems.

The EU does not want to limit or slow down such innovation, and wants organisations to continually research, develop and deploy new AI. To facilitate this, the EU AI Regulations contain a structure under which this can be achieved.

Articles 53, 54 and 55 of the EU AI Regulations support the development of new AI systems through the use of Sandboxes. We have already seen examples of these being introduced by the UK and Norwegian Data Protection Commissioners.

A Sandbox “provides a controlled environment that facilitates the development, testing and validation of innovative AI systems for a limited time before their placement on the market or putting into service pursuant to a specific plan”.

Sandboxes are stand-alone environments that allow the exploration and development of new AI solutions, which may or may not include some risky use of customer data or other potential AI outcomes that may not be allowed under the regulations. It becomes a controlled experiment lab for the AI team who are developing and testing a potential AI system, and they can do so under real-world conditions. The Sandbox gives a “safe” environment for this experimental work.

The Sandboxes are to be established by the Competent Authorities in each EU country. In Ireland the Competent Authority seems to be the Data Protection Commissioner, and this may be similar in other countries. As you can imagine, under the current wording of the EU AI Regulations this might present some challenges for both the Competent Authority and the company looking to develop an AI solution. Firstly, does the Competent Authority need to provide sandboxes for all companies looking to develop AI, each of which may have several AI projects? This is a massive overhead for the Competent Authority to provide and resource. Secondly, will companies be willing to set up a self-contained environment, containing customer data, data insights, solutions with potential competitive advantage, etc., in a Sandbox provided by the Competent Authority? The technical infrastructure used could be hosting many Sandboxes, with many competing companies using the same infrastructure at the same time. This is a big ask for both the companies and the Competent Authority.

Let’s see what really happens regarding the implementation of the Sandboxes over the coming years, and how this will be defined in the final draft of the Regulations.

Article 54 defines additional requirements for the processing of personal data within the Sandbox.

  • Personal data collected for other purposes may only be used where the requirements cannot be effectively fulfilled by processing anonymised, synthetic or other non-personal data.
  • Continuous monitoring is needed to identify any high risks to the fundamental rights of the data subjects, with a response mechanism in place to mitigate those risks.
  • Any personal data to be processed is in a functionally separate, isolated and protected data processing environment under the control of the participants and only authorised persons have access to that data.
  • Any personal data processed is not to be transmitted, transferred or otherwise accessed by other parties.
  • Any processing of personal data does not lead to measures or decisions affecting the data subjects.
  • All personal data is deleted once the participation in the sandbox is terminated or the personal data has reached the end of its retention period.
  • Full documentation of what was done to the data must be kept for 1 year after termination of the Sandbox, and is only to be used for accountability and documentation obligations.
  • Documentation of the complete process and rationale behind the training, testing and validation of the AI, along with test results, is kept as part of the technical documentation (see Annex IV).
  • A short summary of the AI project, its objectives and expected results is to be published on the website of the Competent Authorities.

In addition, the Competent Authority is required to write an annual report and submit it to the EU AI Board. The report is to include details on the results of their scheme, good and bad practices, lessons learnt and recommendations on the setup and application of the Regulations within the Sandboxes.

OECD Framework for Classifying AI Systems

Over the past few months we have seen more and more countries looking at how they can support and regulate the use and development of AI within their geographic areas. For those in Europe, a lot of focus has been on the draft AI Regulations. At the time of writing this post there has been a lot of politics going on in relation to the EU AI Regulations. Some of this has been around the definition of AI, what will be included and excluded in the different categories, and who will be policing and enforcing the regulations, among lots of other things. We could end up with a very different set of regulations to what was included in the draft (published April 2021). It also looks like the enactment of the EU AI Regulations will be delayed to the end of 2022, with some people suggesting it will be towards mid-2023 before something formal happens.

I mentioned above that one of the things that may or may not change is the definition of AI within the EU AI Regulations. Although primarily focused on the inclusion/exclusion of biometric aspects, there are other refinements being proposed. When you look at what other geographic regions are doing, we start to see some common aspects in their definitions of AI, but we also see some differences. You can imagine the difficulties this will present in the global marketplace, given how AI touches upon all/many aspects of most businesses, their customers and their data.

Most of you will have heard of the OECD. In recent weeks they have been working across all member countries towards a definition of AI and how different AI systems can be classified. They have called this their OECD Framework for Classifying AI Systems.

The OECD Framework for Classifying AI Systems is a tool for policy-makers, regulators, legislators and others so that they can assess the opportunities and risks that different types of AI systems present and to inform their national AI strategies.

The Framework links the technical characteristics of AI with the policy implications set out in the OECD AI Principles, which include:

  • Inclusive growth, sustainable development and well-being
  • Human-centred values and fairness
  • Transparency and explainability
  • Robustness, security and safety
  • Accountability

The framework looks at different aspects depending on whether the AI is still within the lab (sandbox) environment or is live in production or in use in the field.

The framework goes into more detail on the various aspects that need to be considered for each of these. The working group have applied the framework to a number of different AI systems to illustrate how it can be used.

Check out the framework document, where it goes into more detail on each of the criteria listed above for each dimension of the framework.

AutoML using Pycaret

In this post we will have a look at using the AutoML feature in the Pycaret Python library. AutoML is a popular topic and allows Data Scientists and Machine Learning people to develop potentially optimized models based on their data, requiring the minimum of input from the Data Scientist. As with all AutoML solutions, care is needed on the eventual use of these models. With various ML and AI legal requirements around the world, it might not be possible to use the output from AutoML in production. Instead, it gives the Data Scientist guidance on creating an optimized model, which can then be deployed in production. This facilitates requirements around model explainability, transparency, human oversight, fairness, risk mitigation and human in the loop.

Pycaret has all your typical Machine Learning algorithms and functions, including for classification, regression, clustering, anomaly detection, time series analysis, and so on.

To install Pycaret run the typical pip command

pip3 install pycaret

If you get any error messages when running any of the following example code, you might need to have a look at your certificates. Locate where Python is installed (for me on a Mac it is /Applications/Python 3.7) and you will find a command called ‘Install Certificates.command’ in that directory. Running this should fix what is causing the errors.

Pycaret comes with some datasets. Most of these are the typical introduction datasets you will find in other Python libraries and in various dataset repositories. For our example we are going to use the Customer Credit dataset. This contains data for a classification problem and the aim is to predict customers who are likely to default.

Let’s load the data and have a quick explore

#Don't forget to install Pycaret
#pip3 install pycaret

#Import dataset from Pycaret
from pycaret.datasets import get_data

#Credit defaulters dataset
df = get_data("credit")

The first five records of the dataframe are displayed.

What’s the shape of the dataframe? The dataset has 24,000 records and 24 columns.

#Check for the shape of the dataset
df.shape

(24000, 24)

The dataset has been formatted for a Classification problem with the column ‘default’ being the target or response variable. Let’s have a look at the distribution of records across each value in the ‘default’ column.

df['default'].value_counts()

0    18694
1     5306

And to get the percentage of these distributions,

df['default'].value_counts(normalize=True)*100

0    77.891667
1    22.108333

Before we can call the AutoML function, we need to create our Training and Test datasets.

#Initialize seed for random generators and reproducibility
seed = 42

#Create the train set using pandas sampling - seen data set
train = df.sample(frac=.8, random_state=seed)
train.reset_index(inplace=True, drop=True)
print(train.shape)
train['default'].value_counts()

(19200, 24)

0    14992
1     4208

Now the Test dataset.

#Using samples not available in train as future or unseen data set
test = df.drop(train.index)
test.reset_index(inplace=True, drop=True)
print(test.shape)
test['default'].value_counts()

(4800, 24)

0    3798
1    1002

Next we need to setup and configure the AutoML experiment.

#Let's Do some magic!
from pycaret.classification import *

#Setup function initializes the environment and creates the transformation pipeline
clf = setup(data=train, target="default", session_id=42)

When the above is run, it goes through a number of steps. The first looks at the dataset, the columns and determines the data types, displaying the following.

If everything is correct, press the enter key to confirm the datatypes, otherwise type ‘quit’. If you press enter, Pycaret will complete the setup of the experiments it will perform to identify a model. A subset of the 60 settings is shown below.
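If you are running this in a script rather than interactively, that confirmation prompt gets in the way. In the version of Pycaret current at the time of writing (2.x), setup() reportedly accepts a silent parameter to skip the data type confirmation; treat this as an assumption to verify against the Pycaret documentation for your version.

# Hedged sketch (assumption, Pycaret 2.x): skip the interactive data type confirmation
clf = setup(data=train, target="default", session_id=42, silent=True)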

The next step runs the experiments to compare each of the models (AutoML), evaluates them and then prints out a league table of models with values for various model evaluation measures. 5-fold cross validation is used for each model. This league table is updated as each model is created and evaluated.

# Compares different models depending on their performance metrics. By default sorted by accuracy
best_model = compare_models(fold=5)

For this dataset, this process of comparing the models (AutoML) only takes a few seconds. The constant updating of the league tables is a nice touch. The following shows the final league table created for our AutoML.

The cells colored/highlighted in yellow tell you which model scored best for that particular evaluation metric. Here we can see the Ridge Classifier scored best using Accuracy and Precision, while the Linear Discriminant Analysis model was best using F1 score, Kappa and MCC.
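As a small aside (not in the original post), compare_models can also sort the league table by a different metric and return more than one model; the sort and n_select parameters below are based on my reading of the Pycaret documentation, so double-check them for your version.

# Hedged sketch: sort the comparison by F1 and keep the top 3 models
top3_models = compare_models(fold=5, sort='F1', n_select=3)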

print(best_model)

RidgeClassifier(alpha=1.0, class_weight=None, copy_X=True, fit_intercept=True,
                max_iter=None, normalize=False, random_state=42, solver='auto',
                tol=0.001)

We can also print the ROC chart.

# Plots the AUC curve (plot_model creates and sizes its own matplotlib figure)
plot_model(best_model, plot="auc", scale=1)

Also the confusion matrix.

plot_model(best_model, plot="confusion_matrix")

We can also see what the top features are that contribute to the model outcomes (the predictions). This is also referred to as feature importance.

plot_model(best_model, plot="feature")

We could take one of these particular models and tune it for a better fit, or we could select the ‘best’ model and tune it.

# Tune model function performs a grid search to identify the best parameters
tuned = tune_model(best_model)
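By default the tuning optimizes for Accuracy. If a different metric matters more for your problem, tune_model should also accept an optimize parameter (and n_iter to control the size of the random search); treat the exact parameter names below as assumptions to verify against the Pycaret documentation.

# Hedged sketch: tune for F1 instead of Accuracy, with a larger random search
tuned_f1 = tune_model(best_model, optimize='F1', n_iter=25)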

We can now use the tuned model to label the Test dataset and compare the results.

# Predict on holdout set
predict_model(tuned, data=test)

The final step with any model is to save it for later use. Pycaret allows you to save the model in .pkl file format.

# Model will be saved as .pkl and can be utilized for serving
save_model(tuned,'Tuned-Model-AutoML-Pycaret')
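To close the loop, the saved model can be reloaded later and used to score new data. This is a minimal sketch rather than part of the original post; load_model and predict_model are standard Pycaret functions, and the name passed to load_model has to match whatever was used with save_model (without the .pkl extension).

# Reload the saved model and score new data
from pycaret.classification import load_model, predict_model

reloaded_model = load_model('Tuned-Model-AutoML-Pycaret')
new_predictions = predict_model(reloaded_model, data=test)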

That’s it. All done.

Valentine’s Day SQL

Well, today is February 14th, known as (St.) Valentine’s Day. Here is a piece of SQL I just put together to mark the day. Enjoy, and Happy St. Valentine’s Day.

WITH heart_top(lev, love) AS (
   SELECT 1 lev, RPAD(' ', 7, ' ') || '**          **' love
   FROM dual
   UNION ALL
   SELECT heart_top.lev+1, 
          RPAD(' ', 6-heart_top.lev*2, ' ') || 
          RPAD('*', (heart_top.lev*4)+2, '*') || 
          RPAD(' ', 11-heart_top.lev*3, ' ') || 
          RPAD('*', (heart_top.lev*4)+2, '*')  love
   FROM heart_top
   WHERE heart_top.lev < 4
),
   heart_bottom(lev, love) AS (
     SELECT 1 lev, '******************************' love
     FROM dual
     UNION ALL
     SELECT heart_bottom.lev+1, 
          RPAD(' ', heart_bottom.lev*2, ' ') || 
          RPAD('*', 15-heart_bottom.lev*2, '*') || 
          RPAD('*', 15-heart_bottom.lev*2, '*')  love
     FROM heart_bottom
     WHERE heart_bottom.lev < 8
)
SELECT love FROM heart_top
union all
SELECT love FROM heart_bottom;

Which gives us the following.

Did you know: St. Valentine, the patron saint of love, was executed in Rome and buried there in the 3rd century. In 1835, an Irish priest was granted permission to exhume his remains, and now his skeleton lies in Whitefriar Church in Dublin city center.

Oracle on AWS costs

In a previous post I walked through the steps of setting up an Oracle Database on AWS RDS. It was a very simple and straightforward process. The only thing to watch out for was to open the network to allow traffic in and out. I also showed how to connect SQL Developer to that database.

I’d been using it for a few days and then needed to move on to other things for a while. I could leave the Database up and running during this period or I could shut it down to save a few dollars/euro. It also gave me a chance to see how much this database cloud instance was costing me. In my previous post, it was estimated to cost about 89c per day.

Before we look at the Actual/Real costs, let’s walk through the steps of shutting down the database.

To stop the database, click on the Actions button on the top right hand side of the screen, just above the database summary details. You will get a confirmation window/box appearing, see image below, asking you to confirm by clicking ‘Yes, Stop Now’.

It will take a few minutes for this shutdown to complete and in my case it took approx. 8 minutes, which was a little surprising as no one was using it at the time. You might need to refresh the webpage to see this change.

That’s all very simple, but it does give you a warning about the stopped database instance. It will be restarted in 7 days time! So if this is a database you will occasionally use, then you will need to carefully manage this particular feature, otherwise you will end up with the database automatically starting and you will be paying for this.

What about the Costs?

The costs for running this service can be found in the AWS Cost Management page. Here we can see the database was running for 7 and a bit days before I shut it down, and the daily cost was 82c. There are two things to note about these costs. Firstly, there was a larger cost for the first day; most of this was associated with the setup and configuration of the database service. Secondly, the costs listed in this console do not include taxes.

I got the bill for this usage, and it came to $6.94, consisting of $5.64 for usage (approx. 75c per day) and $1.30 in taxes/VAT. Not a lot compared to some cloud services, but it comes out at approx 92.5c per day, which is a little more than the estimated cost when the service was being created. A small example of the difference between the “in theory” cost of cloud versus the actual costs.

AWS RDS Oracle setup

There are lots of options available to you for creating and using an Oracle Database.

One of these options is to use AWS RDS services to create and host a Database.

Warning: AWS is a paid service and the RDS options are priced based on the size of the server you pick. The example shown below will cost approx 89c per day or $27 per month. For this the database will be running 24×7. You could reduce this cost significantly by only starting/stopping the Database when you need it, or alternatively create an AWS Lambda function to start/stop it.

First thing you need to do is go create an AWS account, and yes you will need to hand over your credit card number.

After creating your account and you have logged in, search for RDS and you will get the following display. Click on the orange button at the top of the page to Create Database.

Then select

  • Standard create
  • Oracle
  • Architecture settings -> Use multitenant architecture
  • Oracle Enterprise Edition
  • Version -> use the drop down and select latest version (in my case 21)
  • Templates -> Dev/Test
  • Instance Identifier -> database-1
  • Master Username -> admin
  • Master password -> <set password> and confirm it
  • DB Instance Class – as we only want a DB for playing with, go with the cheapest -> db.t3.small (Hint: Scroll to bottom of page to see the estimated monthly costs)
  • Storage type -> General Purpose SSD. If you change this to Magnetic, you will see the cost drop by approx $2 per month. I selected General Purpose SSD
  • Allocated Storage -> I set this to 20G (it’s just a small play DB)
  • Disable/un-tick – Enable Storage Autoscaling
  • Select defaults for VPC (Virtual Private Cloud) – see notes later on opening this to allow connection from your computer.
  • Public Access – Set to Yes
  • Defaults for remaining options.

[Note: You might be prompted to enter a DB Name. Keep this short, with no special characters.]

Click on ‘Create Database’ button at bottom of screen to create the database. It can take anything from a couple of minutes to 30 minutes to create the Database.

When everything is created and you try to connect to the Database using SQL Developer, you will not be able to connect. The VPC needs to be opened to outside traffic. Click on the VPC Security Groups link, then click on the Security Group link on the next page.

Then click on the ‘Edit Inbound rules’ button, and then on the ‘Add Rule’ button (bottom left) to add a new rule. Then select ‘All Traffic’ from drop down, and 0.0.0.0/0 in the Source field. Then save the rules.

You are now ready to create a connection using SQL Developer. To do this you will need the database Endpoint from the RDS dashboard. You will also need the DB Name. This can be found by clicking on the ‘Configuration’ tab, and it is listed on the left-hand side under DB Name.

Now in SQL Developer enter those details and click the Test button to see if the connection works. It should! But if it doesn’t, then double check the username and password, the other details entered, and that the network changes made above are correct.

You can now connect and start using the Database.

Warning: You will be connecting as the ADMIN for the Database. You should never use this account for any development work. So go create a new database user/schema and use it for all your work.
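The post connects with SQL Developer, but the same connection details work from code. Below is a minimal sketch, not from the original post, using the python-oracledb driver to connect as ADMIN and create a separate development user. The endpoint, DB name, passwords and username are placeholders/assumptions that you would replace with your own values.

# Hedged sketch using the python-oracledb driver (pip3 install oracledb)
# The endpoint, DB name, passwords and username below are placeholders - replace with your own
import oracledb

conn = oracledb.connect(
    user="admin",
    password="your_admin_password",
    dsn="database-1.abc123xyz.eu-west-1.rds.amazonaws.com:1521/ORCL",  # Endpoint:port/DB Name
)

with conn.cursor() as cursor:
    # Create a development schema so you are not working as ADMIN
    cursor.execute("CREATE USER dev_user IDENTIFIED BY a_Strong_Password_1")
    cursor.execute("GRANT CONNECT, RESOURCE TO dev_user")
    cursor.execute("ALTER USER dev_user QUOTA UNLIMITED ON USERS")

conn.close()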

Database Vendors on Twitter, Slack, downloads, etc.

Each year we see some changes in the positioning of the most popular databases on the market. “The most popular” part of that sentence can be the most difficult to judge. There are lots and lots of different opinions on this and ways of judging them. There are various sites giving league tables, and even with those some people don’t agree with how they perform their rankings.

The following table contains links for some of the main Database engines including download pages, social media links, community support sites and to the documentation.

Database Vendor | Download Page | Cloud Service | Twitter | Slack/Discord | Stack Overflow | Documentation
Oracle | Download | Cloud Service | @OracleDatabase | - | Stack Overflow | Documentation
MySQL | Download | Cloud Service | @MySQL | Slack | Stack Overflow | Documentation
Microsoft SQL Server | Download | Cloud Service | @SQLServer | Slack | Stack Overflow | Documentation
PostgreSQL | Download | Lots of Vendors | @PostgreSQL | Slack | Stack Overflow | Documentation
MongoDB | Download | Cloud Service | @MongoDB | Slack | Stack Overflow | Documentation
Redis | Download | Cloud Service | @Redisinc | Discord | Stack Overflow | Documentation
IBM DB2 | Download | Cloud Service | @Db2zLabNews | - | Stack Overflow | Documentation
Cassandra | Download | Lots of Vendors | @cassandra | Slack | Stack Overflow | Documentation
MariaDB | Download | Cloud Service | @MariaDB | Slack | Stack Overflow | Documentation
Snowflake | N/A | Cloud Service | @SnowflakeDB | - | Stack Overflow | Documentation

One of the most common sites is DB-Engines, and another is TOPDB Top Database index. The images below show the current rankings/positions of the database vendors (in January 2022).

I’ve previously written about using the Python pytrends package to explore the relative importance of the different Database engines. The results from pytrends are based on the number of Google searches for each term. Check out that blog post. I’ve rerun the same code for 2021, and the following gallery displays charts for each Database based on their popularity. This will allow you to see which countries are most popular for each Database and how that relates to the other databases. For these charts I’ve included Oracle, MySQL, SQL Server, PostgreSQL and MongoDB, as these are the top 5 Databases from DB-Engines.
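The original pytrends code isn’t reproduced here, but below is a minimal sketch along the same lines (not from the original post). The keyword strings, timeframe and the sort column are my own choices, and the results depend on how Google Trends matches the search terms.

# Hedged sketch: compare Google search interest for the top 5 databases using pytrends
from pytrends.request import TrendReq

pytrends = TrendReq(hl='en-US', tz=360)
keywords = ['Oracle Database', 'MySQL', 'SQL Server', 'PostgreSQL', 'MongoDB']
pytrends.build_payload(kw_list=keywords, timeframe='today 12-m')

# Interest over time and by country, both returned as pandas DataFrames
interest_df = pytrends.interest_over_time()
country_df = pytrends.interest_by_region(resolution='COUNTRY')
print(country_df.sort_values('MySQL', ascending=False).head(10))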

Using SQL to create some festive Christmas Trees

Here are a few examples I found on the “great internet” of how SQL can be used to create some festive Christmas cheer and fun. See links to the original posts. Most of the examples shown below have been run on the Oracle 21c Docker image or on SQL Server.

Our first example comes from Gerald Venzi who posted this on twitter. See later in the post for Christmas trees created using similar SQL queries.

WITH tree(lev, xmas) AS (
   SELECT 1 lev, RPAD(' ', 10, ' ') || '*' xmas
   FROM dual
   UNION ALL
   SELECT tree.lev+1, 
          RPAD(' ', 10-tree.lev, ' ') || 
             RPAD('^', tree.lev+1, '^') || 
             LPAD('^', tree.lev, '^')  xmas
   FROM tree
   WHERE tree.lev < 10
)
SELECT '    Merry Christmas!' AS "Merry Christmas!" FROM dual
UNION ALL
SELECT xmas FROM TREE
UNION ALL
SELECT '         | |' FROM dual
UNION ALL
SELECT '      ~~/   \~~' FROM dual;

Our next example includes using Spatial Data on SQL Server to create a Christmas Tree. This example comes from Niket Kedia.

USE tempdb
GO

-- Create a table

CREATE TABLE #xmasTREE (shape GEOMETRY )

-- Creating the Christmas tree with stars

INSERT INTO #xmasTREE

VALUES
('POLYGON((4 0, 0 0, 4 2, 1 2, 4 4, 1 4, 4 6, 2 6, 5 10, 8 6, 6 6, 9 4, 6 4, 9 2, 6 2, 10 0, 4 0))' ),
('POLYGON((3.5 0, 4 -1, 6 -1, 6.5 0, 3.5 0))' ),
('POLYGON((5 9.5, 4.5 9.25, 4.6 9.9, 4.1 10.2, 4.8 10.2, 5 10.9, 5.2 10.2, 5.9 10.2, 5.4 9.9, 5.5 9.25, 5 9.5))' ),
('POLYGON((2 5.5, 1.5 5.25, 1.6 5.9, 1.1 6.2, 1.8 6.2, 2 6.9, 2.2 6.2, 2.9 6.2, 2.4 5.9, 2.5 5.25, 2 5.5))' ),
('POLYGON((8 5.5, 7.5 5.25, 7.6 5.9, 7.1 6.2, 7.8 6.2, 8 6.9, 8.2 6.2, 8.9 6.2, 8.4 5.9, 8.5 5.25, 8 5.5))' ),
('POLYGON((1 3.5, 0.5 3.25, 0.6 3.9, 0.1 4.2, 0.8 4.2, 1 4.9, 1.2 4.2, 1.9 4.2, 1.4 3.9, 1.5 3.25, 1 3.5))' ),
('POLYGON((9 3.5, 8.5 3.25, 8.6 3.9, 8.1 4.2, 8.8 4.2, 9 4.9, 9.2 4.2, 9.9 4.2, 9.4 3.9, 9.5 3.25, 9 3.5))' ),
('POLYGON((1 1.5, 0.5 1.25, 0.6 1.9, 0.1 2.2, 0.8 2.2, 1 2.9, 1.2 2.2, 1.9 2.2, 1.4 1.9, 1.5 1.25, 1 1.5))' ),
('POLYGON((9 1.5, 8.5 1.25, 8.6 1.9, 8.1 2.2, 8.8 2.2, 9 2.9, 9.2 2.2, 9.9 2.2, 9.4 1.9, 9.5 1.25, 9 1.5))' ),
('POLYGON((0 -0.5, -0.5 -0.75, -0.4 -0.1, -0.9 0.2, -0.2 0.2, 0 0.9, 0.2 0.2, 0.9 0.2, 0.4 -0.1, 0.5 -0.75, 0 -0.5))' ),
('POLYGON((10 -0.5, 9.5 -0.75, 9.6 -0.1, 9.1 0.2, 9.8 0.2, 10 0.9, 10.2 0.2, 10.9 0.2, 10.4 -0.1, 10.5 -0.75, 10 -0.5))' ),
('POLYGON((5 -2, 4.5 -2, 4.5 -1, 5 -1, 5.5 -1, 5.5 -2, 5 -2))')

-- Create the 'Merry Christmas' greetings

INSERT INTO #xmasTREE

VALUES (‘POLYGON((-2 11, -2 12, -1.75 12, -1.5 11.5, -1.25 12, -1 12, -1 11, -1.25 11, -1.25 11.7, -1.5 11.2, -1.75 11.7, -1.75 11, -2 11))’ ),–M
(‘POLYGON((-1 11, -1 12, 0 12, 0 11.8, -0.75 11.8, -0.75 11.6, -0.25 11.6, -0.25 11.4, -0.75 11.4, -0.75 11.2, 0 11.2, 0 11, -1 11))’ ),–E
(‘POLYGON((0 11, 0 12, 1 12, 1 11.5, 0.4 11.5, 1 11, 0.7 11, 0.2 11.4, 0.2 11, 0 11),(0.2 11.8, 0.8 11.8, 0.8 11.7, 0.2 11.7, 0.2 11.8))’ ),–R
(‘POLYGON((1 11, 1 12, 2 12, 2 11.5, 1.4 11.5, 2 11, 1.7 11, 1.2 11.4, 1.2 11, 1 11),(1.2 11.8, 1.8 11.8, 1.8 11.7, 1.2 11.7, 1.2 11.8))’ ),–R
(‘POLYGON((2 12, 2.2 12, 2.5 11.6, 2.8 12, 3 12, 2.6 11.5, 2.6 11, 2.4 11, 2.4 11.5, 2 12))’ ), –Y
(‘POLYGON((4 11, 4 12, 5 12, 5 11.8, 4.25 11.8, 4.25 11.2, 5 11.2, 5 11, 4 11))’ ),–C
(‘POLYGON((5 11, 5 12, 5.2 12, 5.2 11.6, 5.8 11.6, 5.8 12, 6 12, 6 11, 5.8 11, 5.8 11.4, 5.2 11.4, 5.2 11, 5 11))’ ),–H
(‘POLYGON((6 11, 6 12, 7 12, 7 11.5, 6.4 11.5, 7 11, 6.7 11, 6.2 11.4, 6.2 11, 6 11),(6.2 11.8, 6.8 11.8, 6.8 11.7, 6.2 11.7, 6.2 11.8))’ ),–R
(‘POLYGON((7.2 11, 7.2 11.2, 7.4 11.2, 7.4 11.8, 7.2 11.8, 7.2 12, 7.8 12, 7.8 11.8, 7.6 11.8, 7.6 11.2, 7.8 11.2, 7.8 11, 7.2 11))’ ),–I
(‘POLYGON((8 11, 8 11.2, 8.8 11.2, 8.8 11.4, 8 11.4, 8 12, 9 12, 9 11.8, 8.2 11.8, 8.2 11.6, 9 11.6, 9 11, 8 11))’ ),–S
(‘POLYGON((9 11.8, 9 12, 10 12, 10 11.8, 9.6 11.8, 9.6 11, 9.4 11, 9.4 11.8, 9 11.8))’ ),–T
(‘POLYGON((10 11, 10 12, 10.25 12, 10.5 11.5, 10.75 12, 11 12, 11 11, 10.75 11, 10.75 11.7, 10.5 11.2, 10.25 11.7, 10.25 11, 10 11))’ ),–M
(‘POLYGON((11 11, 11 12, 12 12, 12 11, 11.75 11, 11.75 11.3, 11.25 11.3, 11.25 11, 11 11),(11.25 11.5, 11.25 11.8, 11.75 11.8, 11.75 11.5, 11.25 11.5))’ ),–A
(‘POLYGON((12 11, 12 11.2, 12.8 11.2, 12.8 11.4, 12 11.4, 12 12, 13 12, 13 11.8, 12.2 11.8, 12.2 11.6, 13 11.6, 13 11, 12 11))’ )–S

-- Decorate the tree with some round bell circles

DECLARE @counter INT = 0

,@x INT

,@y INT ;

WHILE ( @counter < 25 )

BEGIN

INSERT INTO #xmasTREE

VALUES (GEOMETRY::Point(RAND() * 5 + 2.5, RAND() * 8.5, 0).STBuffer(0.3) )

SET @counter+=1 ;

END

Select * from #xmasTREE
Drop table #xmasTREE

Our next example comes from StackOverflow, with a similar spatial geometry example written for SQL Server.

DECLARE @g TABLE (g GEOMETRY, ID INT IDENTITY(1,1));



-- Adjust Color

INSERT INTO @g(g) SELECT TOP 29 CAST('POLYGON((0 0, 0 0.0000001, 0.0000001 0.0000001, 0 0))' as geometry) FROM sys.messages;

-- Build Christmas Tree

INSERT INTO @g(g) VALUES (CAST('POLYGON((0 0,900 0,450 400, 0 0 ))' as geometry).STUnion(CAST('POLYGON((80 330,820 330,450 640,80 330 ))' as geometry)).STUnion(CAST('POLYGON((210 590,690 590,450 800, 210 590 ))' as geometry)));

-- Adjust Color

INSERT INTO @g(g) SELECT TOP 294 CAST('POLYGON((0 0, 0 0.0000001, 0.0000001 0.0000001, 0 0))' as geometry) FROM sys.messages;

-- Build a Star

INSERT INTO @g(g) VALUES (CAST('POLYGON ((450 910, 465.716 861.631, 516.574 861.631, 475.429 831.738, 491.145 783.369, 450 813.262, 408.855 783.369, 424.571 831.738, 383.426 861.631, 434.284 861.631, 450 910))' as geometry));

-- Build Colored Balls

INSERT INTO @g(g) SELECT TOP 2 CAST('POLYGON((0 0, 0 0.0000001, 0.0000001 0.0000001, 0 0))' as geometry) FROM sys.messages;

INSERT INTO @g(g) VALUES (CAST('CURVEPOLYGON (CIRCULARSTRING (80 290, 110 320, 140 290, 110 260, 80 290))' as geometry));

INSERT INTO @g(g) SELECT TOP 2 CAST('POLYGON((0 0, 0 0.0000001, 0.0000001 0.0000001, 0 0))' as geometry) FROM sys.messages;

INSERT INTO @g(g) VALUES (CAST('CURVEPOLYGON (CIRCULARSTRING (760 290, 790 320, 820 290, 790 260, 760 290))' as geometry));

INSERT INTO @g(g) SELECT TOP 3 CAST('POLYGON((0 0, 0 0.0000001, 0.0000001 0.0000001, 0 0))' as geometry) FROM sys.messages;

INSERT INTO @g(g) VALUES (CAST('CURVEPOLYGON (CIRCULARSTRING (210 550, 240 580, 270 550, 240 520, 210 550))' as geometry));

INSERT INTO @g(g) SELECT TOP 46 CAST('POLYGON((0 0, 0 0.0000001, 0.0000001 0.0000001, 0 0))' as geometry) FROM sys.messages;

INSERT INTO @g(g) VALUES (CAST('CURVEPOLYGON (CIRCULARSTRING (630 550, 660 580, 690 550, 660 520, 630 550))' as geometry));



SELECT g FROM @g ORDER BY ID;

GO

Connor McDonald posted the following SQL to create a Christmas Tree on StackOverflow in 2020, and wrote a blog post about it in December 2021. I just made one very minor change to it.

You need to be careful where you run this. It runs best on/in a Linux environment, docker, VM, etc using SQL Command Line or SQL*Plus. For me, SQL Developer struggled to present the results correctly.

select replace(replace(replace(r,'X',chr(27)||'[42m'||chr(27)||'[1;'||to_char(32)||'m'||'X'||chr(27)||'[0m'),
    'T',chr(27)||'[43m'||chr(27)||'[1;'||to_char(33)||'m'||'T'||chr(27)||'[0m'),
    '@',chr(27)||'[33m'||chr(27)||'[1;'||to_char(31)||'m'||'@'||chr(27)||'[0m')  Happy_Christmas
from ( select lpad(' ',20-e-i)|| case when dbms_random.value < 0.3 then substr(s,1,e*2-3+i*2) 
       else substr(substr(s,1,dbms_random.value(1,e*2-3+i*2-1))||'@'||s,1,e*2-3+i*2) end r
from ( select rpad('X',40,'X') s,rpad('T',40,'T') t from dual ) , 
( select level i, level+2 hop from dual connect by level <= 4 ) , lateral
( select level e from dual connect by level <= hop ) union all select lpad(' ',17)||substr(t,1,3)
from ( select rpad('X',40,'X') s,rpad('T',40,'T') t from dual ) connect by level <= 5 );

Next up we have a simpler Christmas Tree. This comes from Matheus Boesing and his original post on grepora.

clear screen
set feedback off;
set heading off;
set pages 80;
SELECT DECODE(SIGN(FLOOR(maxwidth / 2) - ROWNUM),
              1,
              LPAD(' ', FLOOR(maxwidth / 2) - (ROWNUM - 1)) ||
              RPAD('*', 2 * (ROWNUM - 1) + 1, ' *'),
              LPAD('* * *', FLOOR(maxwidth / 2) + 3))
  FROM all_objects, (SELECT 40 AS maxwidth FROM DUAL)
 WHERE ROWNUM < FLOOR(maxwidth / 2) + 5
union all select '     Happy Christmas from Brendan!' from dual;
set heading on;
set feedback on;

This next example comes from LearnSQL and is similar to the previous example, but this time we get multiple trees.

clear screen
set feedback off;
set heading off;
set pages 80;
WITH small_tree(tree_depth,pine) AS (
  SELECT 1 tree_depth,
  rpad(' ',10,' ') || '*'
  || rpad(' ',20,' ') || '*'
  || rpad(' ',20,' ') || '*'
  pine
  FROM   dual
  UNION ALL
  SELECT small_tree.tree_depth +1 tree_depth,
  rpad(' ',10-small_tree.tree_depth,' ') || rpad('*',small_tree.tree_depth+1,'.') || lpad('*',small_tree.tree_depth,'.')
  || rpad(' ',20-small_tree.tree_depth-tree_depth,' ') || rpad('*',small_tree.tree_depth+1,'.') || lpad('*',small_tree.tree_depth,'.')
  || rpad(' ',20-small_tree.tree_depth-tree_depth,' ') || rpad('*',small_tree.tree_depth+1,'.') || lpad('*',small_tree.tree_depth,'.') pine
  FROM   small_tree
  where small_tree.tree_depth < 10
)
SELECT   rpad(' ',9,' ') ||'Ho'
         || rpad(' ',19,' ') || 'Ho'
         || rpad(' ',19,' ') || 'Ho'
  pine
  FROM   dual
  UNION ALL
SELECT pine
FROM small_tree;
set heading on;
set feedback on;

Hans Viehmann from the Oracle Spatial team sent me this example using Oracle Spatial and Oracle Spatial Studio. The geospatial data is defined using GeoJSON. The funny coordinates reference the Santa Claus village near Rovaniemi in Finnish Lapland, right on the Arctic Circle. Oracle Spatial Studio can be used to view the Christmas tree on a map (see image below).

DROP TABLE XMAS_TREE_JSON;

DROP TABLE XMAS_TREE;

CREATE TABLE XMAS_TREE_JSON (
    ID   NUMBER(10),
    DATA CLOB,
    CONSTRAINT XMAS_TREE_PK PRIMARY KEY ( ID ),
    CONSTRAINT XMAS_TREE_JSON_CHK CHECK ( DATA IS JSON )
);

INSERT INTO XMAS_TREE_JSON VALUES (
    1,
    '{
      "type": "Feature",
      "properties": { "label": "Tree"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [[25.84725335240364,
              66.5437744044363],
            [25.847166180610653,
              66.543721555766],
            [25.847235918045044,
              66.5437231572425],
            [25.84712728857994,
              66.5436740452493],
            [25.84722116589546,
              66.54367564672889],
            [25.847095102071762,
              66.54362012871027],
            [25.847205072641373,
              66.54362226402098],
            [25.847202390432358,
              66.54361105363778],
            [25.847297608852386,
              66.54361212129352],
            [25.847297608852386,
              66.5436238655039],
            [25.84740623831749,
              66.5436243993315],
            [25.84728017449379,
              66.54367724820834],
            [25.84736466407776,
              66.54367724820834],
            [25.847273468971252,
              66.54372369106797],
            [25.847321748733517,
              66.54372369106797],
            [25.84725335240364,
              66.5437744044363]
          ]
        ]
      }
    }'
);

COMMIT;

CREATE TABLE XMAS_TREE
    AS
        SELECT
            ID,
            JSON_VALUE(DATA, '$.geometry' RETURNING SDO_GEOMETRY) AS SHAPE,
            JSON_VALUE(DATA, '$.properties.label')                AS LABEL
        FROM
            XMAS_TREE_JSON;

Happy Christmas everyone.

AI Categories in EU AI Regulations

The EU AI Regulations aim to provide a framework for addressing obligations on the use of AI applications in the EU. These applications can be created, operated or procured by companies both inside and outside the EU, on data/people within the EU. In a previous post I gave a fuller outline of the EU AI Regulations.

In this post I will look at the proposed categorisation of AI applications, what type of applications fall into each category and what potential impact this may have on the operators of the AI application. The following diagram illustrates the categories detailed in the EU AI Regulations. These are detailed below.

Let’s have a closer look at each of these categories

Unacceptable Risk (Red section)

The proposed legislation sets out a regulatory structure that bans some uses of AI, heavily regulates high-risk uses and lightly regulates less risky AI systems. The regulations intend to prohibit certain uses of AI which are deemed unacceptable because of the risks they pose. These include deploying subliminal techniques, or exploiting the vulnerabilities of specific groups of persons due to their age or disability, in order to materially distort a person’s behavior in a manner that causes physical or psychological harm; ‘social scoring’ by public authorities; and ‘real time’ biometric identification in publicly available spaces. A more detailed version of this is:

  • Designed or used in a manner that manipulates human behavior, opinions or decisions through choice architectures or other elements of user interfaces, causing a person to behave, form an opinion or take a decision to their detriment. 
  • Designed or used in a manner that exploits information or prediction about a person or group of persons in order to target their vulnerabilities or special circumstances, causing a person to behave, form an opinion or take a decision to their detriment. 
  • Indiscriminate surveillance applied in a generalised manner to all natural persons without differentiation. The methods of surveillance may include large scale use of AI systems for monitoring or tracking of natural persons through direct interception or gaining access to communication, location, meta data or other personal data collected in digital and/or physical environments or through automated aggregation and analysis of such data from various sources. 
  • General purpose social scoring of natural persons, including online. General purpose social scoring consists in the large scale evaluation or classification of the trustworthiness of natural persons [over certain period of time] based on their social behavior in multiple contexts and/or known or predicted personality characteristics, with the social score leading to detrimental treatment to natural person or groups. 

There are some exemptions to these when such practices are authorised by law and are carried out [by public authorities or on behalf of public authorities] in order to safeguard public security and are subject to appropriate safeguards for the rights and freedoms of third parties in compliance with Union law.

High Risk (Orange section)

AI systems identified as high-risk include AI technology used in:

  • Critical infrastructures (e.g. transport), that could put the life and health of citizens at risk; 
  • Educational or vocational training, that may determine the access to education and professional course of someone’s life (e.g. scoring of exams); 
  • Safety components of products (e.g. AI application in robot-assisted surgery);
  • Employment, workers management and access to self-employment (e.g. CV-sorting software for recruitment procedures);
  • Essential private and public services (e.g. credit scoring denying citizens opportunity to obtain a loan); 
  • Law enforcement that may interfere with people’s fundamental rights (e.g. evaluation of the reliability of evidence);
  • Migration, asylum and border control management (e.g. verification of authenticity of travel documents);
  • Administration of justice and democratic processes (e.g. applying the law to a concrete set of facts).

All High risk AI applications will be subject to strict obligations before they can be put on the market: 

  • Adequate risk assessment and mitigation systems;
  • High quality of the datasets feeding the system to minimise risks and discriminatory outcomes; 
  • Logging of activity to ensure traceability of results
  • Detailed documentation providing all information necessary on the system and its purpose for authorities to assess its compliance; 
  • Clear and adequate information to the user; 
  • Appropriate human oversight measures to minimise risk; 
  • High level of robustness, security and accuracy.

These can also be categorised as (i) Risk management; (ii) Data governance; (iii) Technical documentation; (iv) Record keeping (traceability); (v) Transparency and provision of information to users; (vi) Human oversight; (vii) Accuracy; (viii) Cybersecurity robustness.

There will be some exceptions to this when the AI application is required by governmental and law enforcement agencies in certain circumstances.

Limited Risk (Yellow section)

Providers of “non-high-risk” AI systems should be encouraged to develop codes of conduct intended to foster the voluntary application of the mandatory requirements applicable to high-risk AI systems.

As AI applications within the Limited Risk category pose a limited risk, only transparency requirements are imposed. For example, AI systems which are intended to interact with natural persons must be designed and developed in such a way that users are informed they are interacting with an AI system, unless it is “obvious from the circumstances and the context of use.”

Minimal Risk (Green section)

The Minimal Risk category allows all other AI systems to be developed and used in the EU without legal obligations additional to existing legislation, for example AI-enabled video games or spam filters. Some discussions suggest the vast majority of AI systems currently used in the EU fall into this category, where they represent minimal or no risk.