Running Oracle Database on Docker on Apple M1 Chip
This post is for you if you have an Apple M1 laptop and cannot get Oracle Database to run on Docker.
The reason Oracle Database, and lots of other software, doesn’t run on the new Apple Silicon is their new chip uses a different instructions set to what is used by Intel chips. Most of the Database vendors have come out to say they will not be porting their Databases to the M1 chip, as most/all servers out there run on x86 chips, and the cost of porting is just not worth it, as there is zero customers.
Are you using an x86 Chip computer (Windows or Macs with intel chips)? If so, follow these instructions (and ignore this post)
If you have been using Apple for your laptop for some time and have recently upgraded, you are now using the M1 chip, you have probably found some of your software doesn’t run. In my scenario (and with many other people) you can no longer run an Oracle Database 😦
But there does seem to be a possible solution and this has been highlighted by Tom de Vroomen on his blog. A work around is to spin up an x86 container using Colima. Tom has given some instructions on his blog, and what I list below is an extended set of instructions to get fully setup and running with Oracle on Docker on M1 chip.
You might have Homebrew installed, but if not run the following to install.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
You can not install Colima using Homebrew. This might take a minute or two to run.
brew install colima
3-Start colima x86 container
With Colima installed, we can now start an x86 container.
colima start --arch x86_64 --memory 4
The container will be based on x86, which is the important part of what we need. The memory is 4GB, but you can probably drop that a little.
The above command should start within a second or two.
4-Install Oracle Database for Docker
The following command will create an Oracle Database docker image using the image created by Gerald Venzi.
docker run -d -p 1521:1521 -e ORACLE_PASSWORD=<your password> -v oracle-volume:/opt/oracle/oradata gvenzl/oracle-xe
I changed <your password> to SysPassword1.
This will create the docker image and will allow for any changes to the database to be persisted after you shutdown docker. This is what you want to happen.
5-Log-in to Oracle as System
Open the docker client to see if the Oracle Database image is running. If not click on the run button.
When it finishes starting up, open the command line (see icon to the left of the run button), and log in as the SYSTEM user.
You are now running Oracle Database on Docker on an M1 chip laptop 🙂
6-Create new user
You shouldn’t use the System user, as that is like using root for everything. You’ll need to create a new user/schema in the database for you to use for your work. Run the following.
create user brendan identified by BTPassword1 default tablespace users
grant connect, resource to brendan;
If these run without any errors you now have your own schema in the Oracle Database on Docker (on M1 chip)
7-Connect using SQL*Plus & SQL Developer
Now let’s connect to the schema using sqlplus.
That should work for you and you can now proceed using the command line tool.
If you refer to use a GUI tool then go install SQL Developer. Jeff Smith has a blog post about installing SQL Developer on M1 chip. Here is the connection screen with all the connection details entered (using the username and password given/used above)
You can now use the command line as well as SQL Developer to connect to your Oracle Database (on docker on M1).
8-Stop Docker and Colima
After you have finished using the Oracle Database on Docker you will want to shut it down until the next time you want to use it. There are two steps to follow. The first is to stop the Docker image. Just go to the Docker Desktop and click on the Stop button. It might take a few seconds for it to shutdown.
The second thing you need to do is to stop Colima.
That’s it all done.
9-What you need to run the next time (and every time after that)
For the second and subsequent time you want to use the Oracle Docker image all you need to do is the following
(a) Start Colima
colima start --arch x86_64 --memory 4
(b) Start Oracle on Docker
Open Docker Desktop and click on the Run button [see Docker Desktop image above]
And to stop everything
(a) Stop the Oracle Database on Docker Desktop
(b) Stop Colima by running ‘colima stop’ in a terminal
Oracle Database In-Memory – simple example
In a previous post, I showed how to enable and increase the memory allocation for use by Oracle In-Memory. That example was based on using the Pre-built VM supplied by Oracle.
To use In-Memory on your objects, you have a few options.
Enabling the In-Memory attribute on the EXAMPLE tablespace by specifying the INMEMORY attribute
SQL> ALTER TABLESPACE example INMEMORY;
Enabling the In-Memory attribute on the sales table but excluding the “prod_id” column
SQL> ALTER TABLE sales INMEMORY NO INMEMORY(prod_id);
Disabling the In-Memory attribute on one partition of the sales table by specifying the NO INMEMORY clause
SQL> ALTER TABLE sales MODIFY PARTITION SALES_Q1_1998 NO INMEMORY;
Enabling the In-Memory attribute on the customers table with a priority level of critical
SQL> ALTER TABLE customers INMEMORY PRIORITY CRITICAL;
You can also specify the priority level, which helps to prioritise the order the objects are loaded into memory.
A simple example to illustrate the effect of using In-Memory versus not.
Create a table with, say, 11K records. It doesn’t really matter what columns and data are.
Now select all the records and display the explain plan
select count(*) from test_inmemory;
Now, move the table to In-Memory and rerun your query.
alter table test_inmemory inmemory PRIORITY critical; select count(*) from test_inmemory; -- again
There you go!
We can check to see what object are In-Memory by
SELECT table_name, inmemory, inmemory_priority, inmemory_distribute, inmemory_compression, inmemory_duplicate FROM user_tables WHERE inmemory = 'ENABLED’ ORDER BY table_name;
To remove the object from In-Memory
SQL > alter table test_inmemory no inmemory; -- remove the table from in-memory
This is just a simple test and lots of other things can be done to improve performance
But, you do need to be careful about using In-Memory. It does have some limitations and scenarios where it doesn’t work so well. So care is needed
CAO Points 2022 – Grade inflation, deflation or in-line
Last week I wrote a blog post analysing the Leaving Cert results over the past 3-8 years. Part of that post also looked at the claim from the Dept of Education saying the results in 2022 would be “in-line on aggregate” with the results from 2021. The outcome of the analysis was grade deflation was very evident in many subjects, but when analysed and profiled at a very high level, they did look similar.
I didn’t go into how that might impact on the CAO (Central Applications Office) Points. If there was deflation in some of the core and most popular subjects, then you might conclude there could be some changes in the profile of CAO Points being awarded, and that in turn would have a small change on the CAO Points needed for a lot of University courses. But not all of them, as we saw last week, the increased number of students who get grades in the H4-H7 range. This could mean a small decrease in points for courses in the 520+ range, and a small increase in points needed in the 300-500-ish range.
The CAO have published the number of students of each 10 point range. I’ve compared the 2022 data, with each year going back to 2015. The following table is a high level summary of the results in 50 point ranges.
An initial look at these numbers and percentages might look like points are similar to last year and even 2020. But for 2015-2019 the similarity is closer. Again looking back at the previous blog post, we can see the results profiles for 20215-2019 are broadly similar and does indicate some normalisation might have been happening each year. The following chart illustrate the percentage of students who achieved points in each range.
From the above we can see the profile is similar across 2015-2019, although there does seem to be a flattening of the curve between 2015-2016!
Let’s now have a look at 2019 (the last pre-coivd year), 2021 and 2022. This will allow use to compare the “inflated” years to the last “normal” year.
This chart clearly shows a shifting of the profile to the left for the red line which represents 2022. This also supports my blog post last week, and that the Dept of Education has started the process of deflating marks.
Based on this shifting/deflating of marks, we could see the grade/CAO Points profiles reverting back to almost 2019 profile by 2025. For students sitting the Leaving Cert in 2023, there will be another shift to the left, and with another similar shift in 2024. In 2024, the students will be the last group to sit the Leaving Cert who were badly affected during the Covid years. Many of them lost large chunks on school and many didn’t sit the Junior Cert. I’d predict 2025 will see the first time the marks/points profiles will match pre-covid years.
For this analysis I’ve used a variety of tools including Excel, Python and Oracle Analytics.
The Dataset used can be found under Dataset menu, and listed as ‘CAO Points Profiles 2015-2022’. Also, check out the Leaving Certificate 2015-2022 dataset.
Leaving Cert 2022 Results – Inflation, deflation or in-line!
The Leaving Certificate 2022 results are out. Up and down the country there are people who are delighted with their results, while others are disappointed, and lots of other emotions.
The Leaving Certificate is the terminal examination for secondary education in Ireland, with most students being examined in seven subjects, with their best six grades counting towards their “points”, which in turn determines what university course they might get. Check out this link for learn more about the Leaving Certificate.
The Dept of Education has been saying, for several months, this results this year (2022) will be “in-line on aggregate” with the results from 2021. There has been some concerns about grade inflation in 2021 and the impact it will have on the students in 2022 and future years. At some point the Dept of Education needs to address this grade inflation and look to revert back to the normal profile of grades pre-Covid.
Let’s have a look to see if this is true, and if it is true when we look a little deeper. Do the aggregate results hide grade deflation in some subjects.
For the analysis presented in this blog post, I’ve just looked at results at Higher Level across all subjects, and for the deeper dive I’ll look at some of the most popular subjects.
Firstly let’s have a quick look at the distribution of grades by subject for 2022 and 2021.
Remember the Dept of Education said the 2022 results should be in-line with the results of 2021. This required them to apply some adjustments, after marking the exam scripts, to give an updated profile. The following chart shows this comparison between the two years. On initial inspection we can see it is broadly similar. This is good, right? It kind of is and at a high level things look broadly in-line and maybe we can believe the Dept of Education. Looking a little closer we can see a small decrease in the H2-H4 range, and a slight increase in the H5-H8.
Let’s dive a little deeper. When we look at the grade profile of students in 2021 and 2022, How many subjects increased the number of students at each grade vs How many subjects decreased grades vs How many approximately stated the same. The table below shows the results and only counts a change if it is greater than 1% (to allow for minor variations between years).
This table in very interesting in that more subjects decreased their H1s, with some variation for the H2-H4s, while for the lower range of H5-H7 we can see there has been an increase in grades. If I increased the margin to 3% we get a slightly different results, but only minor changes.
“in-line on aggregate” might be holding true, although it appears a slight increase on the numbers getting the lower grades. This might indicate either more of an adjustment to weaker students and/or a bit of a down shifting of grades from the H2-H4 range. But at the higher end, more subjects reduced than increase. The overall (aggregate) numbers are potentially masking movements in grade profiles.
Let’s now have a look at some of the core subjects of English, Irish and Mathematics.
For English, it looks like they fitted to the curve perfectly! keeping grades in-line between the two years. Mathematics is a little different with a slight increase in grades. But when you look at Irish we can see there was definite grade deflation. For each of these subjects, the chart on the left contains four years of data including 2019 when the last “normal” leaving certificate occurred. With Irish the grade profile has been adjusted (deflated) significantly and is closer to 2019 profile than it is to 2021. There was been lots and lots of discussions nationally about how and when grades will revert to normal profile. The 2022 profile for Irish seems to show this has started to happen in this subject, which raises the question if this is occurring in any other subjects, and is hidden/masked by the “in-line on aggregate” figures.
This blog post would become just too long if I was to present the results profile for each of the 42+ subjects.
Let’s have a look as two of the most common foreign languages, French and Spanish.
Again we can see some grade deflation, although not to be same extent as Irish. For both French and Spanish, we have reduced numbers for the H2-H4 range and a slight increase for H5-H7, and shift to the left in the profile. A slight exception is for those getting a H1 for both subjects. The adjustment in the results profile is more pronounced for French, and could indicate some deflation adjustments.
Next we’ll look at some of the science subjects of Physics, Chemistry and Biology.
These three subjects also indicate some adjusts back towards the pre-Covid profile, with exception of H1 grades. We can see the 2022 profile almost reflect the 2019 profile (excluding H1s) and for Biology appears to be at a half way point between 2019 and 2022 (excluding H1s)
Just one more example of grade deflation, and this with Design, Communication and Graphics (or DCG)
Yes there is obvious grade deflation and almost back to 2019 profile, with the exception of H1s again.
I’ve mentioned some possible grade deflation in various subjects, but there are also subjects where the profile very closely matches the 2021 profile. We have seen above English is one of those. Others include Technology, Art and Computer Science.
I’ve analyzed many more subjects and similar shifting of the profile is evident in those. Has the Dept of Education and State Examinations Commission taken steps to start deflating grades from the highs of 2021? I’d said the answer lies in the data, and the data I’ve looked at shows they have started the deflation process. This might take another couple of years to work out of the system and we will be back to “normal” pre-covid profiles. Which raises another interesting question, Was the grade profile for subjects, pre-covid, fitted to the curve? For the core set of subjects and for many of the more popular subjects, the data seems to indicate this. Maybe the “normal” distribution of marks is down to the “normal” distribution of abilities of the student population each year, or have grades been normalised in some way each year, for years, even decades?
For this analysis I’ve used a variety of tools including Excel, Python and Oracle Analytics.
The Dataset used can be found under Dataset menu, and listed as ‘Leaving Certificate 2015-2022’. An additional Dataset, I’ll be adding soon, will be for CAO Points Profiles 2015-2022.
Changing In-Memory size in Oracle Database
The pre-built virtual machine provided by Oracle for trying out and playing with Oracle Database comes configured to use the In-Memory option. But memory size is a little limited if you are trying to load anything slightly bigger than a tiny table into memory, for example if the table has more than a few hundred rows.
The amount of memory allocated to In-Memory can be increased to allow for more data to be loaded. There is a requirement that the VM and Database has enough memory allocated to allow this. If you don’t and increase the In-Memory size too large, you will have some problems restarting the database and VM. So proceed carefully.
For the pre-built VM, I typically allocate 4G or 8G of RAM to the VM. This in turn will give more memory to the database when it starts.
To setup In-Memory on the VM run the following:
– Open a terminal window and run this command:
sqlplus sys/oracle as sysdba
Then run these two commands
alter session set container = cdb$root; alter system set inmemory_size = 200M scope=spfile;
Now, bounce the VM, i.e. restart the VM
In-memory will now be enabled on your Database, and you can now create/move tables in and out of in-memory.
How many Data Center Regions by Vendor?
There has been some discussions over the past weeks, months, years on which Cloud provider is the best, or the biggest, or provides the most services, or [insert some other topic]? The old answer to everything related to IT is ‘It Depends’. A recent article by CloudWars (and updated numbers by them) and some of the comments to it, and elsewhere prompted me to have a look at ‘How Many Data Center Regions do each Cloud Vendor have?’ I didn’t go looking at all possible cloud vendors, but instead kept to the main vendors consisting of Microsoft Azure, Google Cloud Platform (GCP), Oracle Cloud and Amazon Web Services (AWS). We know AWS has been around for a long long time, and seems to gather most of the attention and focus within the developer community, etc, you’d expect them to be the biggest. Well, the results from my investigation does not support this.
Now, it is important to remember when reading the results presented below that these are from a particular point in time, and that is the date of this blog post. If you are reading this some time later, the actual number of data centers will be different and will be larger.
When looking at the data, as presented on each vendors website (see link to each vendor below), most list some locations coming in the future. It’s really impressive to see the number of “coming soon” locations. These “coming soon” locations are not included below (as of blog post date).
Before showing a breakdown for each vendor the following table gives the total number of data center regions for each vendor.
The numbers presented in the above table are different to does presented in the original CloudWars article or their updated numbers. If you look at the comments on that article and the comments on LinkedIn, you will see there was some disagreement of on their numbers. The problem is a data quality one, and vendors presenting their list of data centers in different parts of their website and documentation. Data quality and consistency is always a challenge, and particularly so when publishing data on vendor blogs, documentation and various websites. Indeed, the data I present in this post will be out of date within a few days/weeks. I’ve also excluded locations marked as ‘coming soon’ (see Azure listing).
Looking at the numbers in the above table can be a little surprising, particularly if you look at AWS, and then look at the difference in numbers between AWS and Azure and even Oracle. Very soon Azure will have double the number of data center regions when compared to AWS.
What do these numbers tell you? Based on just these numbers it would appear that Azure and Oracle Cloud are BIG cloud providers, and are much bigger than AWS. But maybe AWS has data centers that are way way bigger than those two vendors. It can be a little challenging to know the size and scale of each data center. Maybe they are going after different types of customers? With the roll out of Cloud over the past few years, there has been numerous challenges from legal and sovereign related issues requiring data to be geographically located within a country or geographic region. Most of these restrictions apply to larger organizations in the financial, insurance, and government related, etc. Given the historical customer base of Microsoft and Oracle, maybe this is driving their number of data center regions.
In more recent times there has been a growing interest, and in some sectors a growing need for organizations to be multi-cloud. Given the number of data center regions, for Azure and Oracle, and commonality in their geographic locations, it isn’t surprising to see the recent announcement from Azure and Oracle of their interconnect agreement and making the Oracle Database Service available (via interconnect) from Azure. I’m sure we will see more services being shared between these two vendors, and other might join in doing something similar.
Let’s get back to the numbers and data for each Vendor. I’ve also included a link to the Vendor website where these data was obtained. (just remember these are based on date of blog post)
When you look at the Azure website listing the location, at first look it might appear they have many more locations. When you look closer at these, some/many of them are listed as ‘coming soon’. These ‘coming soon’ locations are not included in the above and below tables.
GCP doesn’t list and Government data center regions.
When OCI doesn’t know who you are
When you are logged into your Oracle Cloud account and they give you a link to follow, which should bring you to another page in your account, but it doesn’t. The link (which is automatically generated by OCI) is formed incorrectly and gives you something like the following.
Have a look at the address bar. You will see a part saying
/users/undefined. That’s the problem, the link was not defined or created correctly. Although you are logged into your account, in theory, things like this should work correctly and from talking to other people about their OCI accounts, they don’t have the same problem, as it just works as expected.
What can you do to work around this? The first thing you need to do is locate your user OCID. This is located on some Services pages in OCI. Alternatively, go to Users in the Identity & Security menu section.
Now go back to the page/link which gives the error, as shown in the first image, locate the undefined work in the address bar, and replace it with the OCID for the user. The link will look something like this (only a subset of link is shown.
The page should now load without any errors.
What’s causing this error? That’s a good question and the true answer to it is unknown (at this point in time). But from some investigation and comparing my OCI account with other people there does seem to be some anomalies with my OCI user accounts and syncing of these between OCI classic and the current version. My OCI account is missing a federated account. I’m not sure if this is the exact difference but it does seem to be a missing element when compared to other people’s accounts. Why has this happened to me? Well that is something for the OCI teams who looks after setting up accounts to look into. Maybe there are others out there.
In the mean time, if you have encountered the same problem as me, the fix/solution outlined above should work for you.
oracledb Python Library – Connect to DB & a few other changes
Oracle have released a new Python library for connecting to Oracle Databases on-premises and on the Cloud. It’s called (very imaginatively, yet very clearly)
oracledb. This new Python library replaces the previous library called
cx_Oracle. Just consider
cx_oracle as obsolete, and use
oracledb going forward, as all development work on new features and enhancements will be done to
cx_oracle has been around a long time, and it’s about time we have a new and enhanced library that is more flexible and will suit many different deployment scenarios. The previous library (
cx_Oracle) was great, but it did require additional software installation with Oracle Client, and some OS environment settings, which at times took a bit of debugging. This makes it difficult/challenging to deploy in different environments, for example IOTs, CI/CD, containers, etc. Deployment environments have changed and the new
oracledb library makes it simpler.
To check out the following links for a full list of new features and other details.
Home page: oracle.github.io/python-oracledb
Installation instructions: python-oracledb.readthedocs.io/en/latest/installation.html
One of the main differences between the two libraries is how you connect to the Database. With
oracledb you need to use named the parameters, and the new library uses a thin connection. If you need the thick connection you can switch to that easily enough.
The following examples will illustrate how to connect to Oracle Database (local and cloud ADW/ATP) and how these are different to using the cx_Oracle library (which needed Oracle Client software installed). Remember the new
oracledb library does not need Oracle Client.
To get started, install
pip3 install oracledb
Local Database (running in Docker)
To test connection to a local Database I’m using a Docker image of 21c (hence localhost in this example, replace with IP address for your database). Using the previous library (
cx_Oracle) you could concatenate the connection details to form a string and pass that to the connection. With
oracledb, you need to use named parameters and specify each part of the connection separately.
This example illustrates this simple connection and prints out some useful information about the connection, do we have a healthy connection, are we using thing database connection and what version is the connection library.
p_username = "..." p_password = "..." p_dns = "localhost/XEPDB1" p_port = "1521" con = oracledb.connect(user=p_username, password=p_password, dsn=p_dns, port=p_port) print(con.is_healthy()) print(con.thin) print(con.version) --- True True 126.96.36.199.0
Having created the connection we can now query the Database and close the connection.
cur = con.cursor() cur.execute('select table_name from user_tables') for row in cur: print(row) --- ('WHISKIES_DATASET',) ('HOLIDAY',) ('STAGE',) ('DIRECTIONS',) --- cur.close() con.close()
The code I’ve given above is simple and straight forward. And if you are converting from
cx_Oracle, you will probably have minimal changes as you probably had your parameter keywords defined in your code. If not, some simple editing is needed.
To simplify the above code even more, the following does all the same steps without the explicit open and close statements, as these are implicit in this example.
import oracledb con = oracledb.connect(user=p_username, password=p_password, dsn=p_dns, port=p_port) with con.cursor() as cursor: for row in cursor.execute('select table_name from user_tables'): print(row)
(Basic) Oracle Cloud – Autonomous Database, ATP/ADW
Everyone is using the Cloud, Right? If you believe the marketing they are, but in reality most will be working in some hybrid world using a mixture of on-premises and cloud storage. The example given in the previous section illustrated connecting to a local/on-premises database. Let’s now look at connecting to a database on Oracle Cloud (Autonomous Database, ATP/ADW).
oracledb library things have been simplified a little. In this section I’ll illustrate a simple connection to a ATP/ADW using a thin connection.
What you need is the location of the directory containing the unzipped wallet file. No Oracle client is needed. If you haven’t downloaded a Wallet file in a while, you should go download a new version of it. The Wallet will contain a pem file which is needed to securely connect to the DB. You’ll also need the password for the Wallet, so talk nicely with your DBA. When setting up the connection you need to provide the directory for the tnsnames.ora file and the ewallet.pem file. If you have downloaded and unzipped the Wallet, these will be in the same directory
import oracledb p_username = "..." p_password = "..." p_walletpass = '...' #This time we specify the location of the wallet con = oracledb.connect(user=p_username, password=p_password, dsn="student_high", config_dir="/Users/brendan.tierney/Dropbox/5-Database-Wallets/Wallet_student-Full", wallet_location="/Users/brendan.tierney/Dropbox/5-Database-Wallets/Wallet_student-Full", wallet_password=p_walletpass) print(con) con.close()
This method allows you to easily connect to any Oracle Cloud Database.
(Thick Connection) Oracle Cloud – Autonomous Database, ATP/ADW
If you have Oracle Client already installed and set up, and you want to use a thick connection, you will need to initialize the function
import oracledb p_username = "..." p_password = "..." #point to directory containing tnsnames.ora oracledb.init_oracle_client(config_dir="/Applications/instantclient_19_8/network/admin") con = oracledb.connect(user=p_username, password=p_password, dsn="student_high") print(con) con.close()
Warning: Some care is needed with using init_oracle_client. If you use it once in your Python code or App then all connections will use it. You might need to do a code review to look at when this is needed and if not remove all occurrences of it from your Python code.
(Additional Security) Oracle Cloud – Autonomous Database, ATP/ADW
There are a few other additional ways of connecting to a database, but one of my favorite ways to connect involves some additional security, particularly when working with IOT devices, or in scenarios that additional security is needed. Two of these involve using One-way TLS and Mututal TLS connections. The following gives an example of setting up One-Way TLS. This involves setting up the Database to only received data and connections from one particular device via an IP address. This requires you to know the IP address of the device you are using and running the code to connect to the ATP/ADW Database.
To set this up, go to the ATP/ADW details in Oracle Cloud, edit the Access Control List, add the IP address of the client device, disable mutual TLS and download the DB Connection. The following code gives and example of setting up a connection
import oracledb p_username = "..." p_password = "..." adw_dsn = '''(description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522) (host=adb.us-ashburn-1.oraclecloud.com))(connect_data=(service_name=a8rk428ojzuffy_student_high.adb.oraclecloud.com)) (security=(ssl_server_cert_dn="CN=adwc.uscom-east-1.oraclecloud.com,OU=Oracle BMCS US,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))''' con4 = oracledb.connect(user=p_username, password=p_password, dsn=adw_dsn)
This sets up a secure connection between the client device and the Database.
From my initial testing of existing code/applications (although no formal test cases) it does appear the new
oracledb library is processing the queries and data quicker than
cx_Oracle. This is good and hopefully we will see more improvements with speed in later releases.
Also don’t forget the impact of changing the buffer size for your database connection. This can have a dramatic effect on speeding up your database interactions. Check out this post which illustrates this.
Oracle OCI AI Services
Oracle Cloud have been introducing new AI Services over the past few months, and we see a few more appearing over the coming few months. When you look at the list you might be a little surprised that these are newly available cloud services from Oracle. You might be surprised for two main reasons. Firstly, AWS and Google have similar cloud services available for some time (many years) now, and secondly, Oracle started talking about having these cloud services many years ago. It has taken some time for these to become publicly available. Although some of these have been included in other applications and offerings from Oracle, so probably they were busy with those before making them available as stand alone services.
These can be located in your Oracle Cloud account from the hamburger menu, as shown below
As you can see most of these AI Services are listed, except for the OCI Forecasting, which is due to be added “soon”. We can also expect to have an OCI Translation services and possibly some additional ones.
- OCI Language: This services can work with over 75 languages and allows you to detect and perform knowledge extraction from the text to include entity identification and labelling, classification of text into more than 600 categories, sentiment analysis and key phrase extraction. This can be used automate knolwedge extraction from customer feedback, product reviews, discussion forums, legal documents, etc
- OCI Speech: Performs Speech to Text, from live streaming of speech, audio and video recordings, etc creating a transcription. It works across English, Spanish and Portuguese, with other languages to be added. A nice little feature includes Profanity filtering, allowing you to tag, remove or mask certain words
- OCI Vision: This has two parts. The first is for processing documents, and is slightly different to OCI Language Service, in that this service looks at processing text documents in jpeg, pdf, png and tiff formats. Text information extraction is performed identifying keep terms, tables, meta-data extraction, table extraction etc. The second part of OCI Vision deals with image analysis and extracting key information from the image such as objects, people, text, image classification, scene detection, etc. You can use either the pretrained models or include your own models.
- OCI Anomaly Detection: Although anomaly detection is available via algorithms in the Database and OCI Data Science offerings, this new services allow for someone with little programming experience to utilise an ensemble of models, including the MSET algorithm, to provide greater accuracy with identifying unusual patterns in the data.
Note: I’ve excluded some services from the above list as these have been available for some time now or have limited AI features included in them. These include OCI Data Labelling, OCI Digital Assistant.
Some of these AI Services, based on the initial release, have limited functionality and resources, but this will change over time.
Python Data Profiling libraries
One of the most common, and sometimes boring, task when working with datasets is writing some code to profile the data. Most data scientists will have built a set of tools/scripts to help them with this regular and slightly boring task. As with most IT tasks we should be trying to automate what we can, to allow us to spend more time on more important tasks, such as deriving insights and delivering value to the business, instead of repeatedly writing code to produce various statistics about the data and drawing pretty pictures.
I’ve written previously about automating and using some data profiling libraries to help us with this task. There are lots of packages available on pypi.og and on GitHub. Below I give examples of 5 Python Data Profiling libraries, with links to their GitHubs.
This is probably one of the better and more popular Python libraries for exploring data. The aim is to make it as simple as possible using one line of code.
import pandas_profiling as pp df2.profile_report()
Following the line line of code approach skimpy is a light weight tool that provides summary statistics about variables in data frames. They like to thing skimpy is a super-charged version of
df.describe(). Skimpy also has some automated data cleaning functions.
from skimpy import skim skim(df)
Dataprep has multiple features with the two main features being EDA (Exploratory Data Analysis) and Data Cleaning. For EDA functionality, it is build to scale for larger data sets and provides some interactive charts.
from dataprep.eda import * from dataprep.datasets import load_dataset from dataprep.eda import plot, plot_correlation, plot_missing, plot_diff, create_report df = load_dataset("titanic") plot(df) plot_missing(df) plot_missing(df, "Age")
Sweetviz creates high-density visualizations to help kickstart EDA with just two lines of code. Output is a fully self-contained HTML application.
import pandas as pd import sweetviz as sv df = pd.read_csv('../input/titanic/train.csv') report = sweetviz.analyze(df, "Survived")
Autoviz works on visualizing the relationship of the data, it can find the most impactful features and plot creative visualization.
from autoviz.AutoViz_Class import AutoViz_Class AV = AutoViz_Class() df = AV.AutoViz('titanic_train.csv')
Always try to automate the boring tasks, and using one of these packages is a step towards doing for for any Data Analysts, Data Sciences, Data Engineers, Machine Learning Engineer, AI Engineer, etc.
NATO AI Strategy
Over the past 18 months there has been wide spread push buy many countries and geographic regions, to examine how the creation and use of Artificial Intelligence (AI) can be regulated. I’ve written many blog posts about these. But it isn’t just government or political alliances that are doing this, other types of organisations are also doing so.
NATO, the political and (mainly) military alliance, has also joined the club. They have release a summary version of their AI Strategy. This might seem a little strange for this type of organisation to do something like this. But if you look a little closer NATA also says they work together in other areas such as Standardisation Agreements, Crisis Management, Disarmament, Energy Security, Clime/Environment Change, Gender and Human Security, Science and Technology.
In October/November 2021, NATO formally adopted their Artificial Intelligence (AI) Strategy (for defence). Their AI Strategy outlines how AI can be applied to defence and security in a protected and ethical way (interesting wording). Their aim is to position NATO as a leader of AI adoption, and it provides a common policy basis to support the adoption of AI System sin order to achieve the Alliances three core tasks of Collective Defence, Crisis Management and Cooperative Security. An important element of the AI Strategy is to ensure inter-operability and standardisation. This is a little bit more interesting and perhaps has a lessor focus on ethical use.
NATO’s AI Strategy contains the following principles of Responsible use of AI (in defence):
- Lawfulness: AI applications will be developed and used in accordance with national and international law, including international humanitarian law and human rights law, as applicable.
- Responsibility and Accountability: AI applications will be developed and used with appropriate levels of judgment and care; clear human responsibility shall apply in order to ensure accountability.
- Explainability and Traceability: AI applications will be appropriately understandable and transparent, including through the use of review methodologies, sources, and procedures. This includes verification, assessment and validation mechanisms at either a NATO and/or national level.
- Reliability: AI applications will have explicit, well-defined use cases. The safety, security, and robustness of such capabilities will be subject to testing and assurance within those use cases across their entire life cycle, including through established NATO and/or national certification procedures.
- Governability: AI applications will be developed and used according to their intended functions and will allow for: appropriate human-machine interaction; the ability to detect and avoid unintended consequences; and the ability to take steps, such as disengagement or deactivation of systems, when such systems demonstrate unintended behaviour.
- Bias Mitigation: Proactive steps will be taken to minimise any unintended bias in the development and use of AI applications and in data sets.
By acting collectively members of NATO will ensure a continued focus on interoperability and the development of common standards.
Some points of interest:
- Bias Mitigation efforts will be adopted with the aim of minimising discrimination against traits such as gender, ethnicity or personal attributes. However, the strategy does not say how bias will be tackled – which requires structural changes which go well beyond the use of appropriate training data.
- The strategy also recognises that in due course AI technologies are likely to become widely available, and may be put to malicious uses by both state and non-state actors. NATO’s strategy states that the alliance will aim to identify and safeguard against the threats from malicious use of AI, although again no detail is given on how this will be done.
- Running through the strategy is the idea of interoperability – the desire for different systems to be able to work with each other across NATO’s different forces and nations without any restrictions.
- What about Autonomous weapon systems? Some members do not support a ban on this technology.
- Has similar wording to the principles adopted by the US Department of Defense for the ethical use of AI.
- Wants to make defence and security a more attractive to private sector and academic AI developers/researchers.
- NATO principles have no coherent means of implementation or enforcement.
AI Sandboxes – EU AI Regulations
The EU AI Regulations provides a framework for placing on the market and putting into service AI system in the EU. One of the biggest challenges most organisations will face will be how they can innovate and develop new AI systems while at the same time ensuring they are compliant with the regulations. But a what point do you know you are compliant with these new AI Systems? This can be challenging and could limit or slow down the development and deployment of such systems.
The EU does not want to limit or slow down such innovations and want organisations to continually research, develop and deploy new AI. To facilitate this the EU AI Regulations contains a structure under which this can be achieved.
Section or Title of EU AI Regulations contains Articles 53, 54, and 55 to support the development of new AI systems by the use of Sandboxes. We have already seen examples of these being introduced by the UK and Norwegian Data Protection Commissioners.
A Sandbox “provides a controlled environment that facilitates the development, testing and validation of innovative AI systems for a limited time before their placement on the market or putting into
service pursuant to a specific plan.“
Sandboxes are stand alone environments to allow the exploration and development of new AI solutions, which may or may not include some risky use of customer data or other potential AI outcomes which may not be allowed under the regulations. It becomes a controlled experiment lab for the AI team who are developing and testing a potential AI System and can do so under real world conditions. The Sandbox gives a “safe” environment for this experimental work.
The Sandbox are to be established by the Competent Authorities in each EU country. In Ireland the Competent Authority seems to be the Data Protection Commissioner, and this may be similar in other countries. As you can imagine, under the current wording of the EU AI Regulations this might present some challenges for the both the Competent Authority and also for the company looking to develop an AI solution. Firstly, does the Competent Authority need to provide sandboxes for all companies looking to develop AI, and each one of these companies may have several AI projects. This is a massive overhead for the Competent Authority to provide and resource. Secondly, will companies be willing to setup a self-contained environment, containing customer data, data insights, solutions with potential competitive advantage, etc in a Sandbox provided by the Competent Authority. The technical infrastructure used could be hosting many Sandboxes, with many competing companies using the same infrastructure at the same time. This is a big ask for the companies and the Competent Authority.
Let’s see what really happens regarding the implementation of the Sandboxes over the coming years, and how this will be defined in the final draft of the Regulations.
Article 54 defines additional requirements for the processing of personal data within the Sandbox.
- Personal Data being used is required, and can be fulfilled by processing anonymized, synthetic or other non-personal data. Even if it has been collected for other purposes.
- Continuous monitoring needed to identify any high risk to fundamental rights of the data subject, and response mechanism to mitigate those risks.
- Any personal data to be processed is in a functionally separate, isolated and protected data processing environment under the control of the participants and only authorised persons have access to that data.
- Any personal data processed are not be transmitted, transferred or otherwise accessed by other parties.
- Any processing of personal data does not lead to measures or decisions affecting the data subjects.
- All personal data is deleted once the participation in the sandbox is terminated or the personal data has reached the end of its retention period.
- Full documentation of what was done to the data, must be kept for 1 year after termination of Sandbox, and only to be used for accountability and documentation obligations.
- Documentation of the complete process and rationale behind the training, testing and validation of AI, along with test results as part of technical documentation. (see Annex IV)
- Short Summary of AI project, its objectives and expected results published on website of Competent Authorities
Based on the last bullet point the Competent Authority is required to write am annual report and submit this report to the EU AI Board. The report is to include details on the results of their scheme, good and bad practices, lessons learnt and recommendations on the setup and application of the Regulations within the Sandboxes.
- ← Previous
- Next →
You must be logged in to post a comment.