After a very, very, very long wait (18+ months) Oracle OCI Data Science platform is now available.
But before you jump straight into using OCI Data Science, there is a little bit of setup required for your Cloud Tenancy. There is the easy simple approach and then there is the slightly more involved approach. These are
- Simple approach. Assuming you are just going to use the root tenancy and compartment, you just need to setup a new policy to enable the use of the OCI Data Science services. This assuming you have your VNC configuration complete with NAT etc. This can be done by creating a policy with the following policy statement. After creating this you can proceed with creating your first notebook in OCI Data Science.
allow service datascience to use virtual-network-family in tenancy
- Slightly more complicated approach. When you get into having a team based approach you will need to create some additional Oracle Cloud components to manage them and what resources are allocated to them. This involved creating Compartments, allocating users, VNCs, Policies etc. The following instructions brings you through these steps
IMPORTANT: After creating a Compartment or some of the other things listed below, and they are not displayed in the expected drop-down lists etc, then either refresh your screen or log-out and log back in again!
1. Create a Group for your Data Science Team & Add Users
The first step involves creating a Group to ‘group’ the various users who will be using the OCI Data Science services.
Go to Governance and Administration ->Identity and click on Groups.
Enter some basic descriptive information. I called my Group, ‘my-data-scientists’.
Now click on your Group in the list of Groups and add the users to the group.
You may need to create the accounts for the various users.
2. Create a Compartment for your Data Science work
Now create a new Compartment to own the network resources and the Data Science resources.
Go to Governance and Administration ->Identity and click on Compartments.
Enter some basic descriptive information. I’ve called my compartment, ‘My-DS-Compartment’.
3. Create Network for your Data Science work
Creating and setting up the VNC can be a little bit of fun. You can do it the manual way whereby you setup and configure everything. Or you can use the wizard to do this. I;m going to show the wizard approach below.
But the first thing you need to do is to select the Compartment the VNC will belong to. Select this from the drop-down list on the left hand side of the Virtual Cloud Network page. If your compartment is not listed, then log-out and log-in!
To use the wizard approach click the Networking QuickStart button.
Select the option ‘VCN with Internet Connectivity and click Start Workflow, as you will want to connect to it and to allow the service to connect to other cloud services.
I called my VNC ‘My-DS-vnc’ and took the default settings. Then click the Next button.
The next screen shows a summary of what will be done. Click the Create button, and all of these networking components will be created.
All done with creating the VNC.
4. Create required Policies enable OCI Data Science for your Compartment
There are three policies needed to allocated the necessary resources to the various components we have just created. To create these go to Governance and Administration ->Identity and click on Policies.
Select your Compartment from the drop-down list. This should be ‘My-DS-Compartment’, then click on Create Policy.
The first policy allocates a group to a compartment for the Data Science services. I called this policy, ‘DS-Manage-Access’.
allow group My-data-scientists to manage data-science-family in compartment My-DS-Compartment
The next policy is to give the Data Science users access to the network resources. I called this policy, ‘DS-Manage-Network’.
allow group My-data-scientists to use virtual-network-family in compartment My-DS-Compartment
And the third policy is to give Data Science service access to the network resources. I called this policy, ‘DS-Network-Access’.
allow service datascience to use virtual-network-family in compartment My-DS-Compartment
Job Done 🙂
You are now setup to run the OCI Data Science service. Check out my Blog Post on creating your first OCI Data Science Notebook and exploring what is available in this Notebook.
There are a number of different tools and languages available for machine learning projects. One such tool is Oracle Analytics Cloud (OAC). Check out my article for Oracle Magazine that takes you through the steps of using OAC to create a Machine Learning workflow/dataflow.
Oracle Analytics Cloud provides a single unified solution for analyzing data and delivering analytics solutions to businesses. Additionally, it provides functionality for processing data, allowing for data transformations, data cleaning, and data integration. Oracle Analytics Cloud also enables you to build a machine learning workflow, from loading, cleaning, and transforming data and creating a machine learning model to evaluating the model and applying it to new data—without the need to write a line of code. My Oracle Magazine article takes you through the various tasks for using Oracle Analytics Cloud to build a machine learning workflow.
That article covers the various steps with creating a machine learning model. This post will bring you through the steps of using that model to score/label new data.
In the Data Flows screen (accessed via Data->Data Flows) click on Create. We are going to create a new Data Flow to process the scoring/labeling of new data.
Select Data Flow from the pop-up menu. The ‘Add Data Set’ window will open listing your available data sets. In my example, I’m going to use the same data set that I used in the Oracle Magazine article to build the model. Click on the data set and then click on the Add button.
The initial Data Flow will be created with the node for the Data Set. The screen will display all the attributes for the data set and from this you can select what attributes to include or remove. For example, if you want a subset of the attributes to be used as input to the machine learning model, you can select these attributes at this stage. These can be adjusted at a later stages, but the data flow will need to be re-run to pick up these changes.
Next step is to create the Apply Model node. To add this to the data flow click on the small plus symbol to the right of the Data Node. This will pop open a window from which you will need to select the Apply Model.
A pop-up window will appear listing the various machine learning models that exist in your OAC environment. Select the model you want to use and click the Ok button.
The next node to add to the data flow is to save the results/outputs from the Apply Model node. Click on the small plus icon to the right of the Apply Model node and select Save Results from the popup window.
We now have a completed data flow. But before you finish edit the Save Data node to give a name for the Save Data Set, and you can edit what attributes/features you want in the result set.
You can now save and run the Data Flow, and view the outputs from applying the machine learning model. The saved data set results can be viewed in the Data menu.
When you comes to working with R to access and process your data there are a number of little features and behaviors you need to look out for.
One of these is the DATE datatype.
The main issue that you have to look for is the TIMEZONE conversion that happens then you extract the data from the database into your R environment.
There is a datatype conversions from the Oracle DATE into the POSIXct format. The POSIXct datatype also includes the timezone. But the Oracle DATE datatype does not have a Timezone part of it.
When you look into this a bit more you will see that the main issue is what Timezone your R session has. By default your R session will inherit the OS session timezone. For me here in Ireland we have the time timezone as the UK. You would time that the timezone would therefore be GMT. But this is not the case. What we have for timezone is BST (or British Standard Time) and this takes into account the day light savings time. So on the 26th May, BST is one hour ahead of GMT.
OK. Let’s have a look at a sample scenario.
As mentioned above, when I select date of type DATE from Oracle into R, using ROracle, I end up getting a different date value than what was in the database. Similarly when I process and store the data.
The following outlines the data setup and some of the R code that was used to generate the issue/problem.
Create a table that contains a DATE field and insert some records.
CREATE TABLE STAFF (STAFF_NUMBER VARCHAR2(20), FIRST_NAME VARCHAR2(20), SURNAME VARCHAR2(20), DOB DATE, PROG_CODE VARCHAR2(6 BYTE), PRIMARY KEY (STAFF_NUMBER)); insert into staff values (123456789, 'Brendan', 'Tierney', to_date('01/06/1975', 'DD/MM/YYYY'), 'DEPT_1'); insert into staff values (234567890, 'Sean', 'Reilly', to_date('21/10/1980', 'DD/MM/YYYY'), 'DEPT_2'); insert into staff values (345678901, 'John', 'Smith', to_date('12/03/1973', 'DD/MM/YYYY'), 'DEPT_3'); insert into staff values (456789012, 'Barry', 'Connolly', to_date('25/01/1970', 'DD/MM/YYYY'), 'DEPT_4');
You can query this data in SQL without any problems. As you can see there is no timezone element to these dates.
Selecting the data
I now establish my connection to my schema in my 12c database using ROracle. I won’t bore you with the details here of how to do it but check out point 3 on this post for some details.
When I select the data I get the following.
> res<-dbSendQuery(con, "select * from staff") > data <- fetch(res) > data$DOB  "1975-06-01 01:00:00 BST" "1980-10-21 01:00:00 BST" "1973-03-12 00:00:00 BST"  "1970-01-25 01:00:00 BST"
As you can see two things have happened to my date data when it has been extracted from Oracle. Firstly it has assigned a timezone to the data, even though there was no timezone part of the original data. Secondly it has performed some sort of timezone conversion to from GMT to BST. The difference between GMT and BTS is the day light savings time. Hence the 01:00:00 being added to the time element that was extract. This time should have been 00:00:00. You can see we have a mixture of times!
So there appears to be some difference between the R date or timezone to what is being used in Oracle.
To add to this problem I was playing around with some dates and different records. I kept on getting this scenario but I also got the following, where we have a mixture of GMT and BST times and timezones. I’m not sure why we would get this mixture.
> data$DOB  "1995-01-19 00:00:00 GMT" "1965-06-20 01:00:00 BST" "1973-10-20 01:00:00 BST"  "2000-12-28 00:00:00 GMT"
This is all a bit confusing and annoying. So let us look at how you can now fix this.
Fixing the problem : Setting Session variables
What you have to do to fix this and to ensure that there is consistency between that is in Oracle and what is read out and converted into R (POSIXct) format, you need to define two R session variables. These session variables are used to ensure the consistency in the date and time conversions.
These session variables are TZ for the R session timezone setting and Oracle ORA_SDTZ setting for specifying the timezone to be used for your Oracle connections.
The trick there is that these session variables need to be set before you create your ROracle connection. The following is the R code to set these session variables.
> Sys.setenv(TZ = "GMT") > Sys.setenv(ORA_SDTZ = "GMT")
So you really need to have some knowledge of what kind of Dates you are working with in the database and if a timezone if part of it or is important. Alternatively you could set the above variables to UDT.
Selecting the data (correctly this time)
Now when we select our data from our table in our schema we now get the following, after reconnecting or creating a new connection to your Oracle schema.
> data$DOB  "1975-06-01 GMT" "1980-10-21 GMT" "1973-03-12 GMT" "1970-01-25 GMT"
Now you can see we do not have any time element to the dates and this is correct in this example. So all is good.
We can now update the data and do whatever processing we want with the data in our R script.
But what happens when we save the data back to our Oracle schema. In the following R code we will add 2 days to the DOB attribute and then create a new table in our schema to save the updated data.
> data$DOB  "1975-06-01 GMT" "1980-10-21 GMT" "1973-03-12 GMT" "1970-01-25 GMT" > data$DOB <- data$DOB + days(2) > data$DOB  "1975-06-03 GMT" "1980-10-23 GMT" "1973-03-14 GMT" "1970-01-27 GMT"
> dbWriteTable(con, "STAFF_2", data, overwrite = TRUE, row.names = FALSE)  TRUE
I’ve used the R package Libridate to do the date and time processing.
When we look at this newly created table in our Oracle schema we will see that we don’t have DATA datatype for DOB, but instead it is created using a TIMESTAMP data type.
If you are working with TIMESTAMP etc type of data types (i.e. data types that have a timezone element that is part of it) then that is a slightly different problem.
When using Oracle Machine Learning notebooks, you can export and import these between different projects and different environments (from ADW to ATP).
But something to watch out for when you import a notebook into your ADW or ATP environment is to reset the Interpreter Bindings.
When you create a new OML Notebook and build it up, the various Interpreter Bindings are automatically set or turned on. But for Imported OML Notebooks they are not turned on.
I’m assuming this will be fixed at some future point.
If you import an OML Notebook and turn on the Interpreter Bindings you may find the code in your notebook cells running very slowly
To turn on these binding, click on the options icon as indicated by the red box in the following image.
You will get something like the following being displayed. None of the bindings are highlighted.
To enable the Interpreter Bindings just click on each of these boxes. When you do this each one will be highlighted and will turn a blue color.
All done! You can now run your OML Notebooks without any problems or delays.
When setting up a Database on Oracle Cloud Infrastructure (OCI) for the first time there are a few pre and post steps to complete before you can access the database using a JDBC type of connect, just like what you have in SQL Developer, or using Python or other similar tools and/or languages.
1. Setup Virtual Cloud Network (VCN)
The first step, when starting off with OCI, is to create a Virtual Cloud Network.
Create a VCN and take all the defaults. But change the radio button shown in the following image.
That’s it. We will come back to this later.
2. Create the Oracle Database
To create the database select ‘Bare Metal, VM and Exadata’ from the menu.
Click on the ‘Launch DB System’ button.
Fill in the details of the Database you want to create and select from the various options from the drop-downs.
Fill in the details of the VCN you created in the previous set, and give the name of the DB and the Admin password.
When you are finished everything that is needed, the ‘Launch DB System’ at the bottom of the page will be enabled. After clicking on this botton, the VM will be built and should be ready in a few minutes. When finished you should see something like this.
3. SSH to the Database server
When the DB VM has been created you can now SSH to it. You will need to use the SSH key file used when creating the DB VM. You will need to connect to the opc (operating system user), and from there sudo to the oracle user. For example
ssh -i <ssh file> opc@<public IP address>
The public IP address can be found with the Database VM details
[opc@tudublins1 ~]$ sudo su - oracle [oracle@tudublins1 ~]$ . oraenv ORACLE_SID = [cdb1] ? The Oracle base has been set to /u01/app/oracle [oracle@tudublins1 ~]$ [oracle@tudublins1 ~]$ sqlplus / as sysdba SQL*Plus: Release 22.214.171.124.0 - Production on Wed Mar 13 11:28:05 2019 Version 126.96.36.199.0 Copyright (c) 1982, 2018, Oracle. All rights reserved. Connected to: Oracle Database 18c Enterprise Edition Release 188.8.131.52.0 - Production Version 184.108.40.206.0 SQL> alter session set container = pdb1; Session altered. SQL> create user demo_user identified by DEMO_user123##; User created. SQL> grant create session to demo_user; Grant succeeded. SQL>
4. Open port 1521
To be able to access this with a Basic connection in SQL Developer and most programming languages, we will need to open port 1521 to allow these tools and languages to connect to the database.
To do this go back to the Virtual Cloud Networks section from the menu.
Click into your VCN, that you created earlier. You should see something like the following.
Click on the Security Lists, menu option on the left hand side.
From that screen, click on Default Security List, and then click on the ‘Edit All Rules’ button at the top of the next screen.
Add a new rule to have a ‘Destination Port Range’ set for 1521
5. Connect to the Database from anywhere
Now you can connect to the OCI Database using a basic SQL Developer Connection.
With the recent release of Oracle’s Autonomous Data Warehouse Cloud (ADWC), Oracle has given data scientists a new tool for data discovery and machine learning on the ADWC. Oracle Machine Learning is based on Apache Zeppelin and gives us a new machine learning tool for accessing the in-database machine learning algorithms and in-database statistical functions.
Oracle Machine Learning (OML) SQL notebooks provide easy access to Oracle’s parallelized, scalable in-database implementations of a library of Oracle Advanced Analytics’ machine learning algorithms (classification, regression, anomaly detection, clustering, associations, attribute importance, feature extraction, times series, etc.), SQL, PL/SQL and Oracle’s statistical and analytical SQL functions. Oracle Machine Learning SQL notebooks and Oracle Advanced Analytics’ library of machine learning SQL functions combined with PL/SQL allow companies to automate their discovery of new insights, generate predictions and add “AI” to data viz dashboards and enterprise applications.
The key features of Oracle Machine Learning include:
- Collaborative SQL notebook UI for data scientists
- Packaged with Oracle Autonomous Data Warehouse Cloud
- Easy access to shared notebooks, templates, permissions, scheduler, etc.
- Access to 30+ parallel, scalable in-database implementations of machine learning algorithms
- SQL and PL/SQL scripting language supported
- Enables and Supports Deployments of Enterprise Machine Learning Methodologies in ADWC
Here is a list of key resources for Oracle Machine Learning:
- Oracle Machine Learning Notebooks
- Video overview of Oracle Machine Learning
- Download sample Oracle Machine Learning notebooks
- Quick Start Tutorial for getting started with Oracle Machine Learning
- Documentation: Using Oracle Machine Learning
Last week I was presenting at Oracle Code in New York. I’ve presented at a few Oracle Code events over the past 12 months and it is always interesting to meet and talk with developers from around the World.
The title of my presentation this time was ‘SQL: The one language to rule all your data’.
I’ve given this presentation a few times at different events (POUG, OOW, Oracle Code). I take the contents of this presentation for granted and that most people know these things. But the opposite is true. Well a lot of people do know these things, but a magnitude more do not seem to know.
For example, at last weeks Oracle Code event, I had about 100 people in the room. I started out by asking the attendees ‘How many of you write SQL every day?’. About 90% put up their hand. Then a few minutes later after I start talking about various statistical functions in the database, I then ask them to ‘Count how many statistical functions they have used?’ I then asked them to raise their hands if they use over five statistical functions. About eight people put up their hands. Then I asked how many people use over ten functions. To my surprise only one (yes one) person put up their hand.
The first half of the presentation talks about statistical, analytical and machine learning in the database.
The second half covers some (not all) of the various data types and locations of data that can be accessed from the database.
The presentation then concludes with the title of the presentation about SQL being the one language to rule all your data.
Based on last weeks experience, it looks like a lot more people need to hear it !
Hopefully I’ll get the chance to share this presentation with other events and Oracle User Group conferences.
Two of the key take away messages are:
- Google makes us stupid
- We need to RTFM more often
Here is a link to the slides on SlideShare
And I recorded a short video about the presentation with Bob from OTN/ODC.
If you used other languages, including Oracle PL/SQL, more than likely you will have experienced having to play buffering the number of records that are returned from a cursor. Typically this is needed when you are processing more than a few hundred records. The default buffering size is relatively small and by increasing the size of the number of records to be buffered can dramatically improve the performance of your code.
As with all things in coding and IT, the phrase “It Depends” applies here and changing the buffering size may not be what you need and my not help you to gain optimal performance for your code.
There are lots and lots of examples of how to test this in PL/SQL and other languages, but what I’m going to show you here in this blog post is to change the buffering size when using Python to process data in an Oracle Database using the Oracle Python library cx_Oracle.
Let us begin with taking the defaults and seeing what happens. In this first scenario the default buffering is used. Here we execute a query and the process the records in a FOR loop (yes these is a row-by-row, slow-by-slow approach.
import time i = 0 # define a cursor to use with the connection cur2 = con.cursor() # execute a query returning the results to the cursor print("Starting cursor at", time.ctime()) cur2.execute('select * from sh.customers') print("Finished cursor at", time.ctime()) # for each row returned to the cursor, print the record print("Starting for loop", time.ctime()) t0 = time.time() for row in cur2: i = i+1 if (i%10000) == 0: print(i,"records processed", time.ctime()) t1 = time.time() print("Finished for loop at", time.ctime()) print("Number of records counted = ", i) ttime = t1 - t0 print("in ", ttime, "seconds.")
This gives us the following output.
Starting cursor at 10:11:43 Finished cursor at 10:11:43 Starting for loop 10:11:43 10000 records processed 10:11:49 20000 records processed 10:11:54 30000 records processed 10:11:59 40000 records processed 10:12:05 50000 records processed 10:12:09 Finished for loop at 10:12:11 Number of records counted = 55500 in 28.398550033569336 seconds.
Processing the data this way takes approx. 28 seconds and this corresponds to the buffering of approx 50-75 records at a time. This involves many, many, many round trips to the the database to retrieve this data. This default processing might be fine when our query is only retrieving a small number of records, but as our data set or results set from the query increases so does the time it takes to process the query.
But we have a simple way of reducing the time taken, as the number of records in our results set increases. We can do this by increasing the number of records that are buffered. This can be done by changing the size of the ‘arrysize’ for the cursor definition. This reduces the number of “roundtrips” made to the database, often reducing networks load and reducing the number of context switches on the database server.
The following gives an example of same code with one additional line.
cur2.arraysize = 500
Here is the full code example.
# Test : Change the arraysize and see what impact that has import time i = 0 # define a cursor to use with the connection cur2 = con.cursor() cur2.arraysize = 500 # execute a query returning the results to the cursor print("Starting cursor at", time.ctime()) cur2.execute('select * from sh.customers') print("Finished cursor at", time.ctime()) # for each row returned to the cursor, print the record print("Starting for loop", time.ctime()) t0 = time.time() for row in cur2: i = i+1 if (i%10000) == 0: print(i,"records processed", time.ctime()) t1 = time.time() print("Finished for loop at", time.ctime()) print("Number of records counted = ", i) ttime = t1 - t0 print("in ", ttime, "seconds.")
Now the response time to process all the records is.
Starting cursor at 10:13:02
Finished cursor at 10:13:02
Starting for loop 10:13:02
10000 records processed 10:13:04
20000 records processed 10:13:06
30000 records processed 10:13:08
40000 records processed 10:13:10
50000 records processed 10:13:12
Finished for loop at 10:13:13
Number of records counted = 55500
in 11.780734777450562 seconds.
All done in just under 12 seconds, compared to 28 seconds previously.
Here is another alternative way of processing the data and retrieves the entire results set, using the ‘fetchall’ command, and stores it located in ‘res’.
The 18c Oracle DBaaS is now available. This is the only place that Oracle 18c will be available until later in 2018. So if you want to try it out, then you are going to need to get some Oracle Cloud credits, or you may already have a paying account for Oracle Cloud.
The following outlines the steps you need to go through to gets Oracle 18c setup.
1. Log into your Oracle Cloud
Log into your Oracle Cloud environment. Depending on your access path you will get to your dashboard.
Select Create Instance from the dashboard.
2. Create a new Database
From the list of services to create, select Database.
3. Click ‘Create Instance’
4. Enter the Database Instance details
Enter the details for your new Oracle 18c Database. I’ve called mine ‘db18c’.
Then for the Software Release dropdown list, select ‘Oracle Database 18c’.
Next select the Software Edition from the dropdown list.
5. Fill in the Instance Details
Fill in the details for ‘DB Name’, ‘PDB Name’, ‘Administration Password’, ‘Confirm Password’, setup the SSH Public Key, and then decide if you need the Backup and Recovery option.
6. Create the DBaaS
Double check everything and when ready click on the ‘Create’ button.
7. Wait for Everything to be Create
Now is the time to be patient and wait while your cloud service is created.
I’ve created two different version of the 18c Oracle DBaaS. The Enterprise Edition to 30 minutes to complete and the High Performance service too 47 minutes.
No it’s time to go play.
On Friday afternoon (16th February) we started to see tweets and blog posts from people in Oracle saying that Oracle 18c was now available. But is only available on Oracle Cloud and Engineered Systems.
It looks like we will have to wait until the Autumn before we can install it ourselves on our own servers 😦
Here is the link to the official announcement for Oracle 18c.
Oracle 18c is really Oracle 220.127.116.11. The next full new release of the Oracle database is expected to be Oracle 19.
The new features and incremental enhancements in Oracle 18c are:
- Memory Optimized Fetches
- Exadata RAC Optimizations
- High Availability
- Online Partition Merge
- Improved Machine Learning (OAA)
- Polymorphic Table Functions
- Spatial and Graph
- More JSON improvements
- Private Temporary Tablespaces
- New mode for Connection Manager
And now the all important links to the documentation.
To give Oracle 18c a try you will need to go to cloud.oracle.com and select Database from the drop down list from the Platform menu. Yes you are going to need an Oracle Cloud account and some money or some free credit. Go and get some free cloud credits at the upcoming Oracle Code events.
If you want a ‘free’ way of trying out Oracle 18c, you can use Oracle Live SQL. They have setup some examples of the new features for you to try.
NOTE: Oracle 18c is not Autonomous. Check out Tim Hall’s blog posts about this. The Autonomous Oracle Database is something different, and we will be hearing more about this going forward.
Is Python the new R?
Maybe, maybe not, but that I’m finding in recent months is more companies are asking me to use Python instead of R for some of my work.
In this blog post I will walk through the steps of setting up the Oracle driver for Python, called cx_Oracle. The documentation for this drive is good and detailed with plenty of examples available on GitHub. Hopefully there isn’t anything new in this post, but it is my experiences and what I did.
1. Install Oracle Client
The Python driver requires Oracle Client software to be installed. Go here, download and install. It’s a straightforward install. Make sure the directories are added to the search path.
2. Download and install cx_Oracle
You can use pip3 to do this.
pip3 install cx_Oracle Collecting cx_Oracle Downloading cx_Oracle-6.1.tar.gz (232kB) 100% |████████████████████████████████| 235kB 679kB/s Building wheels for collected packages: cx-Oracle Running setup.py bdist_wheel for cx-Oracle ... done Stored in directory: /Users/brendan.tierney/Library/Caches/pip/wheels/0d/c4/b5/5a4d976432f3b045c3f019cbf6b5ba202b1cc4a36406c6c453 Successfully built cx-Oracle Installing collected packages: cx-Oracle Successfully installed cx-Oracle-6.1
3. Create a connection in Python
Now we can create a connection. When you see some text enclosed in angled brackets <>, you will need to enter your detailed for your schema and database server.
# import the Oracle Python library import cx_Oracle # define the login details p_username = "" p_password = "" p_host = "" p_service = "" p_port = "1521" # create the connection con = cx_Oracle.connect(user=p_username, password=p_password, dsn=p_host+"/"+p_service+":"+p_port) # an alternative way to create the connection # con = cx_Oracle.connect('/@/:1521') # print some details about the connection and the library print("Database version:", con.version) print("Oracle Python version:", cx_Oracle.version) Database version: 18.104.22.168.0 Oracle Python version: 6.1
4. Query some data and return results to Python
In this example the query returns the list of tables in the schema.
# define a cursor to use with the connection cur = con.cursor() # execute a query returning the results to the cursor cur.execute('select table_name from user_tables') # for each row returned to the cursor, print the record for row in cur: print("Table: ", row) Table: ('DECISION_TREE_MODEL_SETTINGS',) Table: ('INSUR_CUST_LTV_SAMPLE',) Table: ('ODMR_CARS_DATA',)
Now list the Views available in the schema.
# define a second cursor cur2 = con.cursor() # return the list of Views in the schema to the cursor cur2.execute('select view_name from user_views') # display the list of Views for result_name in cur2: print("View: ", result_name) View: ('MINING_DATA_APPLY_V',) View: ('MINING_DATA_BUILD_V',) View: ('MINING_DATA_TEST_V',) View: ('MINING_DATA_TEXT_APPLY_V',) View: ('MINING_DATA_TEXT_BUILD_V',) View: ('MINING_DATA_TEXT_TEST_V',)
5. Query some data and return to a Panda in Python
Pandas are commonly used for storing, structuring and processing data in Python, using a data frame format. The following returns the results from a query and stores the results in a panda.
# in this example the results of a query are loaded into a Panda # load the pandas library import pandas as pd # execute the query and return results into the panda called df df = pd.read_sql_query("SELECT * from INSUR_CUST_LTV_SAMPLE", con) # print the records returned by query and stored in panda print(df.head()) CUSTOMER_ID LAST FIRST STATE REGION SEX PROFESSION \ 0 CU13388 LEIF ARNOLD MI Midwest M PROF-2 1 CU13386 ALVA VERNON OK Midwest M PROF-18 2 CU6607 HECTOR SUMMERS MI Midwest M Veterinarian 3 CU7331 PATRICK GARRETT CA West M PROF-46 4 CU2624 CAITLYN LOVE NY NorthEast F Clerical BUY_INSURANCE AGE HAS_CHILDREN ... MONTHLY_CHECKS_WRITTEN \ 0 No 70 0 ... 0 1 No 24 0 ... 9 2 No 30 1 ... 2 3 No 43 0 ... 4 4 No 27 1 ... 4 MORTGAGE_AMOUNT N_TRANS_ATM N_MORTGAGES N_TRANS_TELLER \ 0 0 3 0 0 1 3000 4 1 1 2 980 4 1 3 3 0 2 0 1 4 5000 4 1 2 CREDIT_CARD_LIMITS N_TRANS_KIOSK N_TRANS_WEB_BANK LTV LTV_BIN 0 2500 1 0 17621.00 MEDIUM 1 2500 1 450 22183.00 HIGH 2 500 1 250 18805.25 MEDIUM 3 800 1 0 22574.75 HIGH 4 3000 2 1500 17217.25 MEDIUM [5 rows x 31 columns]
6. Wrapping it up and closing things
Finally we need to wrap thing up and close our cursors and our connection to the database.
# close the cursors cur2.close() cur.close() # close the connection to the database con.close()
Watch out for more blog posts on using Python with Oracle, Oracle Data Mining and Oracle R Enterprise.
This week Oracle Code will be having an online event consisting of 5 tracks and with 3 presentations on each track.
This online Oracle Code event will be given in 3 different geographic regions on 12th, 13th and 14th December.
I’ve been selected to give one of these talks, and I’ve given this talk at some live Oracle Code events and at JavaOne back in October.
The present is pre-recorded and I recorded this video back in September.
I hope to be online at the end of some of these presentations to answer any questions, but unfortunately due to changes with my work commitments I may not be able to be online for all of them.
The moderator for these events will take your questions (or you can send them to me here) and I will write a blog post answering all your questions.