Oracle Database

Working with External Data on Oracle DB Docker

Posted on

With multi-modal databases (such as Oracle and many more) you will typically work with data in different formats and for different purposes. One such data format is with data located external to the database. The data will exist in files on the operating systems on the DB server or on some connected storage device.

The following demonstrates how to move data to an Oracle Database Docker image and access this data using External Tables. (This based on an example from Oracle-base.com with a few additional commands).

For this example, I’ll be using an Oracle 21c Docker image setup previously. Similarly the same steps can be followed for the 18c XE Docker image, by changing the Contain Id from 21cFull to 18XE.

Step 1 – Connect to OS in the Docker Container & Create Directory

The first step involves connecting the the OS of the container. As the container is setup for default user ‘oracle’, that is who we will connect as, and it is this Linux user who owns all the Oracle installation and associated files and directories

docker exec -it 21cFull /bin/bash

When connected we are in the Home directory for the Oracle user.

The Home directory contains lots of directories which contain all the files necessary for running the Oracle Database.

Next we need to create a directory which will story the files.

mkdir ext_data

As we are logged in as the oracle Linux user, we don’t have to make any permissions changes, as Oracle Database requires read and write access to this directory.

Step 3 – Upload files to Directory on Docker container

Open another terminal window on your computer (desktop/laptop). You should have two such terminal windows open. One you opened for Step 1 above, and this one. This will allow you to easily switch between files on your computer and the files in the Docker container.

Download the two Countries files, to your computer, which are listed on Oracle-base.com. Countries1.txt and Countries2.txt.

Now you need to upload those files to the Docker container.

docker cp Countries1.txt 21cFull:/opt/oracle/ext_data/Countries1.txt
docker cp Countries2.txt 21cFull:/opt/oracle/ext_data/Countries2.txt

Step 4 – Connect to System (DBA) schema, Create User, Create Directory, Grant access to Directory

If you a new to the Database container, you don’t have any general users/schemas created. You should create one, as you shouldn’t use the System (or DBA) user for any development work. To create a new database user connect to System.

sqlplus system/SysPassword1@//localhost/XEPDB1

To use sqlplus command line tool you will need to install Oracle Instant Client and then SQLPlus (which is a separate download from the same directory for your OS)

To create a new user/schema in the database you can run the following (change the username and password to something more sensible).

create user brendan identified by BtPassword1
default tablespace users
temporary tablespace temp;
grant connect, resource to brendan;
alter user brendan
quota unlimited on users;

Now create the Directory object in the database, which points to the directory on the Docker OS we created in the Step 1 above. Grant ‘brendan’ user/schema read and write access to this Directory

CREATE OR REPLACE DIRECTORY ext_tab_data AS '/opt/oracle/ext_data';
grant read, write on directory ext_tab_data to brendan;

Now, connect to the brendan user/schema.

Step 5 – Create external table and test

To connect to brendan user/schema, you can run the following if you are still using SQLPlus

SQL> connect brendan/BtPassword1@//localhost/XEPDB1

or if you exited it, just run this from the command line

sqlplus system/SysPassword1@//localhost/XEPDB1

Create the External Table (same code from oracle-base.com)

CREATE TABLE countries_ext (
  country_code      VARCHAR2(5),
  country_name      VARCHAR2(50),
  country_language  VARCHAR2(50)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_tab_data
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    (
      country_code      CHAR(5),
      country_name      CHAR(50),
      country_language  CHAR(50)
    )
  )
  LOCATION ('Countries1.txt','Countries2.txt')
)
PARALLEL 5
REJECT LIMIT UNLIMITED;

It should create for you. If not and you get an error then if will be down to a typo on directory name or the files are not in the directory or something like that.

We can now query the External Table as if it is a Table in the database.

SQL> set linesize 120
SQL> select * from countries_ext order by country_name;
COUNT COUNTRY_NAME                         COUNTRY_LANGUAGE
----- ------------------------------------ ------------------------------
ENG   England                              English
FRA   France                               French
GER   Germany                              German
IRE   Ireland                              English
SCO   Scotland                             English
USA   Unites States of America             English
WAL   Wales                                Welsh

7 rows selected.

All done!

GoLang: Inserting records into Oracle Database using goracle

Posted on Updated on

In this blog post I’ll give some examples of how to process data for inserting into a table in an Oracle Database. I’ve had some previous blog posts on how to setup and connecting to an Oracle Database, and another on retrieving data from an Oracle Database and the importance of setting the Array Fetch Size.

When manipulating data the statements can be grouped (generally) into creating new data and updating existing data.

When working with this kind of processing we need to avoid the creation of the statements as a concatenation of strings. This opens the possibility of SQL injection, plus we are not allowing the optimizer in the database to do it’s thing. Prepared statements allows for the reuse of execution plans and this in turn can speed up our data processing and applications.

In a previous blog post I gave a simple example of a prepared statement for querying data and then using it to pass in different values as a parameter to this statement.

dbQuery, err := db.Prepare("select cust_first_name, cust_last_name, cust_city from sh.customers where cust_gender = :1")  
if err != nil {
    fmt.Println(err) 
    return  
}  
defer dbQuery.Close() 

rows, err := dbQuery.Query('M')  
if err != nil { 
    fmt.Println(".....Error processing query") 
    fmt.Println(err) 
    return  
}  
defer rows.Close()  

var CustFname, CustSname,CustCity string 
for rows.Next() {  
    rows.Scan(&CustFname, &CustSname, &CustCity) 
    fmt.Println(CustFname, CustSname, CustCity)  
}

For prepared statements for inserting data we can follow a similar structure. In the following example a table call LAST_CONTACT is used. This table has columns:

  • CUST_ID
  • CON_METHOD
  • CON_MESSAGE
_, err := db.Exec("insert into LAST_CONTACT(cust_id, con_method, con_message) VALUES(:1, :2, :3)", 1, "Phone", "First contact with customer")
if err != nil {
    fmt.Println(".....Error Inserting data") 
    fmt.Println(err) 
    return
}

an alternative is the following and allows us to get some additional information about what was done and the result from it. In this example we can get the number records processed.

stmt, err := db.Prepare("insert into LAST_CONTACT(cust_id, con_method, con_message) VALUES(:1, :2, :3)") 
if err != nil { 
    fmt.Println(err) 
    return 
}

res, err := dbQuery.Query(1, "Phone", "First contact with customer")  
if err != nil { 
    fmt.Println(".....Error Inserting data") 
    fmt.Println(err) 
    return  
} 

rowCnt := res.RowsAffected()
fmt.Println(rowCnt, " rows inserted.")

A similar approach can be taken for updating and deleting records

Managing Transactions

With transaction, a number of statements needs to be processed as a unit. For example, in double entry book keeping we have two inserts. One Credit insert and one debit insert. To do this we can define the start of a transaction using db.Begin() and the end of the transaction with a Commit(). Here is an example were we insert two contact details.

// start the transaction
transx, err := db.Begin()
if err != nil {
    fmt.Println(err) 
    return  
}

// Insert first record
_, err := db.Exec("insert into LAST_CONTACT(cust_id, con_method, con_message) VALUES(:1, :2, :3)", 1, "Email", "First Email with customer") 
if err != nil { 
    fmt.Println(".....Error Inserting data - first statement") 
    fmt.Println(err) 
    return 
}
// Insert second record
_, err := db.Exec("insert into LAST_CONTACT(cust_id, con_method, con_message) VALUES(:1, :2, :3)", 1, "In-Person", "First In-Person with customer") 
if err != nil { 
    fmt.Println(".....Error Inserting data - second statement") 
    fmt.Println(err) 
    return 
}

// complete the transaction
err = transx.Commit()
if err != nil {
    fmt.Println(".....Error Committing Transaction") 
    fmt.Println(err) 
    return 
}

 

GoLang: Querying records from Oracle Database using goracle

Posted on Updated on

Continuing my series of blog posts on using Go Lang with Oracle, in this blog I’ll look at how to setup a query, run the query and parse the query results. I’ll give some examples that include setting up the query as a prepared statement and how to run a query and retrieve the first record returned. Another version of this last example is a query that returns one row.

Check out my previous post on how to create a connection to an Oracle Database.

Let’s start with a simple example. This is the same example from the blog I’ve linked to above, with the Database connection code omitted.

    dbQuery := "select table_name from user_tables where table_name not like 'DM$%' and table_name not like 'ODMR$%'"
    rows, err := db.Query(dbQuery)
    if err != nil {
        fmt.Println(".....Error processing query")
        fmt.Println(err)
        return
    }
    defer rows.Close()

    fmt.Println("... Parsing query results") 
    var tableName string
    for rows.Next() {
        rows.Scan(&tableName)
        fmt.Println(tableName)
    }

Processing a query and it’s results involves a number of steps and these are:

  1. Using Query() function to send the query to the database. You could check for errors when processing each row
  2. Iterate over the rows using Next()
  3. Read the columns for each row into variables using Scan(). These need to be defined because Go is strongly typed.
  4. Close the query results using Close(). You might want to defer the use of this function but depends if the query will be reused. The result set will auto close the query after it reaches the last records (in the loop). The Close() is there just in case there is an error and cleanup is needed.

You should never use * as a wildcard in your queries. Always explicitly list the attributes you want returned and only list the attributes you want/need. Never list all attributes unless you are going to use all of them. There can be major query performance benefits with doing this.

Now let us have a look at using prepared statement. With these we can parameterize the query giving us greater flexibility and reuse of the statements. Additionally, these give use better query execution and performance when run the the database as the execution plans can be reused.

    dbQuery, err := db.Prepare("select cust_first_name, cust_last_name, cust_city from sh.customers where cust_gender = :1")
    if err != nil {
        fmt.Println(err) 
        return
    }
    defer dbQuery.Close()

    rows, err := dbQuery.Query('M')
    if err != nil {
        fmt.Println(".....Error processing query") 
        fmt.Println(err) 
        return
    }
    defer rows.Close()

    var CustFname, CustSname,CustCity string
    for rows.Next() {
        rows.Scan(&CustFname, &CustSname, &CustCity)   
        fmt.Println(CustFname, CustSname, CustCity) 
    }

Sometimes you may have queries that return only one row or you only want the first row returned by the query. In cases like this you can reduce the code to something like the following.

var CustFname, CustSname,CustCity string
err := db.Prepare("select cust_first_name, cust_last_name, cust_city from sh.customers where cust_gender = ?").Scan(&CustFname, &CustSname, &CustCity)  
if err != nil {
    fmt.Println(err) 
    return  
} 
fmt.Println(CustFname, CustSname, CustCity)

or an alternative to using Quer(), use QueryRow()

dbQuery, err := db.Prepare("select cust_first_name, cust_last_name, cust_city from sh.customers where cust_gender = ?")  
if err != nil {
    fmt.Println(err) 
    return  
}  
defer dbQuery.Close() 

var CustFname, CustSname,CustCity string
err := dbQuery.QueryRow('M').Scan(&CustFname, &CustSname, &CustCity)  
if err != nil { 
    fmt.Println(".....Error processing query") 
    fmt.Println(err) 
    return  
}  
fmt.Println(CustFname, CustSname, CustCity)

 

 

 

 

Importance of setting Fetched Rows size for Database Query using Golang

Posted on Updated on

When issuing queries to the database one of the challenges every developer faces is how to get the results quickly. If your queries are only returning a small number of records, eg. < 5, then you don’t really have to worry about execution time. That is unless your query is performing some complex processing, joining lots of tables, etc.

Most of the time developers are working with one or a small number of records, using a simple query. Everything runs quickly.

But what if your query is returning several tens or thousands of records. Assuming we have a simple query and no query optimization is needed, the challenge facing the developer is how can you get all of those records quickly into your environment and process them. Typically the database gets blamed for the query result set being returned slowly. But what if this wasn’t the case? In most cases developers take the default parameter settings of the functions and libraries. For database connection libraries and their functions, you can change some of the parameters and affect how your code, your query, gets executed on the Database server and can affect how quickly the data is shipped from the database to your code.

One very important parameter to consider is the query array size. This is the number of records the database will send to your code in each batch. The database will keep sending batches until you tell it to stop. It makes sense to have the size of this batch set to a small value, as most queries return one or a small number of records. But when we get onto returning a larger number of records it can affect the response time significantly.

I tested the effect of changing the size of the returning buffer/array using Golang and querying data in an Oracle Database, hosted on Oracle Cloud, and using goracle library to connect to the database.

[ I did a similar test using Python. The results can be found here. You will notices that Golang is significantly quicker than Python, as you would expect. ]

The database table being queried contains 55,000 records and I just executed a SELECT * FROM … on this table. The results shown below contain the timing the query took to process this data for different buffer/array sizes by setting the FetchRowCount value.

rows, err := db.Query(dbQuery, goracle.FetchRowCount(arraySize))

Screenshot 2019-05-22 14.52.48

As you can see, as the size of the buffer/array size increases the timing it takes to process the data drops. This is because the buffer/array is returning a larger number of records, and this results in a reduced number of round trips to/from the database i.e. fewer packets of records are sent across the network.

The challenge for the developer is to work out the optimal number to set for the buffer/array size. The default for the goracle libary, using Oracle client is 256 row/records.

When that above query is run, without the FetchRowCount setting, it will use this default 256 value. When this is used we get the following timings.

Screenshot 2019-05-22 15.00.00

We can see, for the data set being used in this test case the optimal setting needs to be around 1,500.

What if we set the parameter to be very large?  That would no necessarily make it quicker. You can see from the first table the timing starts to increase for the last two settings. There is an overhead in gathering and sending the data.

Here is a subset of the Golang code I used to perform the tests.

var currentTime = time.Now()

var i int

var custId int

arrayOne := [11] int{5, 10, 30, 50, 100, 200, 500, 1000, 1500, 2000, 2500}


currentTime = time.Now()


fmt.Println("Array Size = ", arraySize, " : ", currentTime.Format("03:04:05:06 PM"))


for index, arraySize := range arrayOne {

    currentTime = time.Now()

    fmt.Println(index, " Array Size = ", arraySize, " : ", currentTime.Format("03:04:05:06 PM"))


    db, err := sql.Open("goracle", username+"/"+password+"@"+host+"/"+database)

    if err != nil {

        fmt.Println("... DB Setup Failed")

        fmt.Println(err)

        return

    }

    defer db.Close()



    if err = db.Ping(); err != nil {

        fmt.Printf("Error connecting to the database: %s\n", err)

        return

    }



    currentTime = time.Now()

    fmt.Println("...Executing Query", currentTime.Format("03:04:05:06 PM"))

    dbQuery := "select cust_id from sh.customers"

    rows, err := db.Query(dbQuery, goracle.FetchRowCount(arraySize))

    if err != nil {

        fmt.Println(".....Error processing query")

        fmt.Println(err)

        return

    }

    defer rows.Close()



    i = 0

    currentTime = time.Now()

    fmt.Println("... Parsing query results", currentTime.Format("03:04:05:06 PM"))
 
   for rows.Next() {

        rows.Scan(&custId)

        i++

        if i% 10000 == 0 {

            currentTime = time.Now()

            fmt.Println("...... ",i, " customers processed", currentTime.Format("03:04:05:06 PM"))

        }

    }


    currentTime = time.Now()

    fmt.Println(i, " customers processed", currentTime.Format("03:04:05:06 PM"))


    fmt.Println("... Closing connection")

    finishTime := time.Now()

    fmt.Println("Finished at ", finishTime.Format("03:04:05:06 PM"))


}

 

 

 

 

Auto enabling APPROX_* function in the Oracle Database

Posted on Updated on

With the releases of 12.1 and 12.2 of Oracle Database we have seen some new functions that perform approximate calculations. These include:

  • APPROX_COUNT_DISTINCT
  • APPROX_COUNT_DISTINCT_DETAIL
  • APPROX_COUNT_DISTINCT_AGG
  • APPROX_MEDIAN
  • APPROX_PERCENTILE
  • APPROX_PERCENTILE_DETAIL
  • APPROX_PERCENTILE_AGG

These functions can be used when approximate answers can be used instead of the exact answer. Yes can have many scenarios for these and particularly as we move into the big data world, the ability to process our data quickly is slightly more important and exact numbers. For example, is there really a difference between 40% of our customers being of type X versus 41%. The real answer to this is, ‘It Depends!’, but for a lot of analytical and advanced analytical methods this difference doesn’t really make a difference.

There are various reports of performance improvement of anything from 6x to 50x with the response times of the queries that are using these functions, instead of using the more traditional functions.

If you are a BI or big data analyst and you have build lots of code and queries using the more traditional functions. But what if you now want to use the newer functions. Does this mean you have go and modify all the code you have written over the years? you can imagine getting approval to do this!

The simple answer to this question is ‘No’. No you don’t have to change any code, but with some parameter changes for the DB or your session you can tell the database to automatically switch from using the traditional functions (count, etc) to the newer more optimised and significantly faster APPROX_* functions.

So how can you do this magic?

First let us see what the current settings values are:

SELECT name, value 
FROM   v$ses_optimizer_env 
WHERE  sid = sys_context('USERENV','SID') 
AND    name like '%approx%';

NewImage

Now let us run a query to test what happens using the default settings (on a table I have with 10,500 records).

set auto trace on

select count(distinct cust_id) from test_inmemory;

COUNT(DISTINCTCUST_ID)
----------------------
		  1500


Execution Plan
----------------------------------------------------------
Plan hash value: 2131129625

--------------------------------------------------------------------------------------
| Id  | Operation	     | Name	     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |		     |	   1 |	  13 |	  70   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE      |		     |	   1 |	  13 |		  |	     |
|   2 |   VIEW		     | VW_DAG_0      |	1500 | 19500 |	  70   (2)| 00:00:01 |
|   3 |    HASH GROUP BY     |		     |	1500 |	7500 |	  70   (2)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| TEST_INMEMORY | 10500 | 52500 |	  69   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Let us now set the automatic usage of the APPROX_* function.

alter session set approx_for_aggregation = TRUE;

SQL> select count(distinct cust_id) from test_inmemory;

COUNT(DISTINCTCUST_ID)
----------------------
		  1495


Execution Plan
----------------------------------------------------------
Plan hash value: 1029766195

---------------------------------------------------------------------------------------
| Id  | Operation	      | Name	      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      | 	      |     1 |     5 |    69	(0)| 00:00:01 |
|   1 |  SORT AGGREGATE APPROX| 	      |     1 |     5 | 	   |	      |
|   2 |   TABLE ACCESS FULL   | TEST_INMEMORY | 10500 | 52500 |    69	(0)| 00:00:01 |
---------------------------------------------------------------------------------------

We can see above that the APPROX_* equivalent function was used, and slightly less work. But we only used this on a very small table.

The full list of session level settings is:

alter session set approx_for_aggregation = TRUE;
alter session set approx_for_aggregation = FALSE;

alter session set approx_for_count_distinct = TRUE;
alter session set approx_for_count_distinct = FALSE;

alter session set approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC';
alter session set approx_for_percentile = PERCENTILE_DISC;
alter session set approx_for_percentile = NONE;

Or at a system wide level:

alter system set approx_for_aggregation = TRUE;
alter system set approx_for_aggregation = FALSE;

alter system set approx_for_count_distinct = TRUE;
alter system set approx_for_count_distinct = FALSE;

alter system set approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC';
alter system set approx_for_percentile = PERCENTILE_DISC;
alter system set approx_for_percentile = NONE;

And to reset back to the default settings:

alter system reset approx_for_aggregation;
alter system reset approx_for_count_distinct;
alter system reset approx_for_percentile;