For Data Science projects the top three languages every data scientist and machine learning practitioner knows are Python, R and SQL. The ranking or order of importance of these is of some debate and the reason answer is, ‘It Depends’. But one thing is for sure no matter what your environment, SQL skills will be needed, because that’s where the data lives, in the various databases of the organization. No matter what the database is SQL is the way to access and analyze it efficiently. But for Python and R, the popularity of these languages really depends on the project team and their background. Deciding between the two can come down to flipping a coin. But every has their favorite!
A (or not so) new language for data science and machine learning is Julia. Actually it has been around for a while now, and life began on it in 2009, whereas R (and S) and Python have their beginnings back in the 1980’s and early 1990’s. Does that make them legacy programming languages? or it just took a bit of time to mature and gain popularity?
There are lots of advantages to Julia, just like there are lots of advantages with the other languages. The following diagram illustrates one of the core advantages of Julia, it isn’t an interpreted language like R and Python, which means Julia will be significantly faster, yet still allows interactive development using Notebooks, just like R and Python. Julia was designed and build for data science and machine learning, and is designed for scale which makes it a good fit for MLOps. The list of advantages and differences can go on a bit and those are not the point of this post.
The remainder of this post will step through what is needed to get Julia working with an Oracle Database, and you have setup an IDE. Check out the Julia website for excellent installation instructions and selecting an IDE. If you coming from an R and/or Python background, using Jupyter Notebooks is a good option, and as you become more experienced there are a number of more advanced IDEs available for you to use. I’m assuming you have installed Julia.
If you have done a new install of Julia, make sure to add the install directory to the search PATH.
First Download load and install Oracle Instant Client. This is needed by the Julia packages to communicate with Oracle Database. After installing make sure to setup the following in your environment (environment variables and Path)
- ORACLE_HOME : points to where you installed Oracle Instant Client
- TNS_ADMIN : points to the directory containing the wallet/tnsnames files. This will be a sub-directory in Oracle Instant Client directory, for example, it points to …/instantclient_19_8/network/admin
- PATH : include the Oracle Instant Client install directory in the PATH.
Next step is to setup the Oracle Client network files. As your DBA for the tnsnames.ora file or for the Wallet Zip file for your database. The Wallet Zip file is the most common approach. Unzip this Wallet file and copy the unzipped files to the TNS_ADMIN directory. See the second bullet point above to for this (…/instantclient_19_8/network/admin).
That’s all you need to do on the Oracle setup. I’m assuming you have a username and password for the Oracle Database you will be using.
Now we can setup Julia to use the Oracle Instant Client software. It is important you have setup those environment variables l’ve listed above.
There is an Oracle.jl package, developed by Felipe Noronha, which runs on top of Oracle Instant Client. To install this, load the Pkg package then then add the Oracle package. The following shows these commands and part of the output from the installation.
julia> using Pkg julia> Pkg.add("Oracle") Updating registry at `~/.julia/registries/General` ######################################################################## 100.0% Resolving package versions... Installed Reexport ──────────────────── v1.0.0 Installed libsodium_jll ─────────────── v1.0.18+1 Installed Compat ────────────────────── v3.25.0 Installed OrderedCollections ────────── v1.3.3 Installed WebSockets ────────────────── v1.5.9 Installed JuliaInterpreter ──────────── v0.8.8 Installed DataStructures ────────────── v0.18.9 Installed DataAPI ───────────────────── v1.5.1 Installed Requires ──────────────────── v1.1.2 Installed DataValueInterfaces ───────── v1.0.0 Installed Parsers ───────────────────── v1.0.15 Installed FlameGraphs ───────────────── v0.2.5 Installed URIs ──────────────────────── v1.2.0 Installed Colors ────────────────────── v0.12.6 Installed Oracle ────────────────────── v0.2.0 ... ... ... [7240a794] + Oracle v0.2.0 [bac558e1] ↑ OrderedCollections v1.3.2 ⇒ v1.3.3 [69de0a69] ↑ Parsers v1.0.12 ⇒ v1.0.15 [189a3867] ↑ Reexport v0.2.0 ⇒ v1.0.0 [ae029012] ↑ Requires v1.1.1 ⇒ v1.1.2 [3783bdb8] + TableTraits v1.0.0 [bd369af6] + Tables v1.3.2 [0796e94c] ↑ Tokenize v0.5.8 ⇒ v0.5.13 [5c2747f8] + URIs v1.2.0 [104b5d7c] ↑ WebSockets v1.5.2 ⇒ v1.5.9 [8f1865be] ↑ ZeroMQ_jll v4.3.2+5 ⇒ v4.3.2+6 [a9144af2] + libsodium_jll v1.0.18+1 Building Oracle → `~/.julia/packages/Oracle/CEOWz/deps/build.log` julia>
You are now ready to load this Oracle package and use it to connect to an Oracle Database. Setting up a connection is really simple and in the following example I’m connecting to an ATP Database on Oracle Free Tier. The following sets up some variables, creates a connection, prints a statement and connection information and then closes the connection.
import Oracle username="oml_user" password="xxxxxxxxxxx" dbname="yyyyyyyyyyyy" conn = Oracle.Connection(username, password, dbname) println("Connected") println(conn) Oracle.close(conn)
Job done 🙂
There is little additional connection information available. To test the connection a bit more let’s list what tables I have in my test/demo schema/user.
import Oracle username="oml_user" password="xxxxxxxxxxx" dbname="yyyyyyyyyyyy" conn = Oracle.Connection(username, password, dbname) println("Tables") println("--------------------") Oracle.query(conn, "SELECT table_name FROM user_tables") do cursor for row in cursor # row values can be accessed using column name or position println( row["TABLE_NAME"] ) # same as row end end println("") println("...the end...") Oracle.close(conn)
If you come from a Python background the syntax is familiar which makes the move other to Julia an easier task.
One other difference is, running the above code does seem to run a lot quicker in Julia. I haven’t measured it and the difference is less than a second but it is noticeable. For me, the above code generate the following output,
Tables -------------------- WINE BANK_ADDITIONAL_FULL MINING_DATA_BUILD_V ...the end...
I’ll have additional posts looking are difference aspects and commands for working with and processing data in an Oracle Database.