Exploring Apache Iceberg – Part 3 – Parquet Files with Oracle Database
In this post I’ll explore how you can access data in Parquet files from an Oracle Database. This is a slight sideways step from my previous posts on Apache Iceberg, as it only looks at using Parquet files, which are a core part of Iceberg tables, but without the meta-data layer that Iceberg tables provide.
The previous two posts on Apache Iceberg looked at using the PyIceberg Python package to create Iceberg tables and to explore their various features and their effects on the data, files and meta-data. Part-1, Part-2.
We can include Parquet files in an Oracle Database by creating an External Table based on those files. Let’s walk through an example. The following example is for an “On-Premises” Database. I’ll have an example later in this post if you are using an Oracle Autonomous Database on Oracle Cloud.
Log into the Database as the SYSTEM user, create a directory object pointing to the location of the files on the operating system, and then grant privileges on it to the schema that needs to read that data.
CREATE OR REPLACE DIRECTORY parquet_dir AS '/data/log/parquet';

GRANT READ, WRITE ON DIRECTORY parquet_dir TO parquet_user;
It is assumed the directory ‘/data/log/parquet‘ exists and contains some Parquet files.
Connect as the schema “parquet_user” and create an External Table that points to the Parquet files in the directory.
CREATE TABLE parquet_sales_data (
  sale_id    NUMBER,
  sale_date  DATE,
  product_id NUMBER,
  amount     NUMBER(10,2),
  region     VARCHAR2(50))
ORGANIZATION EXTERNAL (
  TYPE ORACLE_BIGDATA
  DEFAULT DIRECTORY parquet_dir
  ACCESS PARAMETERS (
    com.oracle.bigdata.fileformat = PARQUET
  )
  LOCATION ('sales_*.parquet'))
REJECT LIMIT UNLIMITED;
We can now query the Parquet data just like any other data in a table.
SELECT region, SUM(amount) AS total_sales
FROM   parquet_sales_data
GROUP  BY region
ORDER  BY total_sales DESC;
Care is needed to ensure the column names and datatypes match between the table definition and the Parquet files.
If our Parquet files are split across separate directories, for example one per region, we can create a Partitioned External Table over them, and with it we get the benefits of partition pruning, etc. and better response times.
CREATE TABLE parquet_sales_data (
  sale_id    NUMBER,
  sale_date  DATE,
  product_id NUMBER,
  amount     NUMBER(10,2),
  region     VARCHAR2(50))
ORGANIZATION EXTERNAL (
  TYPE ORACLE_BIGDATA
  DEFAULT DIRECTORY parquet_dir
  ACCESS PARAMETERS (
    com.oracle.bigdata.fileformat = PARQUET
  ))
PARALLEL 4
REJECT LIMIT UNLIMITED
PARTITION BY LIST (region) (
  PARTITION p_emea VALUES ('EMEA') LOCATION (emea_dir:'*.parquet'),
  PARTITION p_apac VALUES ('APAC') LOCATION (apac_dir:'*.parquet'),
  PARTITION p_amer VALUES ('AMER') LOCATION (amer_dir:'*.parquet'));
For this example, I needed to connect as SYSTEM and create the extra directory objects (emea_dir, apac_dir and amer_dir) pointing at the additional operating system directories, and grant privileges on them to parquet_user. I also added PARALLEL 4 to the table to help speed things up a little more.