Iceberg Table

Exploring Apache Iceberg – Part 5 – Iceberg Tables and Oracle Autonomous Database

Posted on Updated on

I’ve been writing a series of posts on using Apache Iceberg tables, and this fifth post will focus on using the Iceberg Tables in the Oracle LakeHouse Databae or Oracle Autonomous Data Warehouse database. Make sure to check out the previous posts as some of the steps needed to create the Iceberg files and some initital setup in an Oracle Autonomous Database. Here’s the link to Part-4.

For the example below I’ve already pre-loaded the Iceberg Table catalog and associated set of files. For this I’ve uploaded the files into a bucket called ‘iceberg-lakehouse’ and you’ll see references to this in the examples below.

One of the first things you’ll need to to is to grant certain privileges to your schema to allow it to use the Lakehouse features, like working with Iceberg Tables, setting up the Access Control Lists if needed and to have access to the DBMS_CATALOG package.

Here is the url for the ‘iceberg-lakehouse’. I’ve removed my nampespace from the url. When you setup your own one the part with <namespace> will contain the name of for your tenency.

https://objectstorage.us-ashburn-1.oraclecloud.com/n/<namespace>/b/iceberg-lakehouse/o/

The schema I’m using the the database is called ‘brendan’. Yes I could have been more creative!

Grant the DWROLE to the schema that will contain the external table to the Iceberg Table. Do this using ADMIN. Permissions on DBMS_CLOUD is also needed.

grant DWROLE to brendan;
grant execute on DBMS_CLOUD to brendan;

While still connected to ADMIN, we need to configure an Access Control List (ACL) for the Lakehouse schema.

BEGIN
DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
host => 'objectstorage.us-ashburn-1.oraclecloud.com',
lower_port => 443,
upper_port => 443,
ace => xs$ace_type(
privilege_list => xs$name_list('http'),
principal_name => 'BRENDAN', -- the Lakehouse schema
principal_type => xs_acl.ptype_db
)
);
END;

My tenency is based in Ashburn, and that’s why you see ‘us-ashburn-1’ listed in the value for host, given in the above example. You’ll need to change that to your region.

As the ‘BRENDAN’ schema we can define Credentials to autenticate to OCI Object Storage.

BEGIN
DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => 'LAKEHOUSE_CRED',
username => '<your cloud username>',
password => '<generated auth token>'
);
END;

Now we can create an External Table to point to our Iceberg Table.

BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
table_name => 'ICEBERG_TABLE',
credential_name => 'LAKEHOUSE_CRED',
file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/<namespace>/b/iceberg-lakehouse/o/',
format => '{"access_protocol": {"protocol_type": "iceberg"}}' );
END;

We can not query the Iceberg Table like a regular table.

select * from iceberg_table;

Important: When work with the scenario above, it is assumed the Iceberg Table contains only one table. Another limitation is, if the structure of the Iceberg Table changes you will need to re-create the external table. As you can imagine that is not ideal, although you can schedule that to happen as needed.

To over come those limitations and to allow for the Iceberg Catalog to contain multiple tables, and for those to be picked up automatically, we need to use the package DBMS_CATALOG. This allows use to work with multiple tables within the catalog and it will also pickup any schema changes to those Iceberg Tables. Let’s have a look at doing this.

There are two steps needed before creating the external table. Both create credentials to point to the external Iceberg REST catalog and another to point to the bucket in object storage where the data files are located.

-- create credential for the REST API link to the catalog
BEGIN
DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => 'ICEBERG_CATALOG_CRED',
username => '<username for REST API>',
password => '<password for REST API>'
);
END;

The REST API details can be found in your OCI accout under Users & Security-> Users.

The OCI Object Storage Credentials is where the data files are stored in an OCI bucket. We can now mount the Iceberg Catalog

BEGIN
DBMS_CATALOG.MOUNT_ICEBERG(
catalog_name => 'ICEBEG_CATALOG',
endpoint => '<endpoint for the catalog>',
catalog_credential => 'ICEBERG_CATALOG_CRED',
data_storage_credential => '<OCI Object Storge Credential>'
);
END;

Once mounted we can explore the tables available in the Catalog using,

select * from all_tables@ICEBERG_CATALOG;

When querying the tables in the catalog, it will resolve to the latest snapshot. [see previous posts on how the catalog and the following tabl was created]

select * from sales_db.orders@ICEBERG_CATALOG;

If we want the Iceberg table to be used as an External Table in the database we can create it using the following.

BEGIN
DBMS_CATALOG.CREATE_EXTERNAL_TABLE(
catalog_name => 'ICEBERG_CATALOG',
table_name => 'ICEBERG_ORDERS',
schema_name => 'sales_db',
table_name => 'orders'
);
END;

It is now a bit easier to include in our queries.

select * from ICEBERG_ORDERS;