Syncing to BigLake Metastore

This document walks through the steps to register an Apache XTable™ (Incubating) synced Iceberg table in BigLake Metastore on GCP.

Pre-requisites

Source (Hudi/Delta) table(s) already written to Google Cloud Storage. If you don't have the source table written in GCS, you can follow the steps in this tutorial to set it up.
To ensure that the BigLake API caller (the service account used by Apache XTable™ (Incubating)) has the permissions needed to create a BigLake table, ask your administrator to grant the BigLake Admin role (roles/biglake.admin) to the service account.
To ensure that the Cloud Storage API caller (the service account used by Apache XTable™ (Incubating)) has the permissions needed to write log and metadata files in GCS, ask your administrator to grant the Storage Object User role (roles/storage.objectUser) to the service account.
If you're running Apache XTable™ (Incubating) outside GCP, the machine needs credentials to interact with BigLake and GCS. To provide them, point GOOGLE_APPLICATION_CREDENTIALS at the service account key file stored on your machine:
```
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_key.json
```
Clone the Apache XTable™ (Incubating) repository and create the xtable-utilities_2.12-0.2.0-SNAPSHOT-bundled.jar by following the steps on the Installation page.
Download the BigLake Iceberg JAR locally. Apache XTable™ (Incubating) requires the JAR to be present in the classpath.
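
The credentials prerequisite above can be sanity-checked before running a sync. The following is a minimal sketch: the function name `check_gcp_credentials` is hypothetical and not part of Apache XTable™ (Incubating), and it only verifies that the variable points at a readable file, not that the key itself is valid.

```shell
# Hypothetical helper: confirm GOOGLE_APPLICATION_CREDENTIALS points to a readable key file.
check_gcp_credentials() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "Key file not readable: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    return 1
  fi
  echo "Using service account key: $GOOGLE_APPLICATION_CREDENTIALS"
}
```

Calling `check_gcp_credentials` after the `export` should print the key path, and a non-zero exit status indicates the variable or file needs attention.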

Steps

Important:

Currently, BigLake Metastore is only accessible through Google's BigLake REST APIs, so Apache XTable™ (Incubating) requires you to set up the following items before running sync on your source dataset:

BigLake Catalog
BigLake Database

Create BigLake Catalog

Use the "Try this method" option on Google's REST reference docs for the projects.locations.catalogs.create method to create a catalog.

In this tutorial we'll use the us-west1 region.

parent

projects/<yourProjectName>/locations/us-west1

catalogId

xtable

Create BigLake Database

Use the "Try this method" option on Google's REST reference docs for the projects.locations.catalogs.databases.create method to create a database.

parent

projects/<yourProjectName>/locations/us-west1/catalogs/xtable

databaseId

xtable_synced_db
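
The same two create calls can also be issued from a terminal instead of the "Try this method" panel. The sketch below is an assumption-laden illustration rather than an official recipe: the endpoint `https://biglake.googleapis.com/v1` and the URL layout are taken from Google's REST reference, the helper names are hypothetical, the empty request body simply mirrors the fields filled in above, and `gcloud` and `curl` are assumed to be installed and authenticated.

```shell
# Assumed BigLake REST endpoint (from Google's REST reference docs).
BIGLAKE_API="https://biglake.googleapis.com/v1"

# Build the URL for projects.locations.catalogs.create.
catalog_create_url() {  # args: project location catalogId
  echo "${BIGLAKE_API}/projects/$1/locations/$2/catalogs?catalogId=$3"
}

# Build the URL for projects.locations.catalogs.databases.create.
database_create_url() {  # args: project location catalogId databaseId
  echo "${BIGLAKE_API}/projects/$1/locations/$2/catalogs/$3/databases?databaseId=$4"
}

# POST an empty resource body to the given URL using a gcloud access token.
biglake_post() {  # args: url
  curl -sS -X POST "$1" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d '{}'
}

# Example (not run automatically; replace my-project with your project):
#   biglake_post "$(catalog_create_url my-project us-west1 xtable)"
#   biglake_post "$(database_create_url my-project us-west1 xtable xtable_synced_db)"
```

Keeping URL construction separate from the network call makes it easy to print and inspect the request target before sending anything.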

Running sync

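Create my_config.yaml in the cloned Apache XTable™ (Incubating) directory, using the variant below that matches your source format (HUDI or DELTA).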

```yaml
sourceFormat: HUDI
targetFormats:
  - ICEBERG
datasets:
  -
    tableBasePath: gs://path/to/source/data
    tableName: table_name
    namespace: xtable_synced_db
```

```yaml
sourceFormat: DELTA
targetFormats:
  - ICEBERG
datasets:
  -
    tableBasePath: gs://path/to/source/data
    tableName: table_name
    namespace: xtable_synced_db
```

Specify the catalog information in a YAML file and pass it in with the --icebergCatalogConfig option. An example catalog.yaml file for syncing with BigLake Metastore:

```yaml
catalogImpl: org.apache.iceberg.gcp.biglake.BigLakeCatalog
catalogName: xtable
catalogOptions:
  gcp_project: <yourProjectName>
  gcp_location: us-west1
  warehouse: gs://path/to/warehouse
```
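
Because the two YAML files must agree with the catalog and database created earlier, a quick consistency check can catch mismatches before the sync runs. This is a rough sketch with hypothetical helper names: it uses a naive line match rather than a real YAML parser, and it assumes, as the tutorial's values suggest, that `catalogName` is meant to match the catalogId and `namespace` the databaseId.

```shell
# Hypothetical helper: print the first value found for a YAML key (naive line match).
yaml_value() {  # args: key file
  sed -n "s/^[[:space:]]*$1:[[:space:]]*//p" "$2" | head -n 1
}

# Compare namespace and catalogName against the expected database and catalog IDs.
check_sync_configs() {  # args: dataset_config catalog_config database_id catalog_id
  ns=$(yaml_value namespace "$1")
  catalog_name=$(yaml_value catalogName "$2")
  if [ "$ns" != "$3" ]; then
    echo "namespace '$ns' does not match database '$3'" >&2
    return 1
  fi
  if [ "$catalog_name" != "$4" ]; then
    echo "catalogName '$catalog_name' does not match catalog '$4'" >&2
    return 1
  fi
  echo "Configs reference catalog '$4' and database '$3'"
}

# Example (not run automatically):
#   check_sync_configs my_config.yaml catalog.yaml xtable_synced_db xtable
```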

From your terminal, in the cloned Apache XTable™ (Incubating) directory, run the sync process using the command below.

```shell
java -cp xtable-utilities/target/xtable-utilities_2.12-0.2.0-SNAPSHOT-bundled.jar:/path/to/downloaded/biglake-catalog-iceberg1.2.0-0.1.0-with-dependencies.jar \
  org.apache.xtable.utilities.RunSync \
  --datasetConfig my_config.yaml \
  --icebergCatalogConfig catalog.yaml
```
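
A missing JAR on the classpath typically surfaces as a class-loading error at startup, so it can help to assemble the classpath with an existence check first. The helper below is a hypothetical convenience wrapper, not something shipped with Apache XTable™ (Incubating); the JAR paths in the usage comment are the ones from the command above.

```shell
# Hypothetical helper: join JAR paths into a Java classpath, failing if any JAR is missing.
build_classpath() {  # args: one or more JAR paths
  classpath=""
  for jar in "$@"; do
    if [ ! -f "$jar" ]; then
      echo "Missing JAR: $jar" >&2
      return 1
    fi
    # Append with a ':' separator only when the classpath is already non-empty.
    classpath="${classpath:+$classpath:}$jar"
  done
  echo "$classpath"
}

# Example (not run automatically):
#   CP=$(build_classpath \
#     xtable-utilities/target/xtable-utilities_2.12-0.2.0-SNAPSHOT-bundled.jar \
#     /path/to/downloaded/biglake-catalog-iceberg1.2.0-0.1.0-with-dependencies.jar) &&
#   java -cp "$CP" org.apache.xtable.utilities.RunSync \
#     --datasetConfig my_config.yaml --icebergCatalogConfig catalog.yaml
```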

Note:

At this point, if you check your bucket path, you will see a metadata directory. Its metadata files contain the information that query engines need to interpret the data as an Iceberg table.

Validating the results

Once the sync succeeds, Apache XTable™ (Incubating) will have written the table directly to BigLake Metastore. Use the "Try this method" option on Google's REST reference docs for the projects.locations.catalogs.databases.tables.get method to view the created table.

name

projects/<yourProjectName>/locations/us-west1/catalogs/xtable/databases/xtable_synced_db/tables/table_name
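
The table lookup can also be scripted. As a minimal sketch, the hypothetical helper below assembles the resource name shown above from its parts; the commented `curl` call assumes the `https://biglake.googleapis.com/v1` endpoint from Google's REST reference and an authenticated `gcloud` installation.

```shell
# Hypothetical helper: build the BigLake table resource name used by tables.get.
table_resource_name() {  # args: project location catalog database table
  echo "projects/$1/locations/$2/catalogs/$3/databases/$4/tables/$5"
}

# Example (not run automatically; replace my-project with your project):
#   curl -sS \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     "https://biglake.googleapis.com/v1/$(table_resource_name my-project us-west1 xtable xtable_synced_db table_name)"
```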

Conclusion

In this guide we saw how to:

sync a source table to create Iceberg metadata with Apache XTable™ (Incubating)
catalog the data as an Iceberg table in BigLake Metastore
validate the table creation using the projects.locations.catalogs.databases.tables.get method
