Querying from Google BigQuery
Iceberg tables
To read an Apache XTable™ (Incubating) synced Iceberg table from BigQuery, you have two options:
Using the Iceberg JSON metadata file to create the Iceberg BigLake tables:
Apache XTable™ (Incubating) outputs metadata files for Iceberg target format syncs, which BigQuery can use to read the BigLake tables.
```sql
CREATE EXTERNAL TABLE xtable_synced_iceberg_table
WITH CONNECTION `myproject.mylocation.myconnection`
OPTIONS (
  format = 'ICEBERG',
  uris = ["gs://mybucket/mydata/mytable/metadata/iceberg.metadata.json"]
);
```
Note:
This method requires you to manually update the table to the latest metadata file whenever there are table updates; hence, Google recommends using BigLake Metastore for creating Iceberg BigLake tables. Follow the guide on Syncing to BigLake Metastore for the steps.
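As a minimal sketch of such a manual refresh (the newer metadata file name below is hypothetical, produced by a later sync), you would re-declare the external table against the latest metadata file:

```sql
-- Illustrative manual refresh: re-create the external table so it points at
-- the latest Iceberg metadata file; the v2 file name is a placeholder.
CREATE OR REPLACE EXTERNAL TABLE xtable_synced_iceberg_table
WITH CONNECTION `myproject.mylocation.myconnection`
OPTIONS (
  format = 'ICEBERG',
  uris = ["gs://mybucket/mydata/mytable/metadata/v2.metadata.json"]
);
```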
Important: For Hudi source format to Iceberg target format use cases
- The Hudi extensions provide the ability to add field IDs to the Parquet schema when writing with Hudi. This is a requirement for some engines, like BigQuery and Snowflake, when reading an Iceberg table. If you are not planning on using Iceberg, you do not need to add these to your Hudi writers.
- To avoid inserts going through the row writer, you need to disable it manually; support for the row writer will be added soon.
Steps to add additional configurations to the Hudi writers:
- Add the extensions jar (`xtable-hudi-extensions-0.2.0-SNAPSHOT-bundled.jar`) to your class path. For example, if you're using the Hudi quick-start guide for Spark, you can add `--jars xtable-hudi-extensions-0.2.0-SNAPSHOT-bundled.jar` to the end of the command.
- Set the following configurations in your writer options:
```shell
hoodie.avro.write.support.class: org.apache.xtable.hudi.extensions.HoodieAvroWriteSupportWithFieldIds
hoodie.client.init.callback.classes: org.apache.xtable.hudi.extensions.AddFieldIdsClientInitCallback
hoodie.datasource.write.row.writer.enable: false
```
- Run your existing code that uses Hudi writers; a sketch of applying these options follows below.
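As a minimal, illustrative sketch (one of several ways to apply these options), assuming you write through Hudi's Spark SQL support with the extensions jar on the class path, the configurations can be set per session before running your writes; the table and values below are hypothetical:

```sql
-- Set the writer configurations for the current Spark SQL session.
SET hoodie.avro.write.support.class=org.apache.xtable.hudi.extensions.HoodieAvroWriteSupportWithFieldIds;
SET hoodie.client.init.callback.classes=org.apache.xtable.hudi.extensions.AddFieldIdsClientInitCallback;
SET hoodie.datasource.write.row.writer.enable=false;

-- Hypothetical Hudi table; subsequent Hudi writes pick up the options above.
CREATE TABLE IF NOT EXISTS hudi_table (id INT, name STRING, price DOUBLE)
USING hudi
TBLPROPERTIES (primaryKey = 'id');

INSERT INTO hudi_table VALUES (1, 'a1', 20.0);
```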
Using BigLake Metastore to create the Iceberg BigLake tables:
There are two options for registering Apache XTable™ (Incubating) synced Iceberg tables in BigLake Metastore:
- To directly register the Apache XTable™ (Incubating) synced Iceberg table to BigLake Metastore, follow the Apache XTable™ guide to integrate with BigLake Metastore
- Use stored procedures for Spark on BigQuery to register the table in BigLake Metastore and query the tables from BigQuery; a rough sketch follows below.
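As a rough, hedged sketch of the stored-procedure option, assuming a Spark connection has already been set up (the procedure name, dataset, runtime version, catalog name, and paths below are all illustrative placeholders, and the Spark catalog properties for BigLake Metastore are omitted for brevity):

```sql
-- Illustrative only: a BigQuery stored procedure for Spark that registers the
-- synced Iceberg table with an Iceberg catalog. All names are placeholders,
-- and the Spark/catalog properties for BigLake Metastore are elided.
CREATE OR REPLACE PROCEDURE mydataset.register_xtable_iceberg()
WITH CONNECTION `myproject.mylocation.myconnection`
OPTIONS (engine = 'SPARK', runtime_version = '1.1')
LANGUAGE PYTHON AS r"""
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("register-xtable-iceberg").getOrCreate()

# Register the synced table via Iceberg's register_table procedure, pointing
# at the metadata file Apache XTable produced (path is illustrative).
spark.sql(
    "CALL iceberg_catalog.system.register_table("
    "table => 'mydb.xtable_synced_iceberg_table', "
    "metadata_file => 'gs://mybucket/mydata/mytable/metadata/iceberg.metadata.json')"
)
"""
```

Once created, the procedure can be invoked from BigQuery with `CALL mydataset.register_xtable_iceberg();`, after which the registered table can be queried.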
Hudi and Delta tables
Hudi and Delta table formats can be queried from BigQuery through the use of manifest files.
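As an illustrative sketch only, assuming manifest files have already been generated for the table (the bucket, manifest path, and table name below are hypothetical), a manifest-based external table can be declared using BigQuery's `file_set_spec_type` option:

```sql
-- Illustrative: read a table via newline-delimited manifest files that list
-- its data files. All paths and names are placeholders; point uris at the
-- manifests generated for your table.
CREATE EXTERNAL TABLE mydataset.table_via_manifest
WITH CONNECTION `myproject.mylocation.myconnection`
OPTIONS (
  format = 'PARQUET',
  file_set_spec_type = 'NEW_LINE_DELIMITED_MANIFEST',
  uris = ["gs://mybucket/mydata/mytable/manifest/*"]
);
```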