Features and Limitations

Features

Synchronizing table format metadata (TableFormatSync)

Apache XTable™ (Incubating) provides users with the ability to translate metadata from one table format to another.

Apache XTable™ (Incubating) provides two sync modes, "incremental" and "full." The incremental mode is more lightweight and performs better, especially on large tables. If anything prevents the incremental mode from working properly, the tool falls back to the full sync mode.
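The fallback behavior described above can be illustrated with a minimal sketch. The names `sync_with_fallback`, `SyncMode`, and the two callables are hypothetical stand-ins for illustration only and are not part of the XTable API; the sketch also assumes that any exception from the incremental path counts as a reason to fall back.

```python
from enum import Enum
from typing import Callable


class SyncMode(Enum):
    INCREMENTAL = "incremental"
    FULL = "full"


def sync_with_fallback(
    incremental_sync: Callable[[], None],
    full_sync: Callable[[], None],
) -> SyncMode:
    """Try the lightweight incremental sync first; fall back to a full sync.

    Both callables are hypothetical placeholders, not XTable functions.
    """
    try:
        incremental_sync()
        return SyncMode.INCREMENTAL
    except Exception:
        # Assumption: any failure on the incremental path triggers a full sync.
        full_sync()
        return SyncMode.FULL


def _failing_incremental() -> None:
    # Simulated condition that blocks the incremental path.
    raise RuntimeError("incremental sync not possible")


calls = []
mode = sync_with_fallback(_failing_incremental, lambda: calls.append("full"))
print(mode.value)
```

The function returns the mode that actually ran, so a caller can log or monitor how often the more expensive full sync is needed.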

TableFormatSync provides users with the following:

- Syncing of data files along with their column-level statistics and partition metadata.
- Schema updates in the source are reflected in the target table metadata.
- Metadata maintenance for the target table formats:
  - For Hudi, unreferenced files will be marked as cleaned to control the size of the metadata table.
  - For Iceberg, snapshots will be expired after a configured amount of time.
  - For Delta, the transaction log will be retained for a configured amount of time.

Synchronizing table format metadata in external catalogs (CatalogSync)

In addition to synchronizing table format metadata, Apache XTable™ (Incubating) now allows users to synchronize metadata for tables across multiple external catalogs continuously and incrementally. This reduces friction by eliminating the manual step of registering tables in multiple catalogs, and it enhances flexibility by avoiding catalog lock-in. HMS and AWS Glue are the two catalogs currently supported; support for other catalogs (Unity, Apache Polaris, Apache Gravitino, DataHub) is coming soon.

Limitations and Compatibility Notes

General

Only Copy-on-Write or Read-Optimized views of tables are currently supported. This means that only the underlying Parquet files are synced; log files from Hudi and deletion vectors from Delta and Iceberg are not captured by the sync.

Hudi

Hudi 0.14.0 is required when reading a Hudi target table. When reading the data, users will also need to enable the following wherever applicable:
- the metadata table (hoodie.metadata.enable=true)
- Hive-style partitioning (hoodie.datasource.write.hive_style_partitioning=true)
Be sure to set parquet.avro.write-old-list-structure=false for proper compatibility with lists when syncing from Hudi to Iceberg.
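As a rough sketch, the settings named above can be collected in one place and handed to a Spark job. The option keys come from the notes above; the `spark.hadoop.` prefix for passing the Parquet setting through Spark, the helper `as_spark_submit_args`, and the constant names are assumptions for illustration and may not match your deployment.

```python
# Reader-side Hudi options named in the notes above.
HUDI_READ_OPTIONS = {
    "hoodie.metadata.enable": "true",
    "hoodie.datasource.write.hive_style_partitioning": "true",
}

# Parquet/Avro list setting for Hudi-to-Iceberg syncs. Forwarding it through
# Spark with the "spark.hadoop." prefix is an assumption about the deployment.
SPARK_CONF = {
    "spark.hadoop.parquet.avro.write-old-list-structure": "false",
}


def as_spark_submit_args(conf):
    """Turn a config dict into a flat list of spark-submit '--conf' arguments."""
    return [arg for key, value in sorted(conf.items())
            for arg in ("--conf", f"{key}={value}")]


print(as_spark_submit_args(SPARK_CONF))

# With an existing SparkSession named `spark` (not created here), the read
# options could be applied as:
#   spark.read.format("hudi").options(**HUDI_READ_OPTIONS).load(table_path)
```

Keeping the options in plain dictionaries makes it easy to reuse the same values across interactive sessions and submitted jobs.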
When using Hudi as the source for an Iceberg target, you may require field IDs set in the parquet schema. To enable that, follow the instructions here.

Delta

When using Delta as the source for an Iceberg target, you may require field IDs set in the parquet schema. To enable that, follow the instructions for enabling column mapping here.
When Delta is the source, Generated Columns are not synced to the target schema. Tables that are partitioned on Generated Columns have limited support. For example, date functions such as transforming a timestamp to yyyy-MM-dd format are supported. Please file a GitHub issue or pull request for any cases that you think should be supported.
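To make the supported date transform concrete, the sketch below shows the kind of value such a generated partition column would hold: a timestamp rendered in yyyy-MM-dd format. The function name `date_partition_value` is hypothetical, and normalizing to UTC is an assumption; this illustrates the transform only and is not XTable or Delta code.

```python
from datetime import datetime, timezone


def date_partition_value(ts: datetime) -> str:
    """Render a timezone-aware timestamp as yyyy-MM-dd (assumes UTC normalization)."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


example = datetime(2024, 3, 5, 23, 59, 59, tzinfo=timezone.utc)
print(date_partition_value(example))  # 2024-03-05
```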
