Features and Limitations
Features
Apache XTable™ (Incubating) provides users with the ability to translate metadata from one table format to another.
Apache XTable™ (Incubating) provides two sync modes, "incremental" and "full." The incremental mode is more lightweight and has better performance, especially on large tables. If there is anything that prevents the incremental mode from working properly, the tool will fall back to the full sync mode.
This sync provides users with the following:
- Syncing of data files along with their column-level statistics and partition metadata
- Reflection of source schema updates in the target table metadata
- Metadata maintenance for the target table formats
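The fallback between the two sync modes described above can be pictured with a short sketch. This is only an illustration of the control flow, not the project's actual implementation: the function name sync_with_fallback and the two callables are hypothetical, and treating any exception as the fallback trigger is an assumption.

```python
from typing import Callable


def sync_with_fallback(
    incremental_sync: Callable[[], None],
    full_sync: Callable[[], None],
) -> str:
    """Try the lightweight incremental sync first; run a full sync if it fails.

    Returns the name of the mode that completed. Both callables are
    hypothetical stand-ins for whatever performs the actual sync.
    """
    try:
        incremental_sync()
        return "incremental"
    except Exception:
        # Assumption: any failure of the incremental path triggers the fallback.
        full_sync()
        return "full"


def failing_incremental() -> None:
    raise RuntimeError("incremental sync is not possible for this commit range")


def working_full() -> None:
    pass


print(sync_with_fallback(failing_incremental, working_full))
```

The design point is that callers do not choose a mode up front: the cheaper path is attempted first and the more expensive one is used only when needed.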
Limitations and Compatibility Notes
General
- Only Copy-on-Write or Read-Optimized views of tables are currently supported. This means that only the underlying parquet files are synced; log files from Hudi and deletion vectors from Delta and Iceberg are not captured by the sync.
Hudi
- Hudi 0.14.0 is required when reading a Hudi target table. Users will also need to enable
  - the metadata table (hoodie.metadata.enable=true) and
  - hive style partitioning (hoodie.datasource.write.hive_style_partitioning=true) wherever applicable when reading the data.
- Be sure to set parquet.avro.write-old-list-structure=false for proper compatibility with lists when syncing from Hudi to Iceberg.
- When using Hudi as the source for an Iceberg target, you may require field IDs set in the parquet schema. To enable that, follow the instructions here.
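As a quick way to keep the Hudi settings above in one place, the following sketch collects them in a dictionary and reports which ones a given set of job properties is missing. The property names and values come from the notes above; the helper mismatched_settings and the sample job_props are hypothetical, and where the properties are actually applied (Spark configuration, reader or writer options) depends on your job and is not covered here.

```python
# Property names and values are taken from the Hudi notes above.
# Hive style partitioning applies only "wherever applicable", and the
# list-structure setting matters when syncing from Hudi to Iceberg.
REQUIRED_HUDI_SETTINGS = {
    "hoodie.metadata.enable": "true",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "parquet.avro.write-old-list-structure": "false",
}


def mismatched_settings(props: dict) -> dict:
    """Return {key: expected_value} for settings that are absent or different.

    Hypothetical helper for illustration; it only compares strings.
    """
    return {
        key: expected
        for key, expected in REQUIRED_HUDI_SETTINGS.items()
        if str(props.get(key, "")).lower() != expected
    }


# Hypothetical job configuration that sets only one of the three properties.
job_props = {"hoodie.metadata.enable": "true"}
print(mismatched_settings(job_props))
```

An empty result means all three properties already carry the values listed above.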
Delta
- When using Delta as the source for an Iceberg target, you may require field IDs set in the parquet schema. To enable that, follow the instructions for enabling column mapping here.
- When Delta is the source, Generated Columns are not synced to the target schema. For tables that are partitioned on Generated Columns, there is limited support. For example, we support date functions like transforming a timestamp to yyyy-MM-dd format. Please file a GitHub issue or pull request for any cases that you think should be supported.
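For the column mapping requirement in the first Delta note above, the sketch below builds the kind of SQL statement that, to the best of our understanding of the Delta Lake documentation, enables column mapping by name. The table name and the helper column_mapping_sql are hypothetical, and the protocol versions shown are an assumption to verify against the linked instructions for your Delta version.

```python
def column_mapping_sql(table_name: str) -> str:
    """Build an ALTER TABLE statement that enables Delta column mapping by name.

    Assumption: the table properties below follow the Delta Lake documentation;
    confirm them for the Delta version you are running.
    """
    return (
        f"ALTER TABLE {table_name} SET TBLPROPERTIES ("
        "'delta.minReaderVersion' = '2', "
        "'delta.minWriterVersion' = '5', "
        "'delta.columnMapping.mode' = 'name')"
    )


# "my_db.my_table" is a placeholder table name.
print(column_mapping_sql("my_db.my_table"))
```

The statement would be executed through whichever SQL engine manages the Delta table. Raising the reader and writer protocol versions is generally not reversible, so it is worth testing on a copy of the table first.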