@juiceyang juiceyang commented Jan 16, 2026

Why are the changes needed?

According to org.apache.iceberg.BaseEntriesTable#schema, the partition field of each fileRecord returned by org.apache.amoro.scan.TableEntriesScan#entries contains the partition columns of all PartitionSpecs of the table, not only those of the spec the file was written with. When org.apache.amoro.scan.TableEntriesScan#buildDeleteFile creates a DeleteFile from such a fileRecord, and the file was written with a newer PartitionSpec, org.apache.iceberg.DataFiles#copyPartitionData fills the new spec's partition columns with the values of the older PartitionSpec's fields. As a result, the partition of the created DeleteFile is set incorrectly and its partition value becomes null.
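
The mismatch can be illustrated with a small stand-alone model. This is a simplified sketch, not Iceberg or Amoro code: plain lists stand in for the partition records, the class `PartitionCopySketch`, the method `copyPositional`, and the `day`/`bucket` partition fields are hypothetical, and the sketch assumes the partition copy works purely by position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model only: lists stand in for Iceberg partition records.
final class PartitionCopySketch {

  // Copies the first specFieldCount values by position (assumed behavior),
  // without matching source and target fields by name or id.
  static List<Object> copyPositional(int specFieldCount, List<Object> source) {
    List<Object> target = new ArrayList<>();
    for (int i = 0; i < specFieldCount; i++) {
      target.add(source.get(i));
    }
    return target;
  }

  public static void main(String[] args) {
    // Hypothetical table: the old spec partitions by "day", the new one by "bucket".
    // Entries-style record for a file written with the new spec: it has one slot
    // per field of every spec, so the old "day" slot comes first and is null.
    List<Object> entriesRecord = Arrays.<Object>asList(null, 7); // [day, bucket]

    // Record shaped by the file's own (new) spec: only "bucket".
    List<Object> manifestRecord = Arrays.<Object>asList(7); // [bucket]

    int newSpecFieldCount = 1;
    System.out.println("from entries record:  " + copyPositional(newSpecFieldCount, entriesRecord));
    System.out.println("from manifest record: " + copyPositional(newSpecFieldCount, manifestRecord));
  }
}
```

In this model the copy from the entries-style record yields `[null]`, because the old spec's slot lands in the new spec's first column, while the copy from the record shaped by the file's own spec yields `[7]`.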

Close #4044.

Brief change log

  • To avoid this issue, we no longer use TableEntriesScan to retrieve the full list of delete files. Instead, we iterate over the manifest files in the delete manifest list to obtain all DeleteFile objects. The DeleteFile objects retrieved this way have the correct partition values.

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
    no
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
    not documented

Development

Successfully merging this pull request may close these issues.

[Bug]: For Iceberg tables whose PartitionSpec has been changed, Amoro will throw an error when executing the clean-dangling-delete-files operation.
