Skip to content

Conversation

@hevinhsu
Copy link
Contributor

@hevinhsu hevinhsu commented Jan 9, 2026

What changes were proposed in this pull request?

Add support for the Content-MD5 header to verify object integrity during object uploads.

Please describe your PR in detail:

  • Validate the Content-MD5 header using pre-commit hooks before the key is committed.
  • Add unit tests, integration tests, and end-to-end tests covering:
    • PUT object with Content-MD5
    • Multipart upload with Content-MD5
  • Add no-argument constructors to KeyDataStreamOutput to simplify unit test setup.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-10633

How was this patch tested?

https://github.com/hevinhsu/ozone/actions/runs/20842334193

@ivandika3 ivandika3 added the s3 S3 Gateway label Jan 12, 2026
@ivandika3 ivandika3 self-requested a review January 12, 2026 01:42
@jojochuang
Copy link
Contributor

Note:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_Object.html

The ETag may or may not be an MD5 digest of the object data.

@hevinhsu
Copy link
Contributor Author

Thanks @jojochuang for pointing this out.

I agree that the final ETag may not always be an MD5 (especially for multipart or encrypted objects). However, this patch focuses on in-transit integrity checking via the Content-MD5 header.

During the upload, S3G calculates the MD5 of the incoming data stream to verify it against the client-provided Content-MD5. This verification happens before the object is committed. The document you shared refers to the ETag's behavior after the commit, which is independent of the Content-MD5 validation logic at the ingestion stage. Therefore, the ETag's final format does not affect this check.

# Conflicts:
#	hadoop-ozone/integration-test-s3/src/test/java/org/apache/hadoop/ozone/s3/awssdk/v2/AbstractS3SDKV2Tests.java
#	hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java
Copy link
Contributor

@ivandika3 ivandika3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hevinhsu for the patch and the extensive tests. Overall LGTM. Just left a few comments.

@ivandika3
Copy link
Contributor

The ETag may or may not be an MD5 digest of the object data.

Yes, ETag can technically be any hash that based on the object data so it doesn't need to be MD5. Moreover, object uploaded using multipart upload ETag is a hash of all the ETag of all the parts with the "-<num_parts>" prefix, see S3MultipartUploadCompleteRequest#multipartUploadedKeyHash.

If we decide in the future to change ETag to not be md5 hash, then we need to setup a new MD5 MessageDigest. However, since ETag is currently already using MD5, we can piggy back that.

@hevinhsu
Copy link
Contributor Author

Thanks @ivandika3 for the reviews.

I’ve fixed the merge-related issue you pointed out and addressed the nits, including renaming eTag to md5Hash.

I agree with the rename. As you pointed out, ETag is not guaranteed to be an MD5 hash (especially for multipart uploads), so using md5Hash avoids conflating ETag semantics with an MD5 digest.

Comment on lines +475 to +478
String md5Hash = keyDetails.getMetadata().get(OzoneConsts.ETAG);
if (md5Hash != null) {
responseBuilder.header(HttpHeaders.ETAG, wrapInQuotes(md5Hash));
String partsCount = extractPartsCount(md5Hash);
Copy link
Contributor

@ivandika3 ivandika3 Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This should still be ETag since we are actually getting the key ETag which may not be MD5. We should only use md5Hash if we are actually calculating MD5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

s3 S3 Gateway

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants