add PinotFS support for streamed untar segment download#17586

Open
mluvin-stripe wants to merge 4 commits into apache:master from
mluvin-stripe:mluvin-stream-untar-pinotfs

@mluvin-stripe mluvin-stripe commented Jan 27, 2026

Implements #17578, adding PinotFS filesystem support to the server-side segment stream download-untar feature (with rate limiter) described at https://docs.pinot.apache.org/operators/tutorials/performance-optimization-configurations#enabling-server-side-segment-stream-download-untar-with-rate-limiter.

Testing

Enabled pinot.server.instance.segment.stream.download.untar for offline and realtime servers, then restarted those instances. Upon restart, here's a log showing that this feature was used to download segments (code ref for the log):

INFO [SandboxMerchantPlatformCardTestingGatewayAuthAndDeclineCounts_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_22:156] Downloaded and untarred segment: SandboxMerchantPlatformCardTestingGatewayAuthAndDeclineCounts__589855__4742__20260116T2210Z from: s3://xxxxx-xxxx/SandboxMerchantPlatformCardTestingGatewayAuthAndDeclineCounts/SandboxMerchantPlatformCardTestingGatewayAuthAndDeclineCounts__589855__4742__20260116T2210Z, failed attempts: 0

@mluvin-stripe mluvin-stripe marked this pull request as ready for review January 27, 2026 22:27
@mluvin-stripe
Author

cc @Jackie-Jiang

Contributor

@Jackie-Jiang Jackie-Jiang left a comment

LGTM. Ideally we should also add a test for it

@Jackie-Jiang Jackie-Jiang requested a review from Copilot January 29, 2026 02:27
Contributor

Copilot AI left a comment

Pull request overview

This PR adds PinotFS filesystem support for streamed segment download and untar operations, addressing issue #17578. The enhancement enables servers to use PinotFS implementations (like S3) for the streamed download and untar feature instead of only HTTP-based downloads.
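As a rough illustration of what a streamed, rate-limited download involves, here is a minimal sketch that copies an InputStream to an OutputStream while capping throughput. It is not the Pinot implementation: the class name ThrottledCopySketch, the copy method, and the pacing strategy are all assumptions made for this example, and it leaves out tar extraction entirely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Pinot.
final class ThrottledCopySketch {
  private ThrottledCopySketch() {
  }

  /** Copies all bytes from in to out, pausing so the average rate stays at or below maxBytesPerSecond. */
  static long copy(InputStream in, OutputStream out, long maxBytesPerSecond) {
    if (maxBytesPerSecond <= 0) {
      throw new IllegalArgumentException("maxBytesPerSecond must be positive");
    }
    byte[] buffer = new byte[8192];
    long totalBytes = 0;
    long startNanos = System.nanoTime();
    try {
      int read;
      while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
        totalBytes += read;
        // Time that should have elapsed if we were exactly at the configured rate.
        long expectedNanos = (long) (totalBytes * 1_000_000_000.0 / maxBytesPerSecond);
        long aheadNanos = expectedNanos - (System.nanoTime() - startNanos);
        if (aheadNanos > 0) {
          // We are ahead of schedule, so wait before reading the next chunk.
          TimeUnit.NANOSECONDS.sleep(aheadNanos);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while throttling copy", e);
    }
    return totalBytes;
  }
}
```

Because the data is consumed chunk by chunk from the stream, nothing in this approach requires the full archive to be written to local disk before processing starts.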

Changes:

  • Implemented fetchUntarSegmentToLocalStreamed method in PinotFSSegmentFetcher to support streamed untar using PinotFS
  • Added retry logic with exponential backoff for download operations
  • Integrated rate limiting support for bandwidth control during downloads
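The retry behaviour listed above can be sketched in isolation. The following is a simplified, deterministic stand-in, not Pinot's RetryPolicies code: the class BackoffRetrySketch and its attempt signature are hypothetical, and it assumes the operation signals a retriable failure by returning false.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-alone sketch; Pinot's own retry policy classes are not used here.
final class BackoffRetrySketch {
  private BackoffRetrySketch() {
  }

  /**
   * Runs the operation until it returns true or maxAttempts is reached.
   * Returns the number of failed attempts that preceded the success.
   */
  static int attempt(BooleanSupplier operation, int maxAttempts, long initialDelayMs, double delayScaleFactor) {
    long delayMs = initialDelayMs;
    for (int failedAttempts = 0; failedAttempts < maxAttempts; failedAttempts++) {
      if (operation.getAsBoolean()) {
        return failedAttempts;
      }
      if (failedAttempts < maxAttempts - 1) {
        sleep(delayMs);
        // Grow the wait time geometrically before the next try.
        delayMs = (long) (delayMs * delayScaleFactor);
      }
    }
    throw new IllegalStateException("Operation failed after " + maxAttempts + " attempts");
  }

  private static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting to retry", e);
    }
  }
}
```

Returning the count of failed attempts lets the caller report it, in the same spirit as the "failed attempts: 0" value in the log line from the PR description.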

tries =
    RetryPolicies.exponentialBackoffRetryPolicy(_retryCount, _retryWaitMs, _retryDelayScaleFactor).attempt(() -> {
      try (InputStream inputStream = pinotFS.open(uri)) {
        List<File> untarredFiles = TarCompressionUtils.untarWithRateLimiter(inputStream, dest, rateLimit);

Copilot AI Jan 29, 2026

Direct access to index 0 without checking if the list is empty could cause IndexOutOfBoundsException. Add a check to verify that untarredFiles is not empty before accessing the first element.

Suggested change
- List<File> untarredFiles = TarCompressionUtils.untarWithRateLimiter(inputStream, dest, rateLimit);
+ List<File> untarredFiles = TarCompressionUtils.untarWithRateLimiter(inputStream, dest, rateLimit);
+ if (untarredFiles.isEmpty()) {
+   _logger.warn("No files found after untarring segment from: {} to: {}", uri, dest);
+   return false;
+ }
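
The same guard can be shown as a small self-contained helper. This is only a sketch: UntarResultGuard and firstFile are hypothetical names that do not appear in the PR, and returning an Optional is one possible way to make the empty case explicit to the caller.

```java
import java.io.File;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration; not part of the PR.
final class UntarResultGuard {
  private UntarResultGuard() {
  }

  /** Returns the first untarred file, or an empty Optional when the archive produced no files. */
  static Optional<File> firstFile(List<File> untarredFiles) {
    if (untarredFiles == null || untarredFiles.isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(untarredFiles.get(0));
  }
}
```

A caller inside a retry lambda could treat an empty result as a failed attempt, which matches the intent of the `return false` in the suggestion.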

    throw e;
  }
  attempts.set(tries);
  return untarredFileRef.get();

Copilot AI Jan 29, 2026

If all retry attempts fail but no exception is thrown, untarredFileRef.get() could return null. This would cause issues for callers expecting a valid File. Consider throwing an exception if untarredFileRef is null after retries complete.
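
One way to act on this is to validate the reference before returning it. The sketch below assumes the result is held in an AtomicReference<File>, as the name untarredFileRef suggests; the class UntarredFileRefCheck, the method requireUntarredFile, and the choice of IllegalStateException are hypothetical.

```java
import java.io.File;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper for illustration; not part of the PR.
final class UntarredFileRefCheck {
  private UntarredFileRefCheck() {
  }

  /** Returns the recorded file, or fails loudly if no attempt ever set one. */
  static File requireUntarredFile(AtomicReference<File> untarredFileRef, String uri) {
    File untarredFile = untarredFileRef.get();
    if (untarredFile == null) {
      throw new IllegalStateException("No untarred file recorded after download from: " + uri);
    }
    return untarredFile;
  }
}
```

Failing here keeps a null from propagating to callers that expect a valid File.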

@codecov-commenter

codecov-commenter commented Jan 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.24%. Comparing base (234d382) to head (5961b20).
⚠️ Report is 128 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17586      +/-   ##
============================================
+ Coverage     63.21%   63.24%   +0.02%     
+ Complexity     1476     1454      -22     
============================================
  Files          3172     3179       +7     
  Lines        189806   191298    +1492     
  Branches      29046    29251     +205     
============================================
+ Hits         119987   120980     +993     
- Misses        60508    60881     +373     
- Partials       9311     9437     +126     
Flag                 Coverage Δ
custom-integration1  100.00% <ø> (ø)
integration          100.00% <ø> (ø)
integration1         100.00% <ø> (ø)
integration2           0.00% <ø> (ø)
java-11               63.21% <100.00%> (+0.06%) ⬆️
java-21               63.19% <100.00%> (+0.01%) ⬆️
temurin               63.24% <100.00%> (+0.02%) ⬆️
unittests             63.23% <100.00%> (+0.02%) ⬆️
unittests1            55.60% <100.00%> (+0.06%) ⬆️
unittests2            34.14% <0.00%> (+0.08%) ⬆️

@mluvin-stripe
Author

@Jackie-Jiang ack, I'll add a test

@mluvin-stripe
Author

@Jackie-Jiang added tests, ready for another review now

@mluvin-stripe
Author

@Jackie-Jiang addressed comments in 5961b20

@mluvin-stripe mluvin-stripe force-pushed the mluvin-stream-untar-pinotfs branch from 5961b20 to 301223d Compare February 27, 2026 06:12
@mluvin-stripe
Author

@Jackie-Jiang just rebased off the latest master -- hoping that should fix these failing tests that I didn't touch https://github.com/apache/pinot/actions/runs/22410381063/job/65076626232?pr=17586
