add PinotFS support for streamed untar segment download#17586
add PinotFS support for streamed untar segment download#17586mluvin-stripe wants to merge 4 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds PinotFS filesystem support for streamed segment download and untar operations, addressing issue #17578. The enhancement enables servers to use PinotFS implementations (like S3) for the streamed download and untar feature instead of only HTTP-based downloads.
Changes:
- Implemented
fetchUntarSegmentToLocalStreamedmethod inPinotFSSegmentFetcherto support streamed untar using PinotFS - Added retry logic with exponential backoff for download operations
- Integrated rate limiting support for bandwidth control during downloads
| tries = | ||
| RetryPolicies.exponentialBackoffRetryPolicy(_retryCount, _retryWaitMs, _retryDelayScaleFactor).attempt(() -> { | ||
| try (InputStream inputStream = pinotFS.open(uri)) { | ||
| List<File> untarredFiles = TarCompressionUtils.untarWithRateLimiter(inputStream, dest, rateLimit); |
There was a problem hiding this comment.
Direct access to index 0 without checking if the list is empty could cause IndexOutOfBoundsException. Add a check to verify that untarredFiles is not empty before accessing the first element.
| List<File> untarredFiles = TarCompressionUtils.untarWithRateLimiter(inputStream, dest, rateLimit); | |
| List<File> untarredFiles = TarCompressionUtils.untarWithRateLimiter(inputStream, dest, rateLimit); | |
| if (untarredFiles.isEmpty()) { | |
| _logger.warn("No files found after untarring segment from: {} to: {}", uri, dest); | |
| return false; | |
| } |
| throw e; | ||
| } | ||
| attempts.set(tries); | ||
| return untarredFileRef.get(); |
There was a problem hiding this comment.
If all retry attempts fail but no exception is thrown, untarredFileRef.get() could return null. This would cause issues for callers expecting a valid File. Consider throwing an exception if untarredFileRef is null after retries complete.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #17586 +/- ##
============================================
+ Coverage 63.21% 63.24% +0.02%
+ Complexity 1476 1454 -22
============================================
Files 3172 3179 +7
Lines 189806 191298 +1492
Branches 29046 29251 +205
============================================
+ Hits 119987 120980 +993
- Misses 60508 60881 +373
- Partials 9311 9437 +126
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@Jackie-Jiang ack, i'll add a test |
|
@Jackie-Jiang added a tests, ready for another review now |
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/PinotFSSegmentFetcher.java
Outdated
Show resolved
Hide resolved
pinot-common/src/test/java/org/apache/pinot/common/utils/fetcher/PinotFSSegmentFetcherTest.java
Outdated
Show resolved
Hide resolved
|
@Jackie-Jiang addressed comments in 5961b20 |
5961b20 to
301223d
Compare
|
@Jackie-Jiang just rebased off the latest master -- hoping that should fix these failing tests that I didn't touch https://github.com/apache/pinot/actions/runs/22410381063/job/65076626232?pr=17586 |
Implements #17578 to add PinotFS filesystem support for https://docs.pinot.apache.org/operators/tutorials/performance-optimization-configurations#enabling-server-side-segment-stream-download-untar-with-rate-limiter.
Testing
Enabled
pinot.server.instance.segment.stream.download.untarfor offline and realtime servers, then restarted those instances. Upon restart, here's a log showing that this feature was used to download segments (code ref for the log):