fix(ingestion): replace SDK integration with Lambda for S3 listing #1886
Draft
iankhou wants to merge 2 commits into
Replace the Step Functions CallAwsService SDK integration for ListObjectsV2 with a dedicated Lambda function that returns only the Key field for each object. This eliminates the 256KB Step Functions payload limit issue entirely — the Lambda controls what it returns, stripping per-object metadata (ETag, LastModified, Size, StorageClass) that bloats the SDK integration response. This removes the need for the MaxKeys retry chain (Try1000→Try900→...) and simplifies the state machine significantly.
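To make the metadata overhead concrete, here is a rough sketch that compares the serialized size of a full listing against a keys-only listing. Everything in it is synthetic: the key layout, ETag, timestamp, and sizes are invented for illustration, and `makeSyntheticObjects` and `jsonLength` are hypothetical helpers, so the printed numbers will not match a real bucket.

```typescript
// Shape of one listing entry, limited to the fields named in this PR.
interface S3ObjectSummary {
  Key: string;
  ETag: string;
  LastModified: string;
  Size: number;
  StorageClass: string;
}

// Build synthetic entries; all values here are made up for illustration.
function makeSyntheticObjects(count: number): S3ObjectSummary[] {
  const objects: S3ObjectSummary[] = [];
  for (let i = 0; i < count; i++) {
    objects.push({
      Key: `ingest/2024/01/15/batch-${i}/records.json.gz`,
      ETag: '"0123456789abcdef0123456789abcdef"',
      LastModified: "2024-01-15T12:34:56.000Z",
      Size: 1048576 + i,
      StorageClass: "STANDARD",
    });
  }
  return objects;
}

// Length of the JSON encoding. For ASCII-only data this equals the byte count,
// which is a rough proxy for the Step Functions payload size.
function jsonLength(value: unknown): number {
  return JSON.stringify(value).length;
}

const objects = makeSyntheticObjects(1000);
const fullLength = jsonLength({ Contents: objects });
const keysOnlyLength = jsonLength({
  Contents: objects.map((o) => ({ Key: o.Key })),
});

console.log(`full: ${fullLength}, keys only: ${keysOnlyLength}`);
```

The actual savings depend on key length: the longer the keys, the smaller the relative gain from dropping the other fields.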
Testing this in development to verify that the Lambda works as expected.
Summary
- Replace the Step Functions `CallAwsService` SDK integration for `ListObjectsV2` with a dedicated Lambda function
- Return only the `Key` field for each S3 object, eliminating the 256KB payload limit issue entirely
- Remove the `MaxKeys` retry chain (Try1000→Try900→...) that was needed to work around `States.DataLimitExceeded`

Problem
The SDK integration returns the full S3 `ListObjectsV2` response, including per-object metadata (`ETag`, `LastModified`, `Size`, `StorageClass`), for up to 1000 objects. This response can approach or exceed the 256KB Step Functions payload limit.

There is also an uncatchable edge case: the response is just under 256KB, so the state reports success, but Step Functions aborts the execution at the state transition boundary when internal metadata pushes the total over the limit. The resulting `ExecutionFailed` event is not tied to a specific state, which makes `addCatch(States.DataLimitExceeded)` ineffective.

Solution
A Lambda function calls `ListObjectsV2` and returns only what the workflow needs:
- `Contents[].Key`: the object keys
- `NextContinuationToken`: for pagination (only when present)

This reduces the payload from ~200-300KB to ~50-100KB for 1000 objects, well within the limit. Because the Lambda controls its own output size, the payload limit is no longer a concern for this step.
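A minimal sketch of that projection follows; it is not the handler from this PR. To keep it runnable without the AWS SDK, the S3 call is injected as a function. `makeHandler`, `ListObjectsPage`, and the event fields `bucket`, `prefix`, and `continuationToken` are all hypothetical names. A real Lambda would presumably wire `listObjectsPage` to the SDK's `ListObjectsV2Command`.

```typescript
// Minimal structural types so the sketch runs without the AWS SDK installed.
interface ListPage {
  Contents?: Array<{ Key?: string; ETag?: string; Size?: number }>;
  NextContinuationToken?: string;
}

// Hypothetical event shape passed in by the state machine.
interface ListKeysEvent {
  bucket: string;
  prefix?: string;
  continuationToken?: string;
}

// The trimmed result: keys only, plus the token when another page exists.
interface ListKeysResult {
  Contents: Array<{ Key: string }>;
  NextContinuationToken?: string;
}

// Hypothetical seam: performs exactly one ListObjectsV2 call.
type ListObjectsPage = (input: {
  Bucket: string;
  Prefix?: string;
  ContinuationToken?: string;
}) => Promise<ListPage>;

function makeHandler(listObjectsPage: ListObjectsPage) {
  return async (event: ListKeysEvent): Promise<ListKeysResult> => {
    const page = await listObjectsPage({
      Bucket: event.bucket,
      Prefix: event.prefix,
      ContinuationToken: event.continuationToken,
    });

    // Keep only the Key of each object; drop ETag, Size, and the rest.
    const contents: Array<{ Key: string }> = [];
    for (const obj of page.Contents ?? []) {
      if (obj.Key !== undefined) {
        contents.push({ Key: obj.Key });
      }
    }

    const result: ListKeysResult = { Contents: contents };
    // Include the token only when S3 reports another page.
    if (page.NextContinuationToken) {
      result.NextContinuationToken = page.NextContinuationToken;
    }
    return result;
  };
}

// Usage with a stubbed page instead of a real S3 call.
const handler = makeHandler(async () => ({
  Contents: [{ Key: "a.json", ETag: '"abc"', Size: 10 }, { Key: "b.json" }],
}));
handler({ bucket: "example-bucket" }).then((out) =>
  console.log(JSON.stringify(out)),
);
```

Injecting the listing call also makes the projection testable without AWS credentials. The state machine would loop on `NextContinuationToken` until it is absent.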
Test plan
- `npx projen build` to regenerate the construct class and bundle (projen auto-discovers `*.lambda.ts` files)

Related