Add s3-graphs-zip workflow for zipping CE graphs#999
Conversation
db/migrate/20260306120000_add_argo_workflow_fields_to_envelope_downloads.rb
Go ahead and deploy it in Sandbox. Then let @JWaltuch or @mparsons-ce know to test it. Then go to production. :)
```yaml
resources:
  requests:
    cpu: "1000m"
    memory: "2Gi"
```
With the streaming ZIP operation, why does it need 2GB minimum memory? I would think, at most, the memory consumption would be the size of the ZIP metadata header, which is about 50MB for 500K files. So it should not be 2GB minimum. This is concerning because if we allocate 2GB up front, Kubernetes will assume it needs to auto-scale more VMs to serve other apps. I recommend reducing this number to 200MB or lower to see where it breaks, then increasing it in 100MB increments until it doesn't break.
Also, add a # comment here indicating the reason for the low memory allocation.
Same question for the CPU. 1000m is probably too high for this workflow.
Note that these workflows are very lightweight, so we should keep to the minimum required memory allocations to reduce costs as we start to run unrelated workflows in parallel.
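A trimmed-down request block along those lines might look like this. The numbers are illustrative starting points only, to be tuned by the reduce-then-increment approach described above:

```yaml
resources:
  requests:
    # Kept deliberately low: the ZIP is streamed to S3, so memory stays
    # near the size of the in-flight part buffer plus the ZIP central
    # directory, not the size of the archive itself.
    cpu: "250m"
    memory: "256Mi"
```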
This configuration doesn't mean Kubernetes allocates 2GB up front. It means the node where the pod gets deployed has 2GB reserved. Anyhow, fair point; this was brought over from the previous template. I'll check with Ariel if there are special reasons for the node configuration here and reduce it accordingly.
Reservation == allocation. No? :) Reminds me of the Seinfeld car reservation scene.
- Add script to package CE graph JSON files into zip files in S3
- Streams zip files directly to S3 with multipart upload support
- Calls preconfigured webhook for notifications
- Add Dockerfile for Argo / container orchestration
- Add docker-compose.yml with LocalStack for integration test setup
- Add workflow template
- Hook up legacy CER API to Argo workflows (WIP)
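The streaming-to-S3 behavior described above can be sketched in Ruby. This is an illustrative sketch of the part-buffering core only; the class and method names are hypothetical, and the real script presumably hands each finished part to the AWS SDK's multipart-upload calls (e.g. `Aws::S3::Client#upload_part`) rather than a block:

```ruby
# Sketch of the part-buffering logic behind a streaming multipart upload.
# S3 requires every part except the last to be at least 5 MiB, so bytes are
# accumulated until a full part can be emitted; the archive never has to fit
# in memory as a whole.
class PartBuffer
  MIN_PART_SIZE = 5 * 1024 * 1024

  # on_part is called with (part_number, bytes) whenever a part is ready.
  def initialize(&on_part)
    @buffer = +""
    @part_number = 0
    @on_part = on_part
  end

  def write(bytes)
    @buffer << bytes
    flush_part while @buffer.bytesize >= MIN_PART_SIZE
  end

  # Flush the remainder as the final (possibly short) part; returns the
  # total number of parts emitted.
  def finish
    flush_part(final: true) unless @buffer.empty?
    @part_number
  end

  private

  def flush_part(final: false)
    chunk = final ? @buffer.dup : @buffer.slice!(0, MIN_PART_SIZE)
    @buffer.clear if final
    @part_number += 1
    @on_part.call(@part_number, chunk)
  end
end
```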
Force-pushed from b5b5001 to 6330e6d
```yaml
    memory: "256Mi"
  limits:
    cpu: "2000m"
    memory: "4Gi"
```
Make this 2x the requested amount, so 512Mi. It's better to see the Job fail than to have it take up 16x more resources than intended.
Similarly please check the impact of the CPU limit as well.
```yaml
- name: destination-bucket
- name: destination-prefix
- name: max-uncompressed-zip-size-bytes
  value: "209715200"
```
We're limiting zip file size either by the number of files per zip (default 25K) or by the maximum uncompressed size (default 200MB), whichever limit is hit first. The rationale for the numbers is simply developer convenience; they are configurable via those parameters.
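The dual-limit rotation described above can be sketched as follows. The function name and the hash shape are assumptions for illustration; the constants mirror the stated defaults:

```ruby
# Illustrative sketch of the batching rule: start a new ZIP once either the
# file-count cap or the uncompressed-size cap would be exceeded, whichever
# comes first. Defaults mirror the workflow parameters quoted above.
MAX_FILES_PER_ZIP = 25_000
MAX_UNCOMPRESSED_ZIP_SIZE_BYTES = 209_715_200 # 200 MB

# files is an array of { key:, size: } hashes; returns an array of batches,
# each batch being the file list for one ZIP.
def batch_files(files,
                max_files: MAX_FILES_PER_ZIP,
                max_bytes: MAX_UNCOMPRESSED_ZIP_SIZE_BYTES)
  batches = [[]]
  bytes = 0
  files.each do |file|
    current = batches.last
    # Rotate to a fresh batch when either limit would be exceeded.
    if current.any? && (current.size >= max_files || bytes + file[:size] > max_bytes)
      batches << []
      bytes = 0
      current = batches.last
    end
    current << file
    bytes += file[:size]
  end
  batches.reject(&:empty?)
end
```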
Is the intention that if a user runs the POST multiple times with the same auth header, it will still make only one download request? That seems to be the behavior I'm seeing, so I just wanted to confirm. Either that, or it keeps returning data from just the first request. I also tested running it normally and doing the downloads; it seems to work for me.
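The behavior being confirmed here, i.e. repeat POSTs with the same auth header reusing the first download instead of creating a new one, could be implemented roughly like this. The class and method names are illustrative only, not the actual CER API code:

```ruby
# Hedged sketch of idempotent download requests keyed by auth token:
# repeated requests with the same token return the originally created
# download record rather than starting a second one.
class DownloadRequests
  def initialize
    @by_token = {}
    @started = []
  end

  # Returns the existing download for this token, or starts a new one.
  def request(auth_token)
    @by_token.fetch(auth_token) do
      download = { id: @by_token.size + 1, status: :pending }
      @started << download
      @by_token[auth_token] = download
    end
  end

  def started_count
    @started.size
  end
end
```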
mparsons-ce
left a comment
The download process works as designed.
@JWaltuch see the comments from @mparsons-ce and please review the PR.
Yep, that's the intention.
Issue 17 of the new repo