fix: use cgroup memory limits for multipart upload concurrency in con…#447
fix: use cgroup memory limits for multipart upload concurrency in con…#447ProbstenHias wants to merge 1 commit intodurch:masterfrom
Conversation
…tainers calculate_max_concurrent_chunks() uses sysinfo::System::available_memory() which reads /proc/meminfo on Linux. In containerized environments (Docker, Kubernetes), /proc/meminfo reports the host node's total memory, not the container's cgroup-enforced limit. This causes the function to massively overestimate available memory and spawn too many concurrent chunk uploads, leading to OOM kills. Example: a pod with 512 MiB memory limit running on a 64 GiB node would see ~64 GiB available, calculate 100 concurrent 8 MiB chunks (800 MiB total), and get OOMKilled. The fix adds cgroup-aware memory detection that runs before the sysinfo fallback: - Parse /proc/self/cgroup to resolve the process's cgroup path; handle both nested paths (e.g. /kubepods.slice/.../cri-containerd-xxx.scope) and root path "/" which is the common case with private cgroup namespaces (Docker default, Kubernetes default since 1.25) - cgroup v2: read memory.max and memory.current from /sys/fs/cgroup/<cgroup-path>/; treat "max" as unlimited - cgroup v1: read memory.limit_in_bytes and memory.usage_in_bytes from /sys/fs/cgroup/memory/<cgroup-path>/; treat values near u64::MAX (PAGE_COUNTER_MAX) as unlimited - Available memory = limit - current usage (saturating_sub) - Return None when not under cgroup constraints, so the caller falls back to the existing sysinfo logic unchanged - All cgroup functions gated on #[cfg(target_os = "linux")] and the async feature flags to avoid dead code warnings in sync builds or on non-Linux platforms - Add unit tests covering cgroup v2 with limit, v2 unlimited, v1 with limit, v1 unlimited, path parsing, and concurrency clamping math
📝 WalkthroughWalkthroughThis change enhances memory-based calculation for maximum concurrent upload chunks by adding Linux cgroup detection. The implementation introduces a two-tier memory detection system that first attempts to read cgroup v2 memory limits, falls back to cgroup v1 if unavailable, and ultimately uses sysinfo for fallback. The chunk calculation range is extended from 2..10 to 2..100, with platform-gated helper functions and comprehensive test coverage. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Brownian Motion (Brass)Recommendation: Proceed Summary: PR addresses OOM kills by improving memory limit detection in containers. Highlights
Unknowns
Next actions
Reflection questions
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@s3/src/bucket.rs`:
- Around line 4344-4346: The test function
test_cgroup_available_memory_returns_none_when_no_cgroup inside the cgroup_tests
module is dead because the module is #[cfg(target_os = "linux")] while the test
body is under #[cfg(not(target_os = "linux"))]; either remove the unreachable
#[cfg(not(target_os = "linux"))] block or restructure the tests so they actually
run on Linux: move the assertion into the linux-configured module and call
cgroup_available_memory() (or the real function under test) and assert the
expected Some(...) or None value, or split into two explicit tests with opposite
cfg attributes if you need OS-specific behavior; update or delete the empty test
body for test_cgroup_available_memory_returns_none_when_no_cgroup accordingly.
🧹 Nitpick comments (4)
s3/src/bucket.rs (4)
3036-3066:read_cgroup_path("v2")returnsSome("/")for root cgroup, but the corresponding test uses stricter logic.
test_read_cgroup_path_v2(line 4362) checks!path.is_empty() && path != "/", but the actual v2 branch at line 3050 only checks!path.is_empty(). While this doesn't cause a test failure (since the test input isn't"/"), the duplicated-and-divergent logic is a maintenance hazard.Consider calling the actual
read_cgroup_pathfunction from tests instead of re-implementing the parsing inline. This would catch regressions in the real code and eliminate the risk of test/implementation drift.
3008-3018: Silent failure with no observability when cgroup detection is skipped.All cgroup helper functions silently return
Noneon any failure (file not found, parse error, unlimited). Consider addinglog::debug!orlog::trace!messages when cgroup detection falls through, so operators can understand which memory source was used. This is especially valuable when diagnosing OOM issues in containers.Example logging addition
fn cgroup_available_memory() -> Option<u64> { - cgroup_v2_available_memory().or_else(cgroup_v1_available_memory) + let result = cgroup_v2_available_memory().or_else(cgroup_v1_available_memory); + if let Some(mem) = result { + log::debug!("cgroup memory detection: available={} bytes", mem); + } else { + log::debug!("cgroup memory detection: no cgroup limits found, falling back to sysinfo"); + } + result }
4349-4409: Tests re-implement parsing logic inline instead of callingread_cgroup_path.
test_read_cgroup_path_v2,test_read_cgroup_path_v1, andtest_read_cgroup_path_v2_root_returns_rootall duplicate the parsing logic fromread_cgroup_pathrather than invoking it. This means the tests don't actually validate the real implementation — a bug inread_cgroup_pathwouldn't be caught.Since
read_cgroup_pathreads/proc/self/cgroup(which may not have the expected entries in CI), consider either:
- Refactoring
read_cgroup_pathto accept the file content as a parameter (making it testable), or- At minimum, calling the actual function in the tests and asserting it doesn't panic, even if the exact result is environment-dependent.
Suggested refactor: extract parsing logic into a testable function
+/// Parse a cgroup path from the content of /proc/self/cgroup. +#[cfg(all( + target_os = "linux", + any(feature = "with-tokio", feature = "with-async-std") +))] +fn parse_cgroup_path(content: &str, controller: &str) -> Option<String> { + for line in content.lines() { + let parts: Vec<&str> = line.splitn(3, ':').collect(); + if parts.len() != 3 { + continue; + } + if controller == "v2" { + if parts[0] == "0" && parts[1].is_empty() { + let path = parts[2].trim(); + if !path.is_empty() { + return Some(path.to_string()); + } + } + } else { + let controllers = parts[1]; + if controllers.split(',').any(|c| c == controller) { + let path = parts[2].trim(); + if !path.is_empty() && path != "/" { + return Some(path.to_string()); + } + } + } + } + None +} + #[cfg(all( target_os = "linux", any(feature = "with-tokio", feature = "with-async-std") ))] fn read_cgroup_path(controller: &str) -> Option<String> { let content = std::fs::read_to_string("/proc/self/cgroup").ok()?; - for line in content.lines() { - ... - } - None + parse_cgroup_path(&content, controller) }Then tests can call
parse_cgroup_pathdirectly with known input.
4411-4435: Temp directory tests use fixed paths and may race under parallel execution.
test_cgroup_v2_available_memory_with_limitandtest_cgroup_v1_available_memory_with_limituse hardcoded temp directory names (rust_s3_cgroup_test_v2,rust_s3_cgroup_test_v1). Ifcargo testruns these in parallel (default), they could interfere with each other on repeated runs. Consider usingtempfile::tempdir()for unique, auto-cleaned directories, or adding unique suffixes.Also applies to: 4446-4463
| #[cfg(test)] | ||
| #[cfg(target_os = "linux")] | ||
| mod cgroup_tests { |
There was a problem hiding this comment.
test_cgroup_available_memory_returns_none_when_no_cgroup is dead code — the #[cfg(not(target_os = "linux"))] block can never compile.
The entire cgroup_tests module is gated with #[cfg(target_os = "linux")] (line 4345), so the #[cfg(not(target_os = "linux"))] block at line 4479 is unreachable. The test body on Linux is empty, making this a no-op test. Either remove it or restructure so it actually verifies something on Linux (e.g., call the real cgroup_available_memory() and assert it returns Some(…) or None without panicking).
Also applies to: 4473-4483
🤖 Prompt for AI Agents
In `@s3/src/bucket.rs` around lines 4344 - 4346, The test function
test_cgroup_available_memory_returns_none_when_no_cgroup inside the cgroup_tests
module is dead because the module is #[cfg(target_os = "linux")] while the test
body is under #[cfg(not(target_os = "linux"))]; either remove the unreachable
#[cfg(not(target_os = "linux"))] block or restructure the tests so they actually
run on Linux: move the assertion into the linux-configured module and call
cgroup_available_memory() (or the real function under test) and assert the
expected Some(...) or None value, or split into two explicit tests with opposite
cfg attributes if you need OS-specific behavior; update or delete the empty test
body for test_cgroup_available_memory_returns_none_when_no_cgroup accordingly.
|
What’s the status of this PR? We are having issues with uploads taking too much memory and the containers are getting OOMKilled |
…tainers
calculate_max_concurrent_chunks() uses sysinfo::System::available_memory() which reads /proc/meminfo on Linux. In containerized environments (Docker, Kubernetes), /proc/meminfo reports the host node's total memory, not the container's cgroup-enforced limit. This causes the function to massively overestimate available memory and spawn too many concurrent chunk uploads, leading to OOM kills.
Example: a pod with 512 MiB memory limit running on a 64 GiB node would see ~64 GiB available, calculate 100 concurrent 8 MiB chunks (800 MiB total), and get OOMKilled.
The fix adds cgroup-aware memory detection that runs before the sysinfo fallback:
This change is
Summary by CodeRabbit