
fix: use cgroup memory limits for multipart upload concurrency in containers #447

Open
ProbstenHias wants to merge 1 commit into durch:master from ProbstenHias:fix/cgroup-memory-detection

Conversation


@ProbstenHias ProbstenHias commented Feb 12, 2026


calculate_max_concurrent_chunks() uses sysinfo::System::available_memory() which reads /proc/meminfo on Linux. In containerized environments (Docker, Kubernetes), /proc/meminfo reports the host node's total memory, not the container's cgroup-enforced limit. This causes the function to massively overestimate available memory and spawn too many concurrent chunk uploads, leading to OOM kills.

Example: a pod with 512 MiB memory limit running on a 64 GiB node would see ~64 GiB available, calculate 100 concurrent 8 MiB chunks (800 MiB total), and get OOMKilled.

The fix adds cgroup-aware memory detection that runs before the sysinfo fallback:

  • Parse /proc/self/cgroup to resolve the process's cgroup path; handle both nested paths (e.g. /kubepods.slice/.../cri-containerd-xxx.scope) and the root path "/", which is the common case with private cgroup namespaces (Docker default, Kubernetes default since 1.25)
  • cgroup v2: read memory.max and memory.current from /sys/fs/cgroup/<cgroup-path>/; treat "max" as unlimited
  • cgroup v1: read memory.limit_in_bytes and memory.usage_in_bytes from /sys/fs/cgroup/memory/<cgroup-path>/; treat values near u64::MAX (PAGE_COUNTER_MAX) as unlimited
  • Available memory = limit - current usage (saturating_sub)
  • Return None when not under cgroup constraints, so the caller falls back to the existing sysinfo logic unchanged
  • All cgroup functions gated on #[cfg(target_os = "linux")] and the async feature flags to avoid dead code warnings in sync builds or on non-Linux platforms
  • Add unit tests covering cgroup v2 with limit, v2 unlimited, v1 with limit, v1 unlimited, path parsing, and concurrency clamping math
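
The v2 arithmetic described above can be sketched as a pure function over the raw cgroup file contents. The function name and signature here are illustrative, not the PR's actual code; the real helpers also resolve the cgroup path and handle the v1 hierarchy.

```rust
/// Sketch of the cgroup v2 available-memory math: given the contents of
/// memory.max and memory.current, return limit - usage, or None when the
/// limit is "max" (unlimited) or either file fails to parse, so the
/// caller can fall back to sysinfo.
fn cgroup_v2_available(memory_max: &str, memory_current: &str) -> Option<u64> {
    let limit = memory_max.trim();
    if limit == "max" {
        return None; // unlimited: defer to the sysinfo fallback
    }
    let limit: u64 = limit.parse().ok()?;
    let current: u64 = memory_current.trim().parse().ok()?;
    Some(limit.saturating_sub(current))
}
```

For example, a 512 MiB limit with 100 MiB currently in use yields roughly 412 MiB available, instead of the host's ~64 GiB.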


Summary by CodeRabbit

  • Improvements
    • Enhanced memory detection on Linux systems to better manage upload performance when memory resources are constrained, with improved support for containerized environments.


coderabbitai bot commented Feb 12, 2026

📝 Walkthrough

This change enhances the memory-based calculation of maximum concurrent upload chunks by adding Linux cgroup detection. The implementation introduces tiered memory detection that first attempts to read cgroup v2 memory limits, falls back to cgroup v1 if those are unavailable, and finally falls back to sysinfo. The chunk calculation range is extended from 2..10 to 2..100, with platform-gated helper functions and comprehensive test coverage.

Changes

Cohort: Cgroup Memory Detection System
File(s): s3/src/bucket.rs
Summary: Added four internal helper functions (cgroup_available_memory, read_cgroup_path, cgroup_v2_available_memory, cgroup_v1_available_memory) to read and parse Linux cgroup memory limits from /proc/self/cgroup, memory.max, memory.current, memory.limit_in_bytes, and memory.usage_in_bytes. Modified calculate_max_concurrent_chunks() to use tiered memory detection (cgroup v2 → cgroup v1 → sysinfo fallback), updated the clamp range to 2..100, and added a default value of 3 when memory is unavailable. Included extensive Linux-only tests for cgroup path parsing, v2/v1 memory calculations, and edge cases. Updated documentation to reflect the new range and platform-specific behavior.
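
The clamping described in the summary can be sketched as follows. The 2..=100 range and the default of 3 come from the summary above; the 8 MiB per-chunk budget mirrors the example in the PR description, and the exact accounting in the real calculate_max_concurrent_chunks() may differ.

```rust
/// Illustrative version of the concurrency clamp: derive a chunk count
/// from available memory and clamp it into 2..=100, defaulting to 3
/// when no memory figure is available.
fn max_concurrent_chunks(available_memory: Option<u64>) -> u64 {
    const CHUNK_BYTES: u64 = 8 * 1024 * 1024; // 8 MiB, per the PR example
    match available_memory {
        None => 3,
        Some(mem) => (mem / CHUNK_BYTES).clamp(2, 100),
    }
}
```

Under this sketch, a 512 MiB cgroup limit yields 64 concurrent chunks, whereas the host-wide ~64 GiB figure would hit the cap of 100.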

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 A cgroup memory tale I weave,
With v2, v1, and sysinfo's reprieve,
Chunks now soar from ten to one hundred bright,
Linux containers handled just right,
Upload speeds dance in the morning light! 🚀

🚥 Pre-merge checks: ✅ 4 passed

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main change: implementing cgroup memory limit detection for multipart upload concurrency in containers, which is the core objective of the PR.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Merge Conflict Detection: ✅ Passed. No merge conflicts detected when merging into master.



Brownian Motion (Brass)

Recommendation: Proceed

Summary: PR addresses OOM kills by improving memory limit detection in containers.
Risk: Medium · Confidence: 80%

Highlights

  • Good test coverage
  • Clear commit history

Unknowns

  • No linked issue
  • Missing performance metrics

Next actions

  • Keep: cgroup memory limit detection logic
  • Drop: unnecessary comments in the code
  • Add: documentation for new functions

Reflection questions

  • What core assumption underpins this PR's approach?
  • How does this change align with the project's longer-term goals?
  • Could there be a simpler way to achieve the primary objective here?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@s3/src/bucket.rs`:
- Around line 4344-4346: The test function
test_cgroup_available_memory_returns_none_when_no_cgroup inside the cgroup_tests
module is dead because the module is #[cfg(target_os = "linux")] while the test
body is under #[cfg(not(target_os = "linux"))]; either remove the unreachable
#[cfg(not(target_os = "linux"))] block or restructure the tests so they actually
run on Linux: move the assertion into the linux-configured module and call
cgroup_available_memory() (or the real function under test) and assert the
expected Some(...) or None value, or split into two explicit tests with opposite
cfg attributes if you need OS-specific behavior; update or delete the empty test
body for test_cgroup_available_memory_returns_none_when_no_cgroup accordingly.
🧹 Nitpick comments (4)
s3/src/bucket.rs (4)

3036-3066: read_cgroup_path("v2") returns Some("/") for root cgroup, but the corresponding test uses stricter logic.

test_read_cgroup_path_v2 (line 4362) checks !path.is_empty() && path != "/", but the actual v2 branch at line 3050 only checks !path.is_empty(). While this doesn't cause a test failure (since the test input isn't "/"), the duplicated-and-divergent logic is a maintenance hazard.

Consider calling the actual read_cgroup_path function from tests instead of re-implementing the parsing inline. This would catch regressions in the real code and eliminate the risk of test/implementation drift.


3008-3018: Silent failure with no observability when cgroup detection is skipped.

All cgroup helper functions silently return None on any failure (file not found, parse error, unlimited). Consider adding log::debug! or log::trace! messages when cgroup detection falls through, so operators can understand which memory source was used. This is especially valuable when diagnosing OOM issues in containers.

Example logging addition
 fn cgroup_available_memory() -> Option<u64> {
-    cgroup_v2_available_memory().or_else(cgroup_v1_available_memory)
+    let result = cgroup_v2_available_memory().or_else(cgroup_v1_available_memory);
+    if let Some(mem) = result {
+        log::debug!("cgroup memory detection: available={} bytes", mem);
+    } else {
+        log::debug!("cgroup memory detection: no cgroup limits found, falling back to sysinfo");
+    }
+    result
 }

4349-4409: Tests re-implement parsing logic inline instead of calling read_cgroup_path.

test_read_cgroup_path_v2, test_read_cgroup_path_v1, and test_read_cgroup_path_v2_root_returns_root all duplicate the parsing logic from read_cgroup_path rather than invoking it. This means the tests don't actually validate the real implementation — a bug in read_cgroup_path wouldn't be caught.

Since read_cgroup_path reads /proc/self/cgroup (which may not have the expected entries in CI), consider either:

  1. Refactoring read_cgroup_path to accept the file content as a parameter (making it testable), or
  2. At minimum, calling the actual function in the tests and asserting it doesn't panic, even if the exact result is environment-dependent.
Suggested refactor: extract parsing logic into a testable function
+/// Parse a cgroup path from the content of /proc/self/cgroup.
+#[cfg(all(
+    target_os = "linux",
+    any(feature = "with-tokio", feature = "with-async-std")
+))]
+fn parse_cgroup_path(content: &str, controller: &str) -> Option<String> {
+    for line in content.lines() {
+        let parts: Vec<&str> = line.splitn(3, ':').collect();
+        if parts.len() != 3 {
+            continue;
+        }
+        if controller == "v2" {
+            if parts[0] == "0" && parts[1].is_empty() {
+                let path = parts[2].trim();
+                if !path.is_empty() {
+                    return Some(path.to_string());
+                }
+            }
+        } else {
+            let controllers = parts[1];
+            if controllers.split(',').any(|c| c == controller) {
+                let path = parts[2].trim();
+                if !path.is_empty() && path != "/" {
+                    return Some(path.to_string());
+                }
+            }
+        }
+    }
+    None
+}
+
 #[cfg(all(
     target_os = "linux",
     any(feature = "with-tokio", feature = "with-async-std")
 ))]
 fn read_cgroup_path(controller: &str) -> Option<String> {
     let content = std::fs::read_to_string("/proc/self/cgroup").ok()?;
-    for line in content.lines() {
-        ...
-    }
-    None
+    parse_cgroup_path(&content, controller)
 }

Then tests can call parse_cgroup_path directly with known input.


4411-4435: Temp directory tests use fixed paths and may race under parallel execution.

test_cgroup_v2_available_memory_with_limit and test_cgroup_v1_available_memory_with_limit use hardcoded temp directory names (rust_s3_cgroup_test_v2, rust_s3_cgroup_test_v1). If cargo test runs these in parallel (default), they could interfere with each other on repeated runs. Consider using tempfile::tempdir() for unique, auto-cleaned directories, or adding unique suffixes.

Also applies to: 4446-4463
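
As an alternative to the tempfile crate suggested above, a std-only approach can also avoid the fixed-path race; the helper below is hypothetical, not part of the PR.

```rust
use std::{env, fs, path::PathBuf, process};

/// Hypothetical helper: build a temp directory whose name combines a
/// per-test tag with the process id, so tests running in parallel and
/// concurrent `cargo test` invocations don't collide on a fixed path.
fn unique_test_dir(tag: &str) -> PathBuf {
    let dir = env::temp_dir().join(format!("rust_s3_cgroup_{}_{}", tag, process::id()));
    fs::create_dir_all(&dir).expect("failed to create test dir");
    dir
}
```

Note that tempfile::tempdir() additionally deletes the directory on drop, so it remains the more robust choice when a dev-dependency is acceptable.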

Comment on lines +4344 to +4346
#[cfg(test)]
#[cfg(target_os = "linux")]
mod cgroup_tests {

⚠️ Potential issue | 🟡 Minor

test_cgroup_available_memory_returns_none_when_no_cgroup is dead code — the #[cfg(not(target_os = "linux"))] block can never compile.

The entire cgroup_tests module is gated with #[cfg(target_os = "linux")] (line 4345), so the #[cfg(not(target_os = "linux"))] block at line 4479 is unreachable. The test body on Linux is empty, making this a no-op test. Either remove it or restructure so it actually verifies something on Linux (e.g., call the real cgroup_available_memory() and assert it returns Some(…) or None without panicking).

Also applies to: 4473-4483



swermin commented Mar 27, 2026

What’s the status of this PR? We are having issues with uploads taking too much memory and the containers are getting OOMKilled
