Add support for requirements checks to CDI #1795
Conversation
Signed-off-by: Arjun <agadiyar@nvidia.com>
elezar left a comment:
One note here. It is not sufficient to run the check in the modifier; we would have to ensure that we generate a hook that implements the check in some form. The driver version etc. are known at the point of spec generation, but the envvars would have to be inspected in the container itself.
As an additional note, this would be an ideal candidate to move to a createRuntime hook.
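To make that suggestion concrete, here is a minimal sketch, assuming the CDI specs-go types, of emitting such a check as a createRuntime hook at spec-generation time. The addRequirementsCheckHook helper, the cdispec package name, and the check-requirements subcommand are hypothetical and are not existing nvidia-ctk functionality:

package cdispec

import (
	specs "tags.cncf.io/container-device-interface/specs-go"
)

// addRequirementsCheckHook appends a createRuntime hook to the generated CDI
// container edits so that the NVIDIA_REQUIRE_* check runs against the
// container's actual environment rather than in the modifier. The
// "check-requirements" subcommand is hypothetical.
func addRequirementsCheckHook(edits *specs.ContainerEdits, hookPath string) {
	edits.Hooks = append(edits.Hooks, &specs.Hook{
		HookName: "createRuntime",
		Path:     hookPath,
		Args:     []string{hookPath, "check-requirements"},
	})
}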
// checkRequirements evaluates NVIDIA_REQUIRE_* constraints using the host
// CUDA driver API version from libcuda, the NVIDIA display driver version from
// the driver root (libcuda / libnvidia-ml soname), the compute capability of
// CUDA device 0, and (when requirements reference brand) the GPU product brand
// from NVML. It is used for CSV and CDI / JIT-CDI modes.
Note that there are cases where libcuda.so is not applicable (for example, if we're not injecting actual GPU devices).
// brandTypeToRequirementString maps NVML brand enums to lowercase tokens
// consistent with typical NVIDIA_REQUIRE_* image constraints.
func brandTypeToRequirementString(b nvml.BrandType) (string, bool) {
Question: is this something that we already have access to in go-nvlib?
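For reference, a minimal sketch of the kind of mapping being discussed, using go-nvml brand constants; the brandToToken name and the exact set of brands handled are illustrative and may differ from the PR:

package brandcheck

import "github.com/NVIDIA/go-nvml/pkg/nvml"

// brandToToken maps a subset of NVML brand enums to the lowercase tokens used
// in NVIDIA_REQUIRE_* constraints. Illustrative only.
func brandToToken(b nvml.BrandType) (string, bool) {
	switch b {
	case nvml.BRAND_TESLA:
		return "tesla", true
	case nvml.BRAND_QUADRO:
		return "quadro", true
	case nvml.BRAND_GEFORCE:
		return "geforce", true
	case nvml.BRAND_TITAN:
		return "titan", true
	default:
		return "", false
	}
}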
	r.AddVersionProperty(requirements.CUDA, cudaVersion)
}

compteCapability, err := cuda.ComputeCapability(0)
Here we're always using the first device (which was fine for older Tegra-based systems), but this does not map well to multi-device systems, especially if they're heterogeneous.
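For heterogeneous multi-device systems, one illustrative direction (not part of this PR) would be to take the minimum compute capability across all devices via NVML; minComputeCapability and the capability package name below are hypothetical:

package capability

import (
	"fmt"

	"github.com/NVIDIA/go-nvml/pkg/nvml"
)

// minComputeCapability returns the lowest "major.minor" compute capability
// across all devices visible to NVML. Illustrative only; the PR queries
// CUDA device 0 instead.
func minComputeCapability() (string, error) {
	if ret := nvml.Init(); ret != nvml.SUCCESS {
		return "", fmt.Errorf("failed to initialize NVML: %v", ret)
	}
	defer nvml.Shutdown()

	count, ret := nvml.DeviceGetCount()
	if ret != nvml.SUCCESS {
		return "", fmt.Errorf("failed to get device count: %v", ret)
	}

	minMajor, minMinor := -1, -1
	for i := 0; i < count; i++ {
		device, ret := nvml.DeviceGetHandleByIndex(i)
		if ret != nvml.SUCCESS {
			return "", fmt.Errorf("failed to get device %d: %v", i, ret)
		}
		major, minor, ret := device.GetCudaComputeCapability()
		if ret != nvml.SUCCESS {
			return "", fmt.Errorf("failed to get compute capability for device %d: %v", i, ret)
		}
		if minMajor < 0 || major < minMajor || (major == minMajor && minor < minMinor) {
			minMajor, minMinor = major, minor
		}
	}
	if minMajor < 0 {
		return "", fmt.Errorf("no NVIDIA devices found")
	}
	return fmt.Sprintf("%d.%d", minMajor, minMinor), nil
}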
This PR addresses the fact that, at present, only the legacy mode checks the NVIDIA_REQUIRE_* envvars. It creates a common checkRequirements function (with helpers to convert versions to semver format and to derive brand requirements via NVML) that both CSV mode and CDI mode use to evaluate the NVIDIA_REQUIRE_* envvars.
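A condensed sketch of that shared flow, assuming the requirements and cuda helpers visible in the quoted diff above; GetRequirements, AddArchProperty, Assert, and the import paths are assumed names, and the driver-version and brand lookups are omitted for brevity:

package modifier

import (
	"fmt"

	"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
	"github.com/NVIDIA/nvidia-container-toolkit/internal/cuda"
	"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
	"github.com/NVIDIA/nvidia-container-toolkit/internal/requirements"
)

// checkRequirements sketches the shared check: collect host properties
// best-effort and assert them against the image's NVIDIA_REQUIRE_* constraints.
func checkRequirements(l logger.Interface, img image.CUDA) error {
	reqs, err := img.GetRequirements()
	if err != nil {
		return fmt.Errorf("failed to get requirements from image: %w", err)
	}
	r := requirements.New(l, reqs)

	// If a property source (e.g. libcuda) is unavailable, skip that property
	// rather than failing the whole check.
	if cudaVersion, err := cuda.Version(); err != nil {
		l.Warningf("Failed to get CUDA version: %v", err)
	} else {
		r.AddVersionProperty(requirements.CUDA, cudaVersion)
	}
	if cc, err := cuda.ComputeCapability(0); err != nil {
		l.Warningf("Failed to get compute capability: %v", err)
	} else {
		r.AddArchProperty(requirements.ARCH, cc)
	}

	return r.Assert()
}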
This was tested by deploying a pod with an invalid envvar value and verifying that the checks would stop deployment:
# CDI-only negative test: container create should fail when NVIDIA_REQUIRE_*
# cannot be satisfied (e.g. cuda>=99.0 on any real host). Uses RuntimeClass
# nvidia (CDI / toolkit mode), not nvidia-legacy.
apiVersion: v1
kind: Pod
metadata:
  name: require-cuda-fail
spec:
  runtimeClassName: nvidia
  restartPolicy: Never
  containers:
    - name: c
      image: ubuntu:22.04
      command: ["sleep", "3600"]
      resources:
        limits:
          nvidia.com/gpu: "1"
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "all"
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: "all"
        - name: NVIDIA_REQUIRE_CUDA
          value: "cuda>=99.0"
        - name: NVIDIA_REQUIRE_DRIVER
          value: "driver>=9999.0.0"
..................................
Events:
  Type     Reason     Age  From               Message
  Normal   Scheduled  12m  default-scheduler  Successfully assigned default/require-cuda-fail to ipp1-3167
  Normal   Pulled     12m  kubelet            Container image "ubuntu:22.04" already present on machine
  Normal   Created    12m  kubelet            Created container c
  Warning  Failed     12m  kubelet            Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: requirements not met: unsatisfied condition: driver>=9999.0.0 (driver=595.58.3)