[CUDA] Make cuDNN optional at runtime by tianleiwu · Pull Request #29252 · microsoft/onnxruntime

tianleiwu · 2026-06-25T00:55:19Z

Summary

Make cuDNN an optional runtime dependency for the CUDA Execution Provider and CUDA Plugin EP. The build still uses cuDNN headers, but provider binaries no longer depend directly on cuDNN shared libraries. cuDNN is loaded lazily when it is enabled and available; runs without cuDNN use native CUDA paths where they exist and report NOT_IMPLEMENTED for kernels that still require cuDNN.

This also removes the provider-level custom cuDNN path option to avoid a native library loading footgun, and adds local/CI validation for no-cuDNN runtime environments.

Key Changes

Area	Changes
CUDA EP runtime loading	Added a dynamic cuDNN loader and cuDNN symbol trampolines so CUDA provider binaries can avoid a direct cuDNN dependency.
Provider options	Added `enable_cudnn`; removed `cudnn_path` from CUDA EP and CUDA Plugin EP provider configuration.
CUDA Plugin EP	Wired optional cuDNN behavior through plugin EP config, kernel adapters, stream handles, and plugin utilities.
Python preload behavior	Updated Python CUDA preload handling so cuDNN remains an optional dependency instead of an unconditional import/runtime requirement.
Tests	Added/updated provider option coverage and CUDA Plugin EP no-cuDNN mode using `ORT_TEST_CUDA_PLUGIN_NO_CUDNN=1`.
CI	Added Linux and Windows CUDA no-cuDNN workflows that build with cuDNN headers, exclude cuDNN from the runtime path, verify no direct cuDNN dependency, and run targeted tests.
Documentation	Added `docs/CUDA_cuDNN_Optional_Design.md` and updated CUDA Plugin EP docs for no-cuDNN behavior and validation.
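As a rough illustration of the `enable_cudnn` option listed in the table, the snippet below builds a provider list in the shape that `onnxruntime.InferenceSession(..., providers=...)` accepts. The helper name `cuda_providers` is hypothetical, and the `"1"`/`"0"` string encoding is an assumption based on the `enable_cudnn=0` usage mentioned in this PR. The snippet does not import onnxruntime.

```python
def cuda_providers(enable_cudnn: bool) -> list:
    """Build a providers list with the CUDA EP first and CPU as fallback.

    `enable_cudnn` is the option name from this PR; the string values
    "1"/"0" are an assumed encoding, not confirmed by the PR text.
    """
    cuda_options = {"enable_cudnn": "1" if enable_cudnn else "0"}
    return [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]


providers = cuda_providers(enable_cudnn=False)
print(providers)
```

The resulting list would be passed as `providers=providers` when creating a session. Per the summary above, kernels that still require cuDNN would then be expected to report NOT_IMPLEMENTED.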

Testing

Validated locally on Linux CUDA 13:

- Rebuilt CUDA EP / CUDA Plugin EP with cuDNN headers available at build time.
- Verified provider binaries have no direct cuDNN dependency:
  - readelf -d ... | grep NEEDED | grep -i cudnn || echo "no cudnn DT_NEEDED"
  - ldd ... | grep -i cudnn || echo "no cudnn in ldd"
- Ran CUDA Plugin EP no-cuDNN validation:
  - bash .env/cuda_130_plugin_no_cudnn.sh --test_plugin
  - Result: Ran 87 tests, OK (skipped=17)
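The `readelf` check above could also be scripted. The sketch below parses text in the format that `readelf -d` prints and reports whether any `DT_NEEDED` entry mentions cuDNN. The function names and the sample text are illustrative assumptions; the sample is not output from the actual provider binaries.

```python
import re

# Matches lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libfoo.so.1]
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def needed_libraries(readelf_dynamic_output: str) -> list:
    """Return the DT_NEEDED library names found in `readelf -d` output."""
    return _NEEDED.findall(readelf_dynamic_output)


def has_direct_cudnn_dependency(readelf_dynamic_output: str) -> bool:
    """True if any DT_NEEDED entry names a cuDNN library."""
    return any("cudnn" in name.lower()
               for name in needed_libraries(readelf_dynamic_output))


# Made-up sample for illustration only (library names are hypothetical).
SAMPLE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libcublas.so.13]
 0x0000000000000001 (NEEDED)             Shared library: [libcudart.so.13]
 0x000000000000000e (SONAME)             Library soname: [libexample_provider.so]
"""

print(needed_libraries(SAMPLE))
print(has_direct_cudnn_dependency(SAMPLE))
```

In practice the input text would come from running `readelf -d` on the provider binary, for example through `subprocess.run(..., capture_output=True, text=True)`.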

Additional CI coverage is included for Linux and Windows no-cuDNN CUDA validation.

github-actions

You can commit the suggested changes from lintrunner.

Copilot

Pull request overview

This PR refactors the CUDA Execution Provider (in-tree and plugin EP) to treat cuDNN as an optional runtime dependency. It introduces a lazy loader and cuDNN symbol trampolines, and plumbs a new `enable_cudnn` provider option through core, plugin, Python, docs, and CI so that CUDA can run in no-cuDNN environments (returning NOT_IMPLEMENTED for cuDNN-only kernels).

Changes:

- Added a cuDNN dynamic loader (`CudnnLibrary`) and cuDNN stub/trampoline entry points so provider binaries do not link directly against cuDNN.
- Introduced the `enable_cudnn` provider option and updated stream/handle creation paths to create cuDNN handles only when cuDNN is enabled and available.
- Updated Python preload behavior, tests, and docs, and added Linux/Windows "no cuDNN" CI workflows.
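The loader-plus-trampoline idea can be sketched in a few lines. This is a Python analogy using `ctypes`, not the PR's C++ implementation: `LazyLibrary`, `make_trampoline`, and the library file name are hypothetical, and only the symbol name `cudnnCreate` is a real cuDNN entry point.

```python
import ctypes
import threading


class LazyLibrary:
    """Load a shared library on first use and remember the outcome."""

    def __init__(self, candidate_names):
        self._candidates = list(candidate_names)
        self._lock = threading.Lock()
        self._attempted = False
        self._handle = None

    def available(self) -> bool:
        with self._lock:
            if not self._attempted:
                self._attempted = True
                for name in self._candidates:
                    try:
                        self._handle = ctypes.CDLL(name)
                        break
                    except OSError:
                        continue  # try the next candidate name
            return self._handle is not None

    def symbol(self, name):
        if not self.available():
            raise NotImplementedError(f"{name}: library is not available")
        return getattr(self._handle, name)


def make_trampoline(library, symbol_name):
    """Return a callable that resolves the real symbol at call time."""
    def trampoline(*args):
        return library.symbol(symbol_name)(*args)
    return trampoline


# Hypothetical file name that is assumed not to exist on this machine.
missing = LazyLibrary(["libdefinitely_missing_example.so"])
cudnn_create = make_trampoline(missing, "cudnnCreate")
try:
    cudnn_create()
except NotImplementedError as exc:
    print("unavailable:", exc)
```

Because the trampoline resolves the symbol only when called, importing or linking the caller does not require the library to be present. That is the property the PR relies on to drop the direct dependency.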

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.

File	Description
onnxruntime/test/python/transformers/test_cuda_plugin_ep.py	Adds no-cuDNN test mode and forces plugin sessions to pass `enable_cudnn=0`; updates model opset handling and skips cuDNN-only operator tests.
onnxruntime/test/python/onnxruntime_test_python.py	Extends provider-option coverage to include `enable_cudnn`.
onnxruntime/core/providers/cuda/plugin/cuda_stream_plugin.h	Threads `enable_cudnn` into plugin stream wrapper state.
onnxruntime/core/providers/cuda/plugin/cuda_stream_plugin.cc	Conditionally initializes plugin cuDNN handles based on `enable_cudnn` and loader availability.
onnxruntime/core/providers/cuda/plugin/cuda_plugin_utils.h	Improves cuDNN error handling for the plugin EP in no-cuDNN scenarios.
onnxruntime/core/providers/cuda/plugin/cuda_kernel_adapter.h	Adds `enable_cudnn` to plugin runtime config and guards default cuDNN handle creation/usage.
onnxruntime/core/providers/cuda/plugin/cuda_ep.h	Adds `enable_cudnn` to plugin EP configuration.
onnxruntime/core/providers/cuda/plugin/cuda_ep.cc	Configures loader policy from plugin EP config and passes `enable_cudnn` through to adapters/streams.
onnxruntime/core/providers/cuda/plugin/cuda_ep_factory.cc	Parses `enable_cudnn` from provider/session config for the plugin EP.
onnxruntime/core/providers/cuda/cudnn_stub.cc	Defines cuDNN symbol trampolines that forward via the loader (removes hard DT_NEEDED on cuDNN).
onnxruntime/core/providers/cuda/cudnn_loader.h	Declares the cuDNN runtime loader interface and symbol resolver.
onnxruntime/core/providers/cuda/cudnn_loader.cc	Implements platform-specific dlopen/LoadLibrary and symbol resolution; wires cudnn_frontend dynamic-loading handle.
onnxruntime/core/providers/cuda/cudnn_fe_call.cc	Ensures frontend error paths return `NOT_IMPLEMENTED` with clear messaging when cuDNN is unavailable.
onnxruntime/core/providers/cuda/cuda_stream_handle.cc	Makes cuDNN handle creation/destruction conditional in stream lifecycle.
onnxruntime/core/providers/cuda/cuda_kernel.h	Ensures kernels that request a cuDNN handle fail clearly when cuDNN is unavailable/disabled.
onnxruntime/core/providers/cuda/cuda_execution_provider.h	Updates per-thread context constructor signature to accept the consolidated provider info (incl. `enable_cudnn`).
onnxruntime/core/providers/cuda/cuda_execution_provider.cc	Makes per-thread cuDNN handle creation optional and configures loader based on provider option.
onnxruntime/core/providers/cuda/cuda_execution_provider_info.h	Adds `enable_cudnn` to provider-info struct and hash.
onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc	Parses/emits `enable_cudnn` in provider options.
onnxruntime/core/providers/cuda/cuda_call.cc	Improves cuDNN-call error handling to surface `NOT_IMPLEMENTED` when cuDNN isn’t available.
onnxruntime/__init__.py	Makes cuDNN DLL preloading optional/best-effort (does not treat missing cuDNN as a preload failure).
docs/cuda_plugin_ep/QUICK_START.md	Documents building/testing the plugin EP in no-cuDNN runtime environments.
docs/cuda_plugin_ep/cuda_plugin_ep_design.md	Documents optional cuDNN runtime dependency model and `enable_cudnn` semantics for the plugin EP.
docs/CUDA_cuDNN_Optional_Design.md	Adds a design doc describing the phased approach and implementation details for optional cuDNN.
cmake/onnxruntime_python.cmake	Relaxes Windows Python metadata generation when cuDNN DLLs aren’t present.
cmake/onnxruntime_providers_cuda.cmake	Removes direct cuDNN link, enables cudnn_frontend dynamic loading, and keeps cuDNN headers available.
cmake/onnxruntime_providers_cuda_plugin.cmake	Removes direct cuDNN link for the plugin EP and enables cudnn_frontend dynamic loading.
.github/workflows/windows_cuda_no_cudnn.yml	Adds Windows CI to validate no direct cuDNN dependency + no-cuDNN plugin runtime tests.
.github/workflows/linux_cuda_no_cudnn.yml	Adds Linux CI build/smoke validation for in-tree CUDA EP without cuDNN runtime dependency.

… feedback

- Guard NOMINMAX redefinition in cudnn_loader.cc (Windows -Werror)
- Link cudnn_loader into TensorRT and NV TensorRT RTX providers to resolve undefined CudnnLibrary::Get() symbol
- Guard CudnnLibrary reference in cuda_kernel.h for CUDA minimal build
- CudnnLibrary::Configure() now honors enable_cudnn=0 (disable request)
- Install numpy/onnx in Linux no-cuDNN smoke test before running it
- Use latest released ai.onnx opset instead of hard-coded value in test
- Use Python 3.12 in new no-cuDNN CI workflows
- Apply clang-format/ruff formatting

Copilot

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.

Address review feedback on the process-global CudnnLibrary singleton:

- Remove CudnnLibrary::Configure()/enabled_ so the singleton is purely a process-wide cuDNN library loader. Per-session enable_cudnn no longer mutates global state, eliminating cross-session interference and the one-time-init bug where disabling cuDNN in one session permanently prevented later sessions from loading it.
- Drop the now-redundant Configure() calls in CUDAExecutionProvider and the plugin CudaEp constructors. cuDNN usage is already gated per-stream via 'enable_cudnn && CudnnLibrary::Get().Available()', whose short-circuit avoids any dlopen when enable_cudnn=0.
- Default factory-level CreateSyncStreamForDeviceImpl streams to enable_cudnn=false so they never trigger an unexpected cuDNN load; EP-owned compute streams still honor the EP's enable_cudnn setting.
- Reorder imports in onnxruntime_test_python.py per ruff.
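The per-stream gate described in this commit message can be shown with a small stand-in. In this sketch `CountingLoader` and `stream_uses_cudnn` are hypothetical names; the loader only counts how often it is consulted so the short-circuit is visible.

```python
class CountingLoader:
    """Stand-in for a process-wide loader; counts availability checks."""

    def __init__(self, present: bool):
        self._present = present
        self.load_attempts = 0

    def available(self) -> bool:
        self.load_attempts += 1
        return self._present


def stream_uses_cudnn(enable_cudnn: bool, loader: CountingLoader) -> bool:
    # `and` short-circuits: the loader is never consulted when disabled,
    # mirroring 'enable_cudnn && CudnnLibrary::Get().Available()'.
    return enable_cudnn and loader.available()


loader = CountingLoader(present=True)
first = stream_uses_cudnn(False, loader)   # session A disables cuDNN
second = stream_uses_cudnn(True, loader)   # session B still gets cuDNN
print(first, second, loader.load_attempts)  # prints: False True 1
```

Since the disable request lives in the caller's own flag rather than in the loader, one session turning cuDNN off leaves the loader untouched for the next session.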

Copilot

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

onnxruntime/core/providers/cuda/plugin/cuda_kernel_adapter.h:539

Cleanup on cuBLASLt handle creation failure unconditionally calls cudnnDestroy(it->second.cudnn). If the cuDNN handle was never created (e.g., cuDNN not present), this will still call into the cuDNN stub/loader path unnecessarily. Guard the destroy with a nullptr check (and only null out the handle when it was actually destroyed).

      cudnnDestroy(it->second.cudnn);
      it->second.cudnn = nullptr;

- cuda_kernel_adapter.h: split fallback cuDNN handle creation out of GetDefaultCudaHandlesForDevice() into a lazy GetDefaultCudnnHandleForDevice() so cuBLAS-only paths (and enable_cudnn=0 sessions) never trigger a cuDNN load.
- cudnn_loader.cc: load cuDNN on Windows via LoadLibraryExA with LOAD_LIBRARY_SEARCH_DEFAULT_DIRS to exclude the process CWD from the DLL search order.
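The two handle-related points raised in this review round (create the cuDNN handle lazily, and destroy it only if it was actually created) can be sketched together. This is a simplified Python analogy under assumed names: `DeviceHandles` and the injected factory/destroy callables are hypothetical and stand in for the real C++ handle management.

```python
class DeviceHandles:
    """Per-device handles where the cuDNN handle is created on demand."""

    def __init__(self, create_cublas, create_cudnn):
        self._create_cublas = create_cublas
        self._create_cudnn = create_cudnn
        self._cublas = None
        self._cudnn = None

    def cublas(self):
        # cuBLAS-only callers never touch the cuDNN factory.
        if self._cublas is None:
            self._cublas = self._create_cublas()
        return self._cublas

    def cudnn(self):
        if self._cudnn is None:
            self._cudnn = self._create_cudnn()
        return self._cudnn

    def release(self, destroy_cudnn):
        # Guarded cleanup: skip the destroy call if no handle was created.
        if self._cudnn is not None:
            destroy_cudnn(self._cudnn)
            self._cudnn = None
        self._cublas = None


calls = []
handles = DeviceHandles(
    create_cublas=lambda: calls.append("create_cublas") or "cublas-handle",
    create_cudnn=lambda: calls.append("create_cudnn") or "cudnn-handle",
)
handles.cublas()
handles.release(destroy_cudnn=lambda h: calls.append("destroy_cudnn"))
print(calls)  # prints: ['create_cublas']
```

A session that only ever asks for the cuBLAS handle therefore never creates or destroys a cuDNN handle, so it never reaches the loader.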

tianleiwu added 4 commits June 25, 2026 00:54

draft design

67b327c

draft of phase 1

41457e3

remove cudnn_path provider option

23e39da

update plugin test

6e3d243

github-advanced-security AI found potential problems Jun 25, 2026

Comment thread onnxruntime/core/providers/cuda/cudnn_stub.cc Fixed

github-actions Bot reviewed Jun 25, 2026

tianleiwu marked this pull request as draft June 25, 2026 01:02

tianleiwu requested a review from Copilot June 25, 2026 01:02

Copilot started reviewing on behalf of tianleiwu June 25, 2026 01:02

Copilot AI reviewed Jun 25, 2026

Comment thread onnxruntime/core/providers/cuda/cudnn_loader.cc Outdated

Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py Outdated

github-actions Bot reviewed Jun 25, 2026

Comment thread onnxruntime/test/python/onnxruntime_test_python.py Outdated

lintrunner

4494a3b

tianleiwu requested a review from Copilot June 25, 2026 05:29

Copilot started reviewing on behalf of tianleiwu June 25, 2026 05:30

Copilot AI reviewed Jun 25, 2026

tianleiwu added 2 commits June 24, 2026 22:56

lintrunner

2b17352

tianleiwu requested a review from Copilot June 25, 2026 05:58

Copilot started reviewing on behalf of tianleiwu June 25, 2026 05:58

Copilot AI reviewed Jun 25, 2026

Comment thread onnxruntime/core/providers/cuda/plugin/cuda_kernel_adapter.h Outdated

Comment thread onnxruntime/core/providers/cuda/cudnn_loader.cc

tianleiwu added 4 commits June 24, 2026 23:30

fix CI

02fb23d

install wheel in CI

ebf322e

ci: fix Windows no-cuDNN CUDA 13 setup

0853a05


3 participants