[CUDA] Make cuDNN optional at runtime #29252
Draft
tianleiwu wants to merge 12 commits into
Pull request overview
This PR refactors the CUDA Execution Provider (in-tree and plugin EP) to treat cuDNN as an optional runtime dependency by introducing a lazy loader + cuDNN symbol trampolines, and plumbing a new enable_cudnn provider option through core, plugin, Python, docs, and CI so CUDA can run in no-cuDNN environments (with NOT_IMPLEMENTED for cuDNN-only kernels).
Changes:
- Added a cuDNN dynamic loader (`CudnnLibrary`) and cuDNN stub/trampoline entry points so provider binaries don’t hard-link cuDNN.
- Introduced the `enable_cudnn` provider option and updated stream/handle creation paths to create cuDNN handles only when enabled and available.
- Updated Python preload behavior, tests, and docs, and added Linux/Windows “no cuDNN” CI workflows.
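
The trampoline idea in the first bullet can be sketched as follows. This is a minimal illustration, not the PR's `CudnnLibrary` or `cudnn_stub.cc`: `LazyLibrary`, `StubStatus`, `CreateHandleTrampoline`, and the symbol name `hypotheticalCreate` are all hypothetical, and the injected resolver stands in for `dlsym`/`GetProcAddress` so the sketch runs without any shared library present.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical status codes; the real cuDNN API returns cudnnStatus_t.
enum class StubStatus { kOk, kNotAvailable };

// Stand-in for dlsym/GetProcAddress: returns a symbol address or nullptr.
using SymbolResolver = std::function<void*(const std::string&)>;

// Hypothetical loader: symbols are looked up only when a trampoline is called.
class LazyLibrary {
 public:
  explicit LazyLibrary(SymbolResolver resolver) : resolver_(std::move(resolver)) {}

  void* Resolve(const std::string& name) const {
    return resolver_ ? resolver_(name) : nullptr;
  }

 private:
  SymbolResolver resolver_;
};

// Signature of the (hypothetical) library entry point being wrapped.
using CreateFn = int (*)(int*);

// Trampoline: same shape as the wrapped entry point, but it forwards through
// the loader and reports kNotAvailable instead of failing at link/load time.
StubStatus CreateHandleTrampoline(const LazyLibrary& lib, int* handle) {
  auto fn = reinterpret_cast<CreateFn>(lib.Resolve("hypotheticalCreate"));
  if (fn == nullptr) return StubStatus::kNotAvailable;
  return fn(handle) == 0 ? StubStatus::kOk : StubStatus::kNotAvailable;
}

// Fake library function used only to exercise the "library present" case.
inline int FakeCreate(int* handle) {
  *handle = 42;
  return 0;
}
```

Because callers only ever reference the trampoline, the binary carries no direct dependency on the wrapped library; a production loader would typically also cache the resolved pointer rather than resolve it on every call.
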
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| onnxruntime/test/python/transformers/test_cuda_plugin_ep.py | Adds no-cuDNN test mode and forces plugin sessions to pass enable_cudnn=0; updates model opset handling and skips cuDNN-only operator tests. |
| onnxruntime/test/python/onnxruntime_test_python.py | Extends provider-option coverage to include enable_cudnn. |
| onnxruntime/core/providers/cuda/plugin/cuda_stream_plugin.h | Threads enable_cudnn into plugin stream wrapper state. |
| onnxruntime/core/providers/cuda/plugin/cuda_stream_plugin.cc | Conditionally initializes plugin cuDNN handles based on enable_cudnn and loader availability. |
| onnxruntime/core/providers/cuda/plugin/cuda_plugin_utils.h | Improves cuDNN error handling for the plugin EP in no-cuDNN scenarios. |
| onnxruntime/core/providers/cuda/plugin/cuda_kernel_adapter.h | Adds enable_cudnn to plugin runtime config and guards default cuDNN handle creation/usage. |
| onnxruntime/core/providers/cuda/plugin/cuda_ep.h | Adds enable_cudnn to plugin EP configuration. |
| onnxruntime/core/providers/cuda/plugin/cuda_ep.cc | Configures loader policy from plugin EP config and passes enable_cudnn through to adapters/streams. |
| onnxruntime/core/providers/cuda/plugin/cuda_ep_factory.cc | Parses enable_cudnn from provider/session config for the plugin EP. |
| onnxruntime/core/providers/cuda/cudnn_stub.cc | Defines cuDNN symbol trampolines that forward via the loader (removes hard DT_NEEDED on cuDNN). |
| onnxruntime/core/providers/cuda/cudnn_loader.h | Declares the cuDNN runtime loader interface and symbol resolver. |
| onnxruntime/core/providers/cuda/cudnn_loader.cc | Implements platform-specific dlopen/LoadLibrary and symbol resolution; wires cudnn_frontend dynamic-loading handle. |
| onnxruntime/core/providers/cuda/cudnn_fe_call.cc | Ensures frontend error paths return NOT_IMPLEMENTED with clear messaging when cuDNN is unavailable. |
| onnxruntime/core/providers/cuda/cuda_stream_handle.cc | Makes cuDNN handle creation/destruction conditional in stream lifecycle. |
| onnxruntime/core/providers/cuda/cuda_kernel.h | Ensures kernels that request a cuDNN handle fail clearly when cuDNN is unavailable/disabled. |
| onnxruntime/core/providers/cuda/cuda_execution_provider.h | Updates per-thread context constructor signature to accept the consolidated provider info (incl. enable_cudnn). |
| onnxruntime/core/providers/cuda/cuda_execution_provider.cc | Makes per-thread cuDNN handle creation optional and configures loader based on provider option. |
| onnxruntime/core/providers/cuda/cuda_execution_provider_info.h | Adds enable_cudnn to provider-info struct and hash. |
| onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc | Parses/emits enable_cudnn in provider options. |
| onnxruntime/core/providers/cuda/cuda_call.cc | Improves cuDNN-call error handling to surface NOT_IMPLEMENTED when cuDNN isn’t available. |
| `onnxruntime/__init__.py` | Makes cuDNN DLL preloading optional/best-effort (don’t treat missing cuDNN as a preload failure). |
| docs/cuda_plugin_ep/QUICK_START.md | Documents building/testing the plugin EP in no-cuDNN runtime environments. |
| docs/cuda_plugin_ep/cuda_plugin_ep_design.md | Documents optional cuDNN runtime dependency model and enable_cudnn semantics for the plugin EP. |
| docs/CUDA_cuDNN_Optional_Design.md | Adds a design doc describing the phased approach and implementation details for optional cuDNN. |
| cmake/onnxruntime_python.cmake | Relaxes Windows Python metadata generation when cuDNN DLLs aren’t present. |
| cmake/onnxruntime_providers_cuda.cmake | Removes direct cuDNN link, enables cudnn_frontend dynamic loading, and keeps cuDNN headers available. |
| cmake/onnxruntime_providers_cuda_plugin.cmake | Removes direct cuDNN link for the plugin EP and enables cudnn_frontend dynamic loading. |
| .github/workflows/windows_cuda_no_cudnn.yml | Adds Windows CI to validate no direct cuDNN dependency + no-cuDNN plugin runtime tests. |
| .github/workflows/linux_cuda_no_cudnn.yml | Adds Linux CI build/smoke validation for in-tree CUDA EP without cuDNN runtime dependency. |
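
Several rows above describe parsing `enable_cudnn` from provider options. A rough sketch of what such parsing might look like is below. `ProviderOptions` and `ParseEnableCudnn` are hypothetical names, and both the accepted values (`"0"`/`"1"`) and the default when the key is absent are assumptions, not taken from the PR.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical key/value view of provider options.
using ProviderOptions = std::unordered_map<std::string, std::string>;

// Reads enable_cudnn from the options map.
// Assumptions: only "0" and "1" are valid, and a missing key means enabled.
bool ParseEnableCudnn(const ProviderOptions& options) {
  auto it = options.find("enable_cudnn");
  if (it == options.end()) return true;  // assumed default
  if (it->second == "1") return true;
  if (it->second == "0") return false;
  throw std::invalid_argument("enable_cudnn must be 0 or 1, got: " + it->second);
}
```

Rejecting unknown values instead of silently treating them as enabled keeps a typo in the option from masking a no-cuDNN test run.
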
… feedback
- Guard NOMINMAX redefinition in cudnn_loader.cc (Windows -Werror)
- Link cudnn_loader into TensorRT and NV TensorRT RTX providers to resolve undefined CudnnLibrary::Get() symbol
- Guard CudnnLibrary reference in cuda_kernel.h for CUDA minimal build
- CudnnLibrary::Configure() now honors enable_cudnn=0 (disable request)
- Install numpy/onnx in Linux no-cuDNN smoke test before running it
- Use latest released ai.onnx opset instead of hard-coded value in test
- Use Python 3.12 in new no-cuDNN CI workflows
- Apply clang-format/ruff formatting
Address review feedback on the process-global CudnnLibrary singleton:
- Remove CudnnLibrary::Configure()/enabled_ so the singleton is purely a process-wide cuDNN library loader. Per-session enable_cudnn no longer mutates global state, eliminating cross-session interference and the one-time-init bug where disabling cuDNN in one session permanently prevented later sessions from loading it.
- Drop the now-redundant Configure() calls in CUDAExecutionProvider and the plugin CudaEp constructors. cuDNN usage is already gated per-stream via 'enable_cudnn && CudnnLibrary::Get().Available()', whose short-circuit avoids any dlopen when enable_cudnn=0.
- Default factory-level CreateSyncStreamForDeviceImpl streams to enable_cudnn=false so they never trigger an unexpected cuDNN load; EP-owned compute streams still honor the EP's enable_cudnn setting.
- Reorder imports in onnxruntime_test_python.py per ruff.
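
The short-circuit gating described in this commit message can be illustrated with a small stand-in. `FakeCudnnLibrary` and `ShouldCreateCudnnHandle` are hypothetical; the fake only counts load attempts and always pretends the library is missing, so it shows the control flow rather than the real loader.

```cpp
// Hypothetical process-wide loader that records how often a load is attempted.
// Not thread-safe; a real loader would need synchronized one-time init.
class FakeCudnnLibrary {
 public:
  static FakeCudnnLibrary& Get() {
    static FakeCudnnLibrary instance;
    return instance;
  }

  // Attempts the load at most once, on first query.
  bool Available() {
    if (!attempted_) {
      attempted_ = true;
      ++load_attempts_;
      loaded_ = false;  // pretend the shared library was not found
    }
    return loaded_;
  }

  int load_attempts() const { return load_attempts_; }

 private:
  FakeCudnnLibrary() = default;
  bool attempted_ = false;
  bool loaded_ = false;
  int load_attempts_ = 0;
};

// Per-stream gate: when enable_cudnn is false, && short-circuits and
// Available() is never called, so no load is attempted.
bool ShouldCreateCudnnHandle(bool enable_cudnn) {
  return enable_cudnn && FakeCudnnLibrary::Get().Available();
}
```

Keeping the session flag as a plain argument, instead of storing it in the singleton, is what lets one session disable cuDNN without affecting another.
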
Pull request overview
Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
onnxruntime/core/providers/cuda/plugin/cuda_kernel_adapter.h:539
- Cleanup on cuBLASLt handle creation failure unconditionally calls `cudnnDestroy(it->second.cudnn)`. If the cuDNN handle was never created (e.g., cuDNN not present), this will still call into the cuDNN stub/loader path unnecessarily. Guard the destroy with a nullptr check (and only null out the handle when it was actually destroyed).
cudnnDestroy(it->second.cudnn);
it->second.cudnn = nullptr;
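
A guarded version of that cleanup might look like the sketch below. `FakeHandles`, `FakeCudnnDestroy`, and `CleanupCudnn` are hypothetical stand-ins for the adapter's handle cache and `cudnnDestroy`; the counter exists only to show when the destroy path is entered.

```cpp
// Hypothetical per-device handle pair; the real code stores CUDA library handles.
struct FakeHandles {
  void* cublas = nullptr;
  void* cudnn = nullptr;
};

// Counts calls so the example can show when the destroy path is taken.
int g_destroy_calls = 0;

// Stand-in for cudnnDestroy.
void FakeCudnnDestroy(void* handle) {
  (void)handle;
  ++g_destroy_calls;
}

// Destroys the cuDNN handle only if one was created, and clears it only then.
void CleanupCudnn(FakeHandles& handles) {
  if (handles.cudnn != nullptr) {
    FakeCudnnDestroy(handles.cudnn);
    handles.cudnn = nullptr;
  }
}
```

With the guard in place, a session that never created a cuDNN handle never touches the stub/loader path during cleanup.
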
- cuda_kernel_adapter.h: split fallback cuDNN handle creation out of GetDefaultCudaHandlesForDevice() into a lazy GetDefaultCudnnHandleForDevice() so cuBLAS-only paths (and enable_cudnn=0 sessions) never trigger a cuDNN load.
- cudnn_loader.cc: load cuDNN on Windows via LoadLibraryExA with LOAD_LIBRARY_SEARCH_DEFAULT_DIRS to exclude the process CWD from the DLL search order.
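
For the Windows change, a call of roughly this shape would restrict the search to the default directories. `OpenLibraryWithoutCwd` is a hypothetical helper name, not the function in `cudnn_loader.cc`, and the non-Windows branch is deliberately left as a placeholder rather than guessing at the POSIX path.

```cpp
#if defined(_WIN32)
#ifndef NOMINMAX
#define NOMINMAX  // guarded so an existing definition is not redefined
#endif
#include <windows.h>
#endif

// Opens a DLL by file name without consulting the current working directory.
// Returns nullptr when the library cannot be found or loaded.
void* OpenLibraryWithoutCwd(const char* file_name) {
#if defined(_WIN32)
  // LOAD_LIBRARY_SEARCH_DEFAULT_DIRS covers the application directory,
  // System32, and directories registered via AddDllDirectory.
  return reinterpret_cast<void*>(
      ::LoadLibraryExA(file_name, nullptr, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS));
#else
  (void)file_name;
  return nullptr;  // placeholder: the POSIX path is outside this sketch
#endif
}
```

Excluding the working directory matters because a DLL with a matching name dropped into the CWD would otherwise be a candidate for loading.
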
Summary
Make cuDNN an optional runtime dependency for the CUDA Execution Provider and CUDA Plugin EP. The build still uses cuDNN headers, but provider binaries no longer directly depend on cuDNN shared libraries; cuDNN is loaded lazily when enabled and available, while no-cuDNN runs use native CUDA paths where available and report `NOT_IMPLEMENTED` for kernels that still require cuDNN.
This also removes the provider-level custom cuDNN path option to avoid a native library loading footgun, and adds local/CI validation for no-cuDNN runtime environments.
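
The dispatch behavior described in the summary can be sketched as below. `Status`, `StatusCode`, and `RunKernel` are hypothetical and much simpler than ONNX Runtime's own status type; the sketch only shows the order of the checks, assuming a kernel knows whether it has a native CUDA path.

```cpp
#include <string>

// Hypothetical, simplified status type for illustration only.
enum class StatusCode { kOk, kNotImplemented };

struct Status {
  StatusCode code;
  std::string message;
};

// Prefers a native CUDA path; otherwise requires a cuDNN handle and reports
// NOT_IMPLEMENTED (rather than crashing) when none is available.
Status RunKernel(bool has_native_cuda_path, void* cudnn_handle, const std::string& op) {
  if (has_native_cuda_path) {
    return {StatusCode::kOk, ""};
  }
  if (cudnn_handle == nullptr) {
    return {StatusCode::kNotImplemented,
            op + " requires cuDNN, which is disabled or unavailable"};
  }
  return {StatusCode::kOk, ""};
}
```
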
Key Changes
- … `enable_cudnn`; removed `cudnn_path` from CUDA EP and CUDA Plugin EP provider configuration.
- … `ORT_TEST_CUDA_PLUGIN_NO_CUDNN=1`.
- … `docs/CUDA_cuDNN_Optional_Design.md` and updated CUDA Plugin EP docs for no-cuDNN behavior and validation.
Testing
Validated locally on Linux CUDA 13:
- `readelf -d ... | grep NEEDED | grep -i cudnn || echo "no cudnn DT_NEEDED"`
- `ldd ... | grep -i cudnn || echo "no cudnn in ldd"`
- `bash .env/cuda_130_plugin_no_cudnn.sh --test_plugin`: `Ran 87 tests`, `OK (skipped=17)`
Additional CI coverage is included for Linux and Windows no-cuDNN CUDA validation.