[CUDA] Make cuDNN optional at runtime#29252

Draft
tianleiwu wants to merge 12 commits into main from tlwu/optional_cudnn

@tianleiwu

Summary

Make cuDNN an optional runtime dependency for the CUDA Execution Provider and CUDA Plugin EP. The build still uses cuDNN headers, but provider binaries no longer depend directly on the cuDNN shared libraries. cuDNN is loaded lazily when it is enabled and available. Without cuDNN, kernels use native CUDA paths where they exist and report NOT_IMPLEMENTED where cuDNN is still required.

This also removes the provider-level custom cuDNN path option to avoid a native library loading footgun, and adds local/CI validation for no-cuDNN runtime environments.

Key Changes

| Area | Changes |
| --- | --- |
| CUDA EP runtime loading | Added a dynamic cuDNN loader and cuDNN symbol trampolines so CUDA provider binaries can avoid a direct cuDNN dependency. |
| Provider options | Added enable_cudnn; removed cudnn_path from CUDA EP and CUDA Plugin EP provider configuration. |
| CUDA Plugin EP | Wired optional cuDNN behavior through plugin EP config, kernel adapters, stream handles, and plugin utilities. |
| Python preload behavior | Updated Python CUDA preload handling so cuDNN remains an optional dependency instead of an unconditional import/runtime requirement. |
| Tests | Added/updated provider option coverage and CUDA Plugin EP no-cuDNN mode using ORT_TEST_CUDA_PLUGIN_NO_CUDNN=1. |
| CI | Added Linux and Windows CUDA no-cuDNN workflows that build with cuDNN headers, exclude cuDNN from the runtime path, verify no direct cuDNN dependency, and run targeted tests. |
| Documentation | Added docs/CUDA_cuDNN_Optional_Design.md and updated CUDA Plugin EP docs for no-cuDNN behavior and validation. |
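
As a rough illustration of the enable_cudnn provider option listed above, the sketch below builds a provider list with cuDNN disabled. The helper name `cuda_provider_entry` is hypothetical, and the "0"/"1" string encoding is an assumption based on the enable_cudnn=0 form this PR's tests are described as passing.

```python
def cuda_provider_entry(enable_cudnn: bool) -> tuple[str, dict[str, str]]:
    """Build a (provider name, options) pair for the CUDA EP.

    Hypothetical helper; "enable_cudnn" is the option name added by this PR,
    and the "0"/"1" value encoding is an assumption.
    """
    return ("CUDAExecutionProvider", {"enable_cudnn": "1" if enable_cudnn else "0"})


# Provider list with cuDNN disabled, falling back to the CPU provider.
providers = [cuda_provider_entry(enable_cudnn=False), "CPUExecutionProvider"]
print(providers)
```

A list of this shape can be passed as the `providers` argument of `onnxruntime.InferenceSession`. With cuDNN disabled, kernels that still require cuDNN would be expected to report NOT_IMPLEMENTED, as the summary describes.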

Testing

Validated locally on Linux CUDA 13:

  • Rebuilt CUDA EP / CUDA Plugin EP with cuDNN headers available at build time.
  • Verified provider binaries have no direct cuDNN dependency:
    • readelf -d ... | grep NEEDED | grep -i cudnn || echo "no cudnn DT_NEEDED"
    • ldd ... | grep -i cudnn || echo "no cudnn in ldd"
  • Ran CUDA Plugin EP no-cuDNN validation:
    • bash .env/cuda_130_plugin_no_cudnn.sh --test_plugin
    • Result: Ran 87 tests, OK (skipped=17)
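
The dependency check above can also be scripted. The following is a minimal sketch that filters `readelf -d` text the same way the `grep` pipeline does; the function name and the sample text are illustrative assumptions, not output captured from a real build.

```python
def cudnn_needed_entries(readelf_dynamic_output: str) -> list[str]:
    """Return the NEEDED lines of `readelf -d` output that mention cuDNN."""
    return [
        line.strip()
        for line in readelf_dynamic_output.splitlines()
        if "NEEDED" in line and "cudnn" in line.lower()
    ]


# Illustrative sample text with made-up library names.
sample = """\
 0x0000000000000001 (NEEDED)  Shared library: [libfoo.so.1]
 0x0000000000000001 (NEEDED)  Shared library: [libbar.so.2]
 0x000000000000000e (SONAME)  Library soname: [libexample.so]
"""
print(cudnn_needed_entries(sample) or "no cudnn DT_NEEDED")
```

An empty result corresponds to the "no cudnn DT_NEEDED" branch of the shell check.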

Additional CI coverage is included for Linux and Windows no-cuDNN CUDA validation.

Comment thread onnxruntime/core/providers/cuda/cudnn_stub.cc Fixed

@github-actions github-actions Bot left a comment

You can commit the suggested changes from lintrunner.

Comment thread onnxruntime/core/providers/cuda/cudnn_stub.cc Outdated
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_kernel_adapter.h Outdated
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_kernel_adapter.h Outdated
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_kernel_adapter.h Outdated
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_stream_plugin.cc Outdated
@tianleiwu tianleiwu marked this pull request as draft June 25, 2026 01:02
@tianleiwu tianleiwu requested a review from Copilot June 25, 2026 01:02

Copilot AI left a comment

Pull request overview

This PR refactors the CUDA Execution Provider (in-tree and plugin EP) to treat cuDNN as an optional runtime dependency by introducing a lazy loader + cuDNN symbol trampolines, and plumbing a new enable_cudnn provider option through core, plugin, Python, docs, and CI so CUDA can run in no-cuDNN environments (with NOT_IMPLEMENTED for cuDNN-only kernels).

Changes:

  • Added a cuDNN dynamic loader (CudnnLibrary) and cuDNN stub/trampoline entry points so provider binaries don’t hard-link cuDNN.
  • Introduced enable_cudnn provider option and updated stream/handle creation paths to only create cuDNN handles when enabled and available.
  • Updated Python preload behavior, tests, docs, and added Linux/Windows “no cuDNN” CI workflows.
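
To make the lazy loader plus trampoline idea concrete, here is a small Python analogue using `ctypes`. It is only a sketch of the pattern, not the PR's C++ CudnnLibrary: the class name `LazyLibrary`, its methods, and the library name are hypothetical.

```python
import ctypes


class LazyLibrary:
    """Load a shared library on first use and cache the outcome."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._handle = None
        self._attempted = False

    def available(self) -> bool:
        # Attempt the load only once; later calls reuse the cached result.
        if not self._attempted:
            self._attempted = True
            try:
                self._handle = ctypes.CDLL(self._name)
            except OSError:
                self._handle = None
        return self._handle is not None

    def call(self, symbol: str, *args):
        # Trampoline: resolve the symbol at call time, or fail clearly.
        if not self.available():
            raise NotImplementedError(
                f"{symbol} requires {self._name}, which is not available"
            )
        return getattr(self._handle, symbol)(*args)


# Hypothetical library name that should not exist on any machine.
lib = LazyLibrary("libdefinitely_not_installed_example.so")
print(lib.available())
```

Because nothing is loaded until `available()` or `call()` runs, constructing the object has no dependency on the library being present, which mirrors why the provider binaries no longer need a hard link against cuDNN.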

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| onnxruntime/test/python/transformers/test_cuda_plugin_ep.py | Adds no-cuDNN test mode and forces plugin sessions to pass enable_cudnn=0; updates model opset handling and skips cuDNN-only operator tests. |
| onnxruntime/test/python/onnxruntime_test_python.py | Extends provider-option coverage to include enable_cudnn. |
| onnxruntime/core/providers/cuda/plugin/cuda_stream_plugin.h | Threads enable_cudnn into plugin stream wrapper state. |
| onnxruntime/core/providers/cuda/plugin/cuda_stream_plugin.cc | Conditionally initializes plugin cuDNN handles based on enable_cudnn and loader availability. |
| onnxruntime/core/providers/cuda/plugin/cuda_plugin_utils.h | Improves cuDNN error handling for the plugin EP in no-cuDNN scenarios. |
| onnxruntime/core/providers/cuda/plugin/cuda_kernel_adapter.h | Adds enable_cudnn to plugin runtime config and guards default cuDNN handle creation/usage. |
| onnxruntime/core/providers/cuda/plugin/cuda_ep.h | Adds enable_cudnn to plugin EP configuration. |
| onnxruntime/core/providers/cuda/plugin/cuda_ep.cc | Configures loader policy from plugin EP config and passes enable_cudnn through to adapters/streams. |
| onnxruntime/core/providers/cuda/plugin/cuda_ep_factory.cc | Parses enable_cudnn from provider/session config for the plugin EP. |
| onnxruntime/core/providers/cuda/cudnn_stub.cc | Defines cuDNN symbol trampolines that forward via the loader (removes hard DT_NEEDED on cuDNN). |
| onnxruntime/core/providers/cuda/cudnn_loader.h | Declares the cuDNN runtime loader interface and symbol resolver. |
| onnxruntime/core/providers/cuda/cudnn_loader.cc | Implements platform-specific dlopen/LoadLibrary and symbol resolution; wires cudnn_frontend dynamic-loading handle. |
| onnxruntime/core/providers/cuda/cudnn_fe_call.cc | Ensures frontend error paths return NOT_IMPLEMENTED with clear messaging when cuDNN is unavailable. |
| onnxruntime/core/providers/cuda/cuda_stream_handle.cc | Makes cuDNN handle creation/destruction conditional in stream lifecycle. |
| onnxruntime/core/providers/cuda/cuda_kernel.h | Ensures kernels that request a cuDNN handle fail clearly when cuDNN is unavailable/disabled. |
| onnxruntime/core/providers/cuda/cuda_execution_provider.h | Updates per-thread context constructor signature to accept the consolidated provider info (incl. enable_cudnn). |
| onnxruntime/core/providers/cuda/cuda_execution_provider.cc | Makes per-thread cuDNN handle creation optional and configures loader based on provider option. |
| onnxruntime/core/providers/cuda/cuda_execution_provider_info.h | Adds enable_cudnn to provider-info struct and hash. |
| onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc | Parses/emits enable_cudnn in provider options. |
| onnxruntime/core/providers/cuda/cuda_call.cc | Improves cuDNN-call error handling to surface NOT_IMPLEMENTED when cuDNN isn’t available. |
| onnxruntime/__init__.py | Makes cuDNN DLL preloading optional/best-effort (don’t treat missing cuDNN as a preload failure). |
| docs/cuda_plugin_ep/QUICK_START.md | Documents building/testing the plugin EP in no-cuDNN runtime environments. |
| docs/cuda_plugin_ep/cuda_plugin_ep_design.md | Documents optional cuDNN runtime dependency model and enable_cudnn semantics for the plugin EP. |
| docs/CUDA_cuDNN_Optional_Design.md | Adds a design doc describing the phased approach and implementation details for optional cuDNN. |
| cmake/onnxruntime_python.cmake | Relaxes Windows Python metadata generation when cuDNN DLLs aren’t present. |
| cmake/onnxruntime_providers_cuda.cmake | Removes direct cuDNN link, enables cudnn_frontend dynamic loading, and keeps cuDNN headers available. |
| cmake/onnxruntime_providers_cuda_plugin.cmake | Removes direct cuDNN link for the plugin EP and enables cudnn_frontend dynamic loading. |
| .github/workflows/windows_cuda_no_cudnn.yml | Adds Windows CI to validate no direct cuDNN dependency + no-cuDNN plugin runtime tests. |
| .github/workflows/linux_cuda_no_cudnn.yml | Adds Linux CI build/smoke validation for in-tree CUDA EP without cuDNN runtime dependency. |

Comment thread onnxruntime/core/providers/cuda/cudnn_loader.cc Outdated
Comment thread onnxruntime/test/python/transformers/test_cuda_plugin_ep.py Outdated
… feedback

- Guard NOMINMAX redefinition in cudnn_loader.cc (Windows -Werror)
- Link cudnn_loader into TensorRT and NV TensorRT RTX providers to
  resolve undefined CudnnLibrary::Get() symbol
- Guard CudnnLibrary reference in cuda_kernel.h for CUDA minimal build
- CudnnLibrary::Configure() now honors enable_cudnn=0 (disable request)
- Install numpy/onnx in Linux no-cuDNN smoke test before running it
- Use latest released ai.onnx opset instead of hard-coded value in test
- Use Python 3.12 in new no-cuDNN CI workflows
- Apply clang-format/ruff formatting

@github-actions github-actions Bot left a comment

You can commit the suggested changes from lintrunner.

Comment thread onnxruntime/test/python/onnxruntime_test_python.py Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.

Comment thread onnxruntime/core/providers/cuda/cudnn_loader.cc Outdated
Comment thread onnxruntime/core/providers/cuda/cuda_execution_provider.cc Outdated
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_ep.cc Outdated
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_ep_factory.cc
Address review feedback on the process-global CudnnLibrary singleton:

- Remove CudnnLibrary::Configure()/enabled_ so the singleton is purely a
  process-wide cuDNN library loader. Per-session enable_cudnn no longer
  mutates global state, eliminating cross-session interference and the
  one-time-init bug where disabling cuDNN in one session permanently
  prevented later sessions from loading it.
- Drop the now-redundant Configure() calls in CUDAExecutionProvider and
  the plugin CudaEp constructors. cuDNN usage is already gated per-stream
  via 'enable_cudnn && CudnnLibrary::Get().Available()', whose
  short-circuit avoids any dlopen when enable_cudnn=0.
- Default factory-level CreateSyncStreamForDeviceImpl streams to
  enable_cudnn=false so they never trigger an unexpected cuDNN load;
  EP-owned compute streams still honor the EP's enable_cudnn setting.
- Reorder imports in onnxruntime_test_python.py per ruff.
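
The short-circuit gating described in this commit message can be shown in a few lines. This is a Python stand-in for the C++ expression `enable_cudnn && CudnnLibrary::Get().Available()`; `CountingLoader` and `stream_uses_cudnn` are hypothetical names used only for the illustration.

```python
class CountingLoader:
    """Stand-in loader that counts how often availability is probed."""

    def __init__(self, present: bool) -> None:
        self.present = present
        self.probes = 0

    def available(self) -> bool:
        self.probes += 1  # a real loader would attempt dlopen on first use
        return self.present


def stream_uses_cudnn(enable_cudnn: bool, loader: CountingLoader) -> bool:
    # `and` short-circuits: the loader is never probed when enable_cudnn is False.
    return enable_cudnn and loader.available()


loader = CountingLoader(present=True)
print(stream_uses_cudnn(False, loader), loader.probes)  # False 0
print(stream_uses_cudnn(True, loader), loader.probes)   # True 1
```

Keeping the decision per call rather than in shared state is what lets one session disable cuDNN without affecting another session in the same process.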

Copilot AI left a comment

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

onnxruntime/core/providers/cuda/plugin/cuda_kernel_adapter.h:539

  • Cleanup on cuBLASLt handle creation failure unconditionally calls cudnnDestroy(it->second.cudnn). If the cuDNN handle was never created (e.g., cuDNN not present), this will still call into the cuDNN stub/loader path unnecessarily. Guard the destroy with a nullptr check (and only null out the handle when it was actually destroyed).
      cudnnDestroy(it->second.cudnn);
      it->second.cudnn = nullptr;

Comment thread onnxruntime/core/providers/cuda/plugin/cuda_kernel_adapter.h Outdated
Comment thread onnxruntime/core/providers/cuda/cudnn_loader.cc
- cuda_kernel_adapter.h: split fallback cuDNN handle creation out of
  GetDefaultCudaHandlesForDevice() into a lazy GetDefaultCudnnHandleForDevice()
  so cuBLAS-only paths (and enable_cudnn=0 sessions) never trigger a cuDNN load.
- cudnn_loader.cc: load cuDNN on Windows via LoadLibraryExA with
  LOAD_LIBRARY_SEARCH_DEFAULT_DIRS to exclude the process CWD from the DLL
  search order.
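
The first change above, moving cuDNN handle creation out of the eager path into a lazy getter, follows a common pattern. The sketch below is a Python analogue under assumed names (`DeviceHandles` and the factory callables are hypothetical), not the code in cuda_kernel_adapter.h.

```python
class DeviceHandles:
    """Per-device handle cache: cuBLAS-like handle eagerly, cuDNN-like lazily."""

    def __init__(self, create_cublas, create_cudnn) -> None:
        self.cublas = create_cublas()      # always needed, created up front
        self._create_cudnn = create_cudnn  # deferred until first request
        self._cudnn = None

    def cudnn(self):
        # Create the cuDNN-like handle on first request and reuse it afterwards.
        if self._cudnn is None:
            self._cudnn = self._create_cudnn()
        return self._cudnn


created = []
handles = DeviceHandles(
    create_cublas=lambda: created.append("cublas") or "cublas-handle",
    create_cudnn=lambda: created.append("cudnn") or "cudnn-handle",
)
print(created)   # only the cuBLAS-like handle has been created so far
handles.cudnn()
print(created)   # the cuDNN-like handle is created on first request
```

Callers that only touch `handles.cublas` never run the cuDNN factory, which is the property the commit relies on for cuBLAS-only paths.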