Handle CUDA 13 CI compatibility in gpu-app-collection by SamMausberg · Pull Request #85 · accel-sim/gpu-app-collection

SamMausberg · 2026-03-17T17:11:21Z

No description provided.

tgrogers · 2026-03-17T17:48:44Z

-        config.MEM_CLK_FREQUENCY = deviceProp.memoryClockRate * 1e-3f;
-        config.MEM_BITWIDTH = deviceProp.memoryBusWidth;
-        config.CLK_FREQUENCY = clockRateKHz * 1e-3f;
+        if (memoryClockRateKHz > 0)


What is with all the zeros?

Just to clarify my intent: the > 0 means if CUDA returns positive, we overwrite the default. If it returns 0, it's because the query is unsupported; it leaves it alone. I made this choice because before we used deviceProp.memoryClockRate and deviceProp.memoryBusWidth, this needed to be switched to cudaDeviceGetAttribute()

I don't think its a correctness issue but yea its not the most readable I can change it if you'd like.

@JRPan - Should we just be asserting if we don't get good values here?

@SamMausberg - the issue is that some of the uBenchmarks need good values here to work properly. If we give them back something conforrect, the uBench will not test the thing we think it is supposed to test.

tgrogers · 2026-03-17T17:50:19Z


-ARCH?=sm_80 sm_90a sm_100a sm_101 sm_120 
+ifneq ($(filter 13 14 15 16,$(CUDA_VERSION_MAJOR)),)
+ARCH?=sm_80 sm_90a sm_100a sm_110 sm_120


both the if and the else set the same thing.

good point whoops 🤣

tgrogers · 2026-03-17T17:51:45Z

+ifneq ($(filter 13 14 15 16,$(CUDA_VERSION_MAJOR)),)
+ci:     rodinia_2.0-ft rodinia-3.1 GPU_Microbenchmark cutlass_mini
+else
 ci:     rodinia_2.0-ft rodinia-3.1 GPU_Microbenchmark cutlass_mini cuda_samples


Samples should build it; it is an NVIDIA collection and should work at different CUDA versions.

It fails with nvcc fatal: Unsupported gpu architecture 'compute_50' when in CUDA 13.1.1 environment.

How would you like me to handle this?

hmm ok - they probably dropped support for compute_50 in 13 - I am a little shocked they didn't update the samples to build properly in 13, unless they are just lagging on updating samples.
Are you 100% sure that the latest cuda_samples fails to build with cuda13? Maybe we are syncing to an older version of samples and we need to update it?

tgrogers · 2026-03-17T17:52:43Z

 cuda_samples:
 	mkdir -p $(BINDIR)/$(BINSUBDIR)/
-	mkdir -p ./cuda/cuda-samples/build && cd ./cuda/cuda-samples/build && cmake .. && $(MAKE)
+	if [ ${CUDA_VERSION_MAJOR} -ge 13 ]; then \


If we have the stuff in setup_environment, do we still need this?

I think that setup_environment is ok for normal Makefile builds.

cuda_samples is a separate case, I believe cause it hardcodes 'CMAKE_CUDA_ARCHITECTURES' not sure.

^ could be completely wrong about this.

ok - this is a lot of code which I think is mostly setting up enviroment stuff for samples.
My hesitation here is that the samples used to build really simply, and I am not sure if we really need all this code or the AI was just working around some problem in a weird way.

If it works, we can keep it - @JRPan - who originally integrated samples? Maybe we can check with them?

JRPan · 2026-03-17T17:48:47Z

    unsigned L2_BANKS = 0;            // L2 Cache Banks (LTCs)
 };
 inline GpuConfig config;
+


If it's 0, the uBench will be wrong. Why is it failing?

If the syntax has changed, just update it to use the new syntax. There must still be a way to query device attributes.

Wait, I don't think I changed the FBP_COUNT = 0 and L2_BANKS = 0

I think in the simulator path, they are overwritten from gpgpusim.config. In the hardware path filled with queryGrInfo().

These apps also run on real GPUs, not just in gpgpu-sim. The ubench uses device query to determine kernel parameters. So these value must be correct (matching the HW) for these kernels to perform as expected. This is not the config in the simulation.

Yes, it seems like a pre-existing issue in commit 4becbe3.

The only changes I made to stuff like this in the file were to the clock rate, memory clock rate, and memory bus width.

Oh, I think I misunderstood your point, yes, that's true. When I go over this, I can make that fix even though it was pre-existing.

So @JRPan - are you saying we don't need this function?
It is weird to do this 0 think for only these functions.

Isn't the real issue that we are returning 0s for values that should have an attribute?
Generally, I would prefer not to add these extra functions to patch a deeper issue.

Said another way, why aren't we seeing success with these values? We can just fill them in from the gpgpusim config like the other values.

But maybe I am misunderstanding this.

JRPan · 2026-03-17T17:52:21Z

+ci:     rodinia_2.0-ft rodinia-3.1 GPU_Microbenchmark cutlass_mini
+else
 ci:     rodinia_2.0-ft rodinia-3.1 GPU_Microbenchmark cutlass_mini cuda_samples
+endif


we need cuda samples. If you need another commit, checkout in the makefile, or just update the submodule.

Updating the submodule might not be backward compatible. So checking out a newer commit might be better.

And this is probably the reason device query is failing?

There are two deviceQuery binaries in the repo. One in cuda-samples and GPU_Microbenchmark.

GPU_Microbenchmark builds under CUDA 13.1.1. The CUDA-samples one is failing under CUDA 13.1.1. For the reason I replied above, because the CMake files are hardcoding compute_50.

I asked Tim, but I would also like to hear your thoughts on how you guys want me to handle this issue.

*** Never mind, I didn't notice, you answered in your original comment yes, I'll update the submodule to avoid this. Wasn't sure if you guys wanted me to touch this.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

JRPan · 2026-04-17T17:18:28Z

@SamMausberg Following up on unresolved items from the earlier review:

Not yet addressed:

gpuConfig.h:49 — FBP_COUNT = 0 and L2_BANKS = 0 defaults. These apps also run on real GPUs, and the ubench uses device query to set kernel parameters, so these values must match the HW. You mentioned fixing this even though it's pre-existing — can you include it in this PR?
gpuConfig.h:326 (@tgrogers) — "What is with all the zeros?" The if (x > 0) guard pattern was explained but not changed. Tim, are you ok with the explanation, or should it be cleaned up?
src/Makefile:540 (@tgrogers) — whether this block is still needed given setup_environment. Answered but not otherwise resolved.

Once (1) is addressed I'll re-review.

tgrogers

Thanks, Sam, for doing this.

A few minor things and some questions.

@JRPan - are the app collections regressions being triggered here?

tgrogers · 2026-04-21T12:19:34Z

    unsigned L2_BANKS = 0;            // L2 Cache Banks (LTCs)
 };
 inline GpuConfig config;
+


So @JRPan - are you saying we don't need this function?
It is weird to do this 0 think for only these functions.

Isn't the real issue that we are returning 0s for values that should have an attribute?
Generally, I would prefer not to add these extra functions to patch a deeper issue.

Said another way, why aren't we seeing success with these values? We can just fill them in from the gpgpusim config like the other values.

But maybe I am misunderstanding this.

tgrogers · 2026-04-21T12:21:41Z

-        config.MEM_CLK_FREQUENCY = deviceProp.memoryClockRate * 1e-3f;
-        config.MEM_BITWIDTH = deviceProp.memoryBusWidth;
-        config.CLK_FREQUENCY = clockRateKHz * 1e-3f;
+        if (memoryClockRateKHz > 0)


@JRPan - Should we just be asserting if we don't get good values here?

@SamMausberg - the issue is that some of the uBenchmarks need good values here to work properly. If we give them back something conforrect, the uBench will not test the thing we think it is supposed to test.

tgrogers · 2026-04-21T12:23:39Z

 EXE = mbarrier

-ARCH?=sm_80 sm_90a sm_100a sm_101 sm_120 
+ifneq ($(filter 13 14 15 16,$(CUDA_VERSION_MAJOR)),)


I don't know why all this code got added - @SamMausberg, doesn't this just do the exact same thing as the single-line baseline? A bunch of the makefiles have this - please delete unless there is a good reason to have it.

tgrogers · 2026-04-21T12:27:01Z

+ifneq ($(filter 13 14 15 16,$(CUDA_VERSION_MAJOR)),)
+ci:     rodinia_2.0-ft rodinia-3.1 GPU_Microbenchmark cutlass_mini
+else
 ci:     rodinia_2.0-ft rodinia-3.1 GPU_Microbenchmark cutlass_mini cuda_samples


hmm ok - they probably dropped support for compute_50 in 13 - I am a little shocked they didn't update the samples to build properly in 13, unless they are just lagging on updating samples.
Are you 100% sure that the latest cuda_samples fails to build with cuda13? Maybe we are syncing to an older version of samples and we need to update it?

tgrogers · 2026-04-21T12:31:42Z

 cuda_samples:
 	mkdir -p $(BINDIR)/$(BINSUBDIR)/
-	mkdir -p ./cuda/cuda-samples/build && cd ./cuda/cuda-samples/build && cmake .. && $(MAKE)
+	if [ ${CUDA_VERSION_MAJOR} -ge 13 ]; then \


ok - this is a lot of code which I think is mostly setting up enviroment stuff for samples.
My hesitation here is that the samples used to build really simply, and I am not sure if we really need all this code or the AI was just working around some problem in a weird way.

tgrogers · 2026-04-21T12:32:25Z

 cuda_samples:
 	mkdir -p $(BINDIR)/$(BINSUBDIR)/
-	mkdir -p ./cuda/cuda-samples/build && cd ./cuda/cuda-samples/build && cmake .. && $(MAKE)
+	if [ ${CUDA_VERSION_MAJOR} -ge 13 ]; then \


If it works, we can keep it - @JRPan - who originally integrated samples? Maybe we can check with them?

Handle CUDA 13 CI compatibility in gpu-app-collection

8e3da12

tgrogers reviewed Mar 17, 2026

View reviewed changes

Comment thread src/cuda/rodinia/2.0-ft/backprop/backprop_cuda.cu

tgrogers reviewed Mar 17, 2026

View reviewed changes

JRPan requested changes Mar 17, 2026

View reviewed changes

Sam added 3 commits March 18, 2026 13:35

Update CUDA 13.1.1 for samples

c94990a

fix redundant

31c7370

Redundant properly

e94df09

JRPan force-pushed the cuda13_1_ci_backcompat branch 8 times, most recently from 5a83f86 to 0606dae Compare April 14, 2026 01:40

Change CI docker image to nvidia/cuda:13.1.1-cudnn-devel-ubuntu24.04

ca5e456

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

JRPan force-pushed the cuda13_1_ci_backcompat branch from 0606dae to ca5e456 Compare April 14, 2026 01:45

JRPan requested a review from tgrogers April 17, 2026 17:18

tgrogers requested changes Apr 21, 2026

View reviewed changes

Conversation

SamMausberg commented Mar 17, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SamMausberg Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SamMausberg Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JRPan commented Apr 17, 2026

Uh oh!

tgrogers left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

SamMausberg Mar 17, 2026 •

edited

Loading

SamMausberg Mar 17, 2026 •

edited

Loading