Description
Hi,
I am experiencing extremely low GPU utilization when running cellvit-inference on whole slide images. Even though GPU memory is fully allocated, the GPU utilization stays around ~10%, and CPU usage is also low.
As a result, inference is very slow.
Observed performance
~1 hour per WSI
GPU utilization: ~10%
CPU utilization: low
GPU memory: mostly allocated
This occurs consistently across multiple GPUs:
RTX 3090
NVIDIA A40
NVIDIA A100
Since the behavior is the same on all three, the bottleneck does not appear to be raw GPU compute performance.
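For anyone who wants to reproduce the utilization readings, here is a minimal sketch of how they could be sampled during a run. It assumes `nvidia-smi` is on the PATH; the helper names `parse_gpu_stats` and `sample_gpu_stats` are made up for illustration and are not part of CellViT.

```python
import subprocess

# Standard nvidia-smi query flags; output is one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Parse nvidia-smi CSV output into a list of per-GPU dicts.

    Assumes every field is numeric; fields reported as "[N/A]" on some
    devices would need extra handling.
    """
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, mem_used, mem_total = [f.strip() for f in line.split(",")]
        stats.append(
            {
                "gpu": int(index),
                "util_pct": int(util),
                "mem_used_mib": int(mem_used),
                "mem_total_mib": int(mem_total),
            }
        )
    return stats


def sample_gpu_stats():
    """Run nvidia-smi once and return the parsed per-GPU statistics."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)
```

Calling `sample_gpu_stats()` in a loop with a short sleep while inference runs would show whether the ~10% figure is steady or whether utilization comes in short bursts between long idle gaps.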
Dataset
H&E whole slide images
format: NDPI
typical file size: ~0.5 GB
Troubleshooting attempted
I tried different combinations of:
cpu_count
ray_worker
ray_remote_cpus
but none of these significantly changed GPU utilization.
Questions
Is ~1 hour per WSI expected for CellViT inference?
Could NDPI format or OpenSlide I/O be a bottleneck here?
Is there a recommended configuration for maximizing GPU utilization during WSI inference?
Are there preprocessing steps that should be performed before inference to avoid slow tile loading?
At the moment, inference on a cohort takes multiple weeks, and the low GPU utilization also causes issues on shared GPU clusters where jobs are terminated due to inefficient hardware usage.
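To help narrow down the OpenSlide I/O question, below is a rough sketch of a tile-read benchmark I could run on one of the NDPI files. The tile size, tile count, and function names are my own assumptions and do not reflect how CellViT actually loads patches; the only OpenSlide calls used are `OpenSlide(path)`, `dimensions`, and `read_region(location, level, size)`.

```python
import random
import time


def benchmark_tile_reads(read_tile, width, height, tile_size=1024, n_tiles=50, seed=0):
    """Time n_tiles random tile reads and return the rate in tiles per second.

    read_tile is any callable taking (x, y, size), so the same benchmark
    can be run against OpenSlide or any other reader.
    """
    rng = random.Random(seed)
    start = time.perf_counter()
    for _ in range(n_tiles):
        # Pick a random top-left corner that keeps the tile inside the slide.
        x = rng.randrange(0, max(1, width - tile_size))
        y = rng.randrange(0, max(1, height - tile_size))
        read_tile(x, y, tile_size)
    elapsed = time.perf_counter() - start
    return n_tiles / elapsed if elapsed > 0 else float("inf")


def open_slide_reader(path):
    """Return (read_tile, width, height) backed by OpenSlide at level 0."""
    import openslide  # requires the openslide-python package

    slide = openslide.OpenSlide(path)
    width, height = slide.dimensions

    def read_tile(x, y, size):
        # read_region returns an RGBA PIL image; convert to RGB like most pipelines do.
        return slide.read_region((x, y), 0, (size, size)).convert("RGB")

    return read_tile, width, height
```

Usage would look like `read_tile, w, h = open_slide_reader("slide.ndpi")` followed by `benchmark_tile_reads(read_tile, w, h)`, where `slide.ndpi` is a placeholder path. If the measured rate is far below what the GPU can consume, that would point to tile loading rather than the model as the bottleneck.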
CUDA 12.1
PyTorch 2.1.2
Config:
# ==========================
# CellViT Inference Config
# ==========================

# Model selection (REQUIRED)
model: "SAM"

# Nuclei classification taxonomy (OPTIONAL)
# If you want just nuclei segmentation without types, use "binary".
# Otherwise keep default-like behavior with "pannuke".
nuclei_taxonomy: "pannuke"

# ==========================
# Inference Settings (OPTIONAL)
# ==========================
inference:
  gpu: 0
  #enforce_amp: true
  #batch_size: 48

# ==========================
# Output Settings (REQUIRED outdir)
# ==========================
output_format:
  outdir: "./TLS_BOMI1_wholeslides_cellvitoutput"
  geojson: true
  graph: false
  compression: false

# ==========================
# Processing Mode (Choose One)
# ==========================
process_dataset:
  wsi_folder: "./BOMI1_wsi/"
  wsi_extension: "ndpi"
  # Optional overrides (normally auto-read from slide metadata via OpenSlide):
  wsi_mpp: 0.441
  wsi_magnification: 20

# ==========================
# System Settings (OPTIONAL)
# ==========================
system:
  #cpu_count: 16
  #ray_worker: 8
  #ray_remote_cpus: 1
  memory: 64000
  debug: false