Low GPU utilization and very long inference times #5

@LoveNordling

Hi,

I am seeing extremely low GPU utilization when running cellvit-inference on whole slide images (WSIs). GPU memory is almost fully allocated, but GPU utilization stays at about 10%, and CPU usage is also low.

As a result, inference is very slow.

Observed performance

~1 hour per WSI

GPU utilization: ~10%

CPU utilization: low

GPU memory: mostly allocated

This occurs consistently across multiple GPUs:

RTX 3090

NVIDIA A40

NVIDIA A100

Since the slowdown is the same on all three, raw GPU compute performance does not appear to be the limiting factor.
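The utilization and memory figures above can be logged over time by polling nvidia-smi while inference runs. The following is a minimal sketch, not part of CellViT: the helper names (parse_sample, summarize, sample_gpu) are hypothetical, and it assumes nvidia-smi is available on the PATH.

```python
import subprocess
from statistics import mean

# Standard nvidia-smi query flags; output is one CSV line per GPU,
# e.g. "10, 23500, 24576" (utilization %, memory used MiB, memory total MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_sample(line):
    """Parse one CSV line from nvidia-smi into utilization and memory percentages."""
    util, used, total = (float(field.strip()) for field in line.split(","))
    return {"util_pct": util, "mem_pct": 100.0 * used / total}


def summarize(samples):
    """Average a list of parsed samples."""
    return {
        "mean_util_pct": mean(s["util_pct"] for s in samples),
        "mean_mem_pct": mean(s["mem_pct"] for s in samples),
    }


def sample_gpu(gpu_index=0):
    """Take one reading from the given GPU (requires an NVIDIA driver)."""
    result = subprocess.run(
        QUERY + [f"--id={gpu_index}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sample(result.stdout.strip().splitlines()[0])
```

Calling sample_gpu(0) once per second during a run and passing the collected readings to summarize gives a number that can be compared across GPUs and configurations.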

Dataset

H&E whole slide images

format: NDPI

typical file size: ~0.5 GB

Troubleshooting attempted

I tried different combinations of:

cpu_count

ray_worker

ray_remote_cpus

but none of these significantly changed GPU utilization.

Questions

Is ~1 hour per WSI expected for CellViT inference?

Could NDPI format or OpenSlide I/O be a bottleneck here?

Is there a recommended configuration for maximizing GPU utilization during WSI inference?

Are there preprocessing steps that should be performed before inference to avoid slow tile loading?
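To check whether tile loading is the bottleneck independently of the model, raw tile read throughput can be timed on one slide. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the 1024-pixel tile size and the 200-tile sample are illustrative values rather than CellViT's actual settings, and benchmark_slide assumes the openslide-python package is installed.

```python
import time


def tile_coords(width, height, tile_size):
    """Top-left coordinates of non-overlapping tiles that fit fully inside the slide."""
    return [
        (x, y)
        for y in range(0, height - tile_size + 1, tile_size)
        for x in range(0, width - tile_size + 1, tile_size)
    ]


def tiles_per_second(read_tile, coords, max_tiles=200):
    """Time read_tile(xy) over the first max_tiles coordinates and return tiles/s."""
    coords = coords[:max_tiles]
    start = time.perf_counter()
    for xy in coords:
        read_tile(xy)
    elapsed = time.perf_counter() - start
    return len(coords) / elapsed if elapsed > 0 else float("inf")


def benchmark_slide(path, tile_size=1024, max_tiles=200):
    """Measure level-0 read throughput of one slide through OpenSlide."""
    import openslide  # imported here so the helpers above work without it

    slide = openslide.OpenSlide(path)
    try:
        width, height = slide.dimensions
        coords = tile_coords(width, height, tile_size)
        # read_region takes (location, level, size) and returns an RGBA PIL image.
        return tiles_per_second(
            lambda xy: slide.read_region(xy, 0, (tile_size, tile_size)),
            coords,
            max_tiles,
        )
    finally:
        slide.close()
```

If the rate from benchmark_slide on an NDPI file is far below the rate at which the GPU can consume tiles, the slide reader or the storage backend is the more likely bottleneck than the model.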

At the moment, inference on a cohort takes multiple weeks, and the low GPU utilization also causes issues on shared GPU clusters where jobs are terminated due to inefficient hardware usage.

CUDA 12.1
PyTorch 2.1.2

Config:

# ==========================
# CellViT Inference Config
# ==========================

# Model selection (REQUIRED)
model: "SAM"

# Nuclei classification taxonomy (OPTIONAL)
# If you want just nuclei segmentation without types, use "binary".
# Otherwise keep default-like behavior with "pannuke".
nuclei_taxonomy: "pannuke"

# ==========================
# Inference Settings (OPTIONAL)
# ==========================
inference:
  gpu: 0
  #enforce_amp: true
  #batch_size: 48

# ==========================
# Output Settings (REQUIRED outdir)
# ==========================
output_format:
  outdir: "./TLS_BOMI1_wholeslides_cellvitoutput"
  geojson: true
  graph: false
  compression: false

# ==========================
# Processing Mode (Choose One)
# ==========================
process_dataset:
  wsi_folder: "./BOMI1_wsi/"
  wsi_extension: "ndpi"

  # Optional overrides (normally auto-read from slide metadata via OpenSlide):
  wsi_mpp: 0.441
  wsi_magnification: 20

# ==========================
# System Settings (OPTIONAL)
# ==========================
system:
  #cpu_count: 16
  #ray_worker: 8
  #ray_remote_cpus: 1

  memory: 64000

debug: false
