
Fixes #8697 GPU memory leak by checking both image and label tensors for CUDA device #8708

Open

benediktjohannes wants to merge 9 commits into Project-MONAI:dev from benediktjohannes:patch-1

Conversation

@benediktjohannes
Copy link
Contributor

  • Modified device detection to check BOTH image and label tensors
  • torch.cuda.empty_cache() now called if EITHER tensor is on GPU
  • Prevents GPU memory leaks in mixed device scenarios

Signed-off-by: benediktjohannes <benedikt.johannes.hofer@gmail.com>
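As a hedged illustration of the either-tensor check described above (not MONAI's actual code), the core logic can be sketched as follows; `fake_tensor` and `should_empty_cache` are hypothetical stand-ins, since the real implementation operates on torch.Tensor/MetaTensor objects pulled from the data dict:

```python
from types import SimpleNamespace

def fake_tensor(device_type):
    # Stand-in for a torch.Tensor / MetaTensor: only the `.device.type`
    # attribute ("cuda" or "cpu") matters for this check.
    return SimpleNamespace(device=SimpleNamespace(type=device_type))

def should_empty_cache(image_tensor, label_tensor):
    # Mirrors the PR's logic: torch.cuda.empty_cache() should run if
    # EITHER the image or the label tensor lives on a CUDA device.
    return any(t.device.type == "cuda" for t in (image_tensor, label_tensor))
```

With both tensors on CPU this returns False; moving either one to CUDA flips it to True, which is exactly the mixed-device scenario the PR targets.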
@coderabbitai (Contributor)

coderabbitai bot commented Jan 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Per-sample data access in monai/auto3dseg/analyzer.py was refactored to extract image_tensor and label_tensor and index them per sample. CUDA detection now checks both tensors using any(...). Per-sample image/label construction uses image_tensor[i] and label_tensor (with torch.int16 cast preserved). Empty-foreground placeholders changed from torch.Tensor([0]) to MetaTensor([0.0]). Parallel updates applied to FgImageStats and LabelStats. Shape checks, statistics computation, and public APIs/signatures remain unchanged.
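The empty-foreground placeholder change mentioned in the walkthrough (torch.Tensor([0]) → MetaTensor([0.0])) follows a simple substitution pattern. A minimal sketch, using a hypothetical `FakeMetaTensor` list subclass as a stand-in for MONAI's MetaTensor (torch not assumed available here):

```python
class FakeMetaTensor(list):
    """Hypothetical stand-in for MONAI's MetaTensor (a torch.Tensor subclass)."""

def fill_empty_foregrounds(nda_foregrounds):
    # Mirrors `nda if nda.numel() > 0 else MetaTensor([0.0])`: any empty
    # per-class foreground is replaced by a one-element placeholder so
    # downstream statistics never see a zero-size array.
    return [nda if len(nda) > 0 else FakeMetaTensor([0.0]) for nda in nda_foregrounds]
```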

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Description check (⚠️ Warning)
Explanation: Description lacks required structure from template. Missing 'Fixes #' issue reference line, 'Description' section header, and 'Types of changes' checklist.
Resolution: Reformat description to follow template: add 'Fixes #8697' line, wrap changes in 'Description' section, and include 'Types of changes' checklist.

✅ Passed checks (2 passed)

Title check (✅ Passed)
Explanation: Title accurately describes the main change: fixing a GPU memory leak by improving CUDA device detection for both image and label tensors.

Docstring Coverage (✅ Passed)
Explanation: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment (Contributor)


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@monai/auto3dseg/analyzer.py`:
- Around line 471-476: The code uses undefined image_tensor and label_tensor
before they are assigned; extract them from the incoming data dict like
ImageStats and FgImageStats do (use the same keys those classes expect, e.g.,
"image" for the input image and "label" for the segmentation label), then
proceed with the existing logic (compute using_cuda, save restore_grad_state,
set_grad_enabled(False), build ndas from image_tensor and cast label_tensor to
int16 for ndas_label). Ensure the variables are set to the tensor objects (with
correct shape [B,...]) before the lines referencing using_cuda, ndas, and
ndas_label.

Signed-off-by: benediktjohannes <benedikt.johannes.hofer@gmail.com>
@ericspod ericspod left a comment (Member)


Hi @benediktjohannes I think this is fine as it is now so we'll run tests again though we'll have to wait for us to resolve a CI issue. @mingxin-zheng was involved in this code so if he can double check for me that would be great.

@benediktjohannes (Contributor Author)

Hi @ericspod, thanks for the message! To be honest, I am also not quite sure about this PR, so please don't merge it as long as nobody else is sure that this is fine. Thanks!

    using_cuda = False
    image_tensor = d[self.image_key]
    label_tensor = d[self.label_key]
    using_cuda = any(isinstance(t, torch.Tensor) and t.device.type == "cuda" for t in (image_tensor, label_tensor))
Contributor

Thanks for the PR. MetaTensor could also use cuda device, so I believe we will need to check that as well

Contributor Author

Thanks for the answer! 👍

@benediktjohannes (Contributor Author)

By the way: I'm not quite sure about the other edits in this PR, as they were just recommended by CodeRabbit (and, another important thing, I may have made a mistake when pasting them). I'm also not sure whether they should be merged; I just relied on the CodeRabbit review. So please correct me if there is a mistake, and please also review this before merging, thanks!

Signed-off-by: benediktjohannes <benedikt.johannes.hofer@gmail.com>
@coderabbitai coderabbitai bot left a comment (Contributor)


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@monai/auto3dseg/analyzer.py`:
- Around line 471-478: Add parameterized unit tests that cover mixed-device
scenarios for LabelStats by creating cases where image_tensor is on CPU and
label_tensor is on CUDA and vice versa; instantiate batches similar to how ndas
and ndas_label are derived (use image_tensor, label_tensor and
MetaTensor/torch.Tensor on specific torch.device settings), call the LabelStats
code paths that consume ndas/ndas_label, and assert expected statistics and that
no device-related errors occur. Ensure tests toggle CUDA availability with
torch.cuda.is_available() guards, use explicit .to(device) on tensors, and
verify behavior matches the existing CPU-only and GPU-only assertions so
mixed-device handling is validated.

Comment on lines 471 to 478

    image_tensor = d[self.image_key]
    label_tensor = d[self.label_key]
    using_cuda = any(isinstance(t, (torch.Tensor, MetaTensor)) and t.device.type == "cuda" for t in (image_tensor, label_tensor))
    restore_grad_state = torch.is_grad_enabled()
    torch.set_grad_enabled(False)

-   ndas: list[MetaTensor] = [d[self.image_key][i] for i in range(d[self.image_key].shape[0])]  # type: ignore
-   ndas_label: MetaTensor = d[self.label_key].astype(torch.int16)  # (H,W,D)
+   ndas: list[MetaTensor] = [image_tensor[i] for i in range(image_tensor.shape[0])]  # type: ignore
+   ndas_label: MetaTensor = label_tensor.astype(torch.int16)  # (H,W,D)
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "LabelStats|FgImageStats|using_cuda|empty_cache|get_label_ccp" -g "*test*.py" -i

Repository: Project-MONAI/MONAI

Length of output: 1005


🏁 Script executed:

# Also check the actual code at the specified lines
head -500 monai/auto3dseg/analyzer.py | tail -50

Repository: Project-MONAI/MONAI

Length of output: 2358


🏁 Script executed:

# Find the LabelStats class and its __call__ method
rg -n "class LabelStats|def __call__" monai/auto3dseg/analyzer.py

Repository: Project-MONAI/MONAI

Length of output: 662


🏁 Script executed:

sed -n '319,360p' tests/apps/test_auto3dseg.py

Repository: Project-MONAI/MONAI

Length of output: 2553


🏁 Script executed:

sed -n '341,370p' tests/apps/test_auto3dseg.py

Repository: Project-MONAI/MONAI

Length of output: 1890


🏁 Script executed:

sed -n '420,500p' tests/apps/test_auto3dseg.py

Repository: Project-MONAI/MONAI

Length of output: 3971


🏁 Script executed:

head -50 tests/apps/test_auto3dseg.py | grep -E "device|import"

Repository: Project-MONAI/MONAI

Length of output: 606


🏁 Script executed:

grep -n "^device\s*=" tests/apps/test_auto3dseg.py

Repository: Project-MONAI/MONAI

Length of output: 80


🏁 Script executed:

# Check the beginning of the test file for device setup
head -100 tests/apps/test_auto3dseg.py

Repository: Project-MONAI/MONAI

Length of output: 3483


🏁 Script executed:

grep -n "SIM_GPU_TEST_CASES\|@parameterized\|skip_if_no_cuda" tests/apps/test_auto3dseg.py

Repository: Project-MONAI/MONAI

Length of output: 359


🏁 Script executed:

# Check if there are GPU parametrized test methods
grep -B 5 -A 10 "test_label_stats.*gpu\|test_label_stats.*cuda" tests/apps/test_auto3dseg.py

Repository: Project-MONAI/MONAI

Length of output: 45


🏁 Script executed:

sed -n '214,280p' tests/apps/test_auto3dseg.py

Repository: Project-MONAI/MONAI

Length of output: 3116


🏁 Script executed:

# Check what the GPU test actually tests
grep -A 40 "@skip_if_no_cuda" tests/apps/test_auto3dseg.py | head -60

Repository: Project-MONAI/MONAI

Length of output: 1974


Add tests for mixed-device LabelStats handling.

Tests currently cover CPU-only and GPU-only scenarios but not mixed (image on CPU + label on CUDA, or vice versa). Add parameterized test cases to verify LabelStats handles these mixed-device scenarios correctly per the coding guidelines.

🤖 Prompt for AI Agents
In `@monai/auto3dseg/analyzer.py` around lines 471 - 478, Add parameterized unit
tests that cover mixed-device scenarios for LabelStats by creating cases where
image_tensor is on CPU and label_tensor is on CUDA and vice versa; instantiate
batches similar to how ndas and ndas_label are derived (use image_tensor,
label_tensor and MetaTensor/torch.Tensor on specific torch.device settings),
call the LabelStats code paths that consume ndas/ndas_label, and assert expected
statistics and that no device-related errors occur. Ensure tests toggle CUDA
availability with torch.cuda.is_available() guards, use explicit .to(device) on
tensors, and verify behavior matches the existing CPU-only and GPU-only
assertions so mixed-device handling is validated.

@ericspod (Member)

Hi @benediktjohannes we have one isort issue to fix, I don't know why this came up. You can see what edits Coderabbit applied just through the usual "Files changed" section so there's no worries of strange changes.

@ericspod ericspod enabled auto-merge (squash) February 21, 2026 00:26
@benediktjohannes (Contributor Author)

Hi, thank you very much for your answer @ericspod! Unfortunately, I don't really know what you meant by the "isort issue to fix". Is something broken by my change? Because @CodeRabbit AI only suggests some tests.

And I'm not quite sure about two things, I'd like to ask you, please:

What do you think of the (kind of unrelated) change

    nda_foregrounds = [nda if nda.numel() > 0 else torch.Tensor([0]) for nda in nda_foregrounds]

to

    nda_foregrounds = [nda if nda.numel() > 0 else MetaTensor([0.0]) for nda in nda_foregrounds]

because I guess this was suggested by the AI and I'm not deep in this code, so please let me know what your opinion on it is, thanks!

And one other question:

Tests seem to fail because of some code style issue, but unfortunately I don't really know where, so could you please assist me with this check? It only says:

    black, 25.11.0 (compiled: yes)
    Python (CPython) 3.9.25
    would reformat /home/runner/work/MONAI/MONAI/monai/auto3dseg/analyzer.py

    Oh no! 💥 💔 💥
    1 file would be reformatted, 1284 files would be left unchanged.
    Check failed!
    Please run auto style fixes: ./runtests.sh --autofix
    Error: Process completed with exit code 1.

without any information on which line and which code style thing is wrong. Thank you very much!

@coderabbitai (Contributor)

coderabbitai bot commented Feb 22, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the current state of the analyzer.py file to understand the changes
cat monai/auto3dseg/analyzer.py | head -100

Length of output: 3484


🏁 Script executed:

#!/bin/bash
# Find the specific lines with the MetaTensor and torch.Tensor changes
rg -n "MetaTensor\(\[0" monai/auto3dseg/analyzer.py -A2 -B2

Length of output: 504


🏁 Script executed:

#!/bin/bash
# Run black in check mode to see what it would change
black --check --diff monai/auto3dseg/analyzer.py 2>&1 || true

Length of output: 43


🏁 Script executed:

#!/bin/bash
# Check for any isort issues
isort --check-only --diff monai/auto3dseg/analyzer.py 2>&1 || true

Length of output: 43


🏁 Script executed:

#!/bin/bash
# Look at the specific sections where changes were made (around lines 330-360)
cat monai/auto3dseg/analyzer.py | sed -n '330,370p'

Length of output: 1679


🏁 Script executed:

#!/bin/bash
# Look at the second section (around lines 470-490)
cat monai/auto3dseg/analyzer.py | sed -n '470,500p'

Length of output: 1458


🏁 Script executed:

#!/bin/bash
# Check line lengths that might be too long for black (exceeds 120 chars)
cat monai/auto3dseg/analyzer.py | awk 'length > 120 {print NR": "length" chars: "$0}'

Length of output: 149


🏁 Script executed:

#!/bin/bash
# Look at the specific lines around the using_cuda detection
rg -n "using_cuda.*any.*isinstance" monai/auto3dseg/analyzer.py -A1 -B1

Length of output: 239


🏁 Script executed:

#!/bin/bash
# Look at the ImageStats class to see if it has similar CUDA detection logic
rg -n "class ImageStats" monai/auto3dseg/analyzer.py -A50 | head -80

Length of output: 3266


🏁 Script executed:

#!/bin/bash
# Check what the original code was for FgImageStats to understand the pattern
rg -n "class FgImageStats" monai/auto3dseg/analyzer.py -A50 | head -80

Length of output: 3218


🏁 Script executed:

#!/bin/bash
# Look for the ImageStats __call__ method to see if using_cuda was added there too
cat monai/auto3dseg/analyzer.py | sed -n '220,280p'

Length of output: 2653


@benediktjohannes, here are answers to your questions:

1. Black formatting issue

Line 473 is 133 characters long, exceeding Black's 120-character limit. This line needs to be broken up:

# Current (too long):
using_cuda = any(isinstance(t, (torch.Tensor, MetaTensor)) and t.device.type == "cuda" for t in (image_tensor, label_tensor))

# Fix - break into multiple lines:
using_cuda = any(
    isinstance(t, (torch.Tensor, MetaTensor)) and t.device.type == "cuda"
    for t in (image_tensor, label_tensor)
)

2. MetaTensor([0.0]) vs torch.Tensor([0])

This change improves consistency since the code already uses MetaTensor throughout (lines 477, 479, etc.). However, since you're uncertain and this is unrelated to the memory leak fix, I'd recommend asking @ericspod or @Nic-Ma whether they prefer this change or if you should keep the original torch.Tensor([0]).

3. Fixing the issues

You can either:

  • Manually apply the line-break fix shown above for line 473
  • Run ./runtests.sh --autofix as the CI suggests (this will auto-format with black/isort)
  • Or use: black monai/auto3dseg/analyzer.py directly if you have black installed

The isort issue @ericspod mentioned may have been already addressed or is separate from the black issue.

@benediktjohannes (Contributor Author)

Thanks coderabbit!

Signed-off-by: Benedikt Johannes <benedikt.johannes.hofer@gmail.com>
auto-merge was automatically disabled February 22, 2026 14:40

Head branch was pushed to by a user without write access

@coderabbitai coderabbitai bot left a comment (Contributor)


Actionable comments posted: 1

🧹 Nitpick comments (1)
monai/auto3dseg/analyzer.py (1)

473-475: MetaTensor is redundant in the isinstance tuple.

MetaTensor inherits from both MetaObj and torch.Tensor, so isinstance(t, torch.Tensor) already matches any MetaTensor instance. The explicit MetaTensor in the tuple is dead weight.

♻️ Proposed simplification
-            isinstance(t, (torch.Tensor, MetaTensor)) and t.device.type == "cuda"
+            isinstance(t, torch.Tensor) and t.device.type == "cuda"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/auto3dseg/analyzer.py` around lines 473 - 475, The isinstance check
that builds using_cuda redundantly lists MetaTensor even though MetaTensor
already subclasses torch.Tensor; update the generator expression that sets
using_cuda (which iterates over image_tensor and label_tensor) to only check
isinstance(t, torch.Tensor) (or include MetaObj if intentionally needed) and
keep the rest of the condition (t.device.type == "cuda") unchanged so MetaTensor
instances are still detected via torch.Tensor.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@monai/auto3dseg/analyzer.py`:
- Around line 473-476: The any(...) call assigned to using_cuda has its
generator expression aligned with the assignment which breaks Black; inside the
parentheses of the any call (in the expression using_cuda = any(...)) indent the
generator expression elements (the isinstance(...) check and the for t in
(image_tensor, label_tensor) clause) one additional level (4 spaces) so they sit
inside the parentheses, preserving normal multi-line call formatting for the
any(...) expression referencing torch.Tensor, MetaTensor, image_tensor and
label_tensor.

---

Nitpick comments:
In `@monai/auto3dseg/analyzer.py`:
- Around line 473-475: The isinstance check that builds using_cuda redundantly
lists MetaTensor even though MetaTensor already subclasses torch.Tensor; update
the generator expression that sets using_cuda (which iterates over image_tensor
and label_tensor) to only check isinstance(t, torch.Tensor) (or include MetaObj
if intentionally needed) and keep the rest of the condition (t.device.type ==
"cuda") unchanged so MetaTensor instances are still detected via torch.Tensor.
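Why listing MetaTensor in the tuple is dead weight follows from plain subclass semantics. A minimal sketch with hypothetical stand-in classes (not torch itself):

```python
class BaseTensor:
    # Stand-in for torch.Tensor.
    pass

class FakeMetaTensor(BaseTensor):
    # Stand-in for MONAI's MetaTensor, which subclasses torch.Tensor.
    pass

t = FakeMetaTensor()
# The base-class check alone already matches subclass instances, so
# adding the subclass to the isinstance tuple changes nothing.
assert isinstance(t, BaseTensor)
assert isinstance(t, (BaseTensor, FakeMetaTensor)) == isinstance(t, BaseTensor)
```

Keeping the explicit MetaTensor entry is harmless, just redundant; either form detects a MetaTensor on CUDA.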

@benediktjohannes (Contributor Author)

@ericspod I found out about something else which is quite interesting to me:

We do not only use this "using_cuda" here, but also in data_analyzer.py (something different to this)

        if self.device.type == "cuda":
            # release unreferenced tensors to mitigate OOM
            # limitation: https://github.com/pytorch/pytorch/issues/12873#issuecomment-482916237
            torch.cuda.empty_cache()

but there it isn't called "using_cuda"; it is just a check whether self.device.type == "cuda", with similar consequences (empty_cache()). Does this need to be changed there as well? Thank you very much for your help!

Signed-off-by: Benedikt Johannes <benedikt.johannes.hofer@gmail.com>
@coderabbitai coderabbitai bot left a comment (Contributor)


🧹 Nitpick comments (1)
monai/auto3dseg/analyzer.py (1)

471-476: Simplify isinstance check — MetaTensor inherits from torch.Tensor.

The isinstance(t, (torch.Tensor, MetaTensor)) check is redundant since MetaTensor is a subclass of torch.Tensor. Use isinstance(t, torch.Tensor) instead.

Suggested change
         using_cuda = any(
-            isinstance(t, (torch.Tensor, MetaTensor)) and t.device.type == "cuda"
+            isinstance(t, torch.Tensor) and t.device.type == "cuda"
             for t in (image_tensor, label_tensor)
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/auto3dseg/analyzer.py` around lines 471 - 476, The isinstance check
used when computing using_cuda is redundant: replace the tuple isinstance(t,
(torch.Tensor, MetaTensor)) with isinstance(t, torch.Tensor) in the
comprehension that iterates over image_tensor and label_tensor (the variable
using_cuda) so it correctly detects tensors without referencing MetaTensor
explicitly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@monai/auto3dseg/analyzer.py`:
- Around line 471-481: Add a unit test that exercises LabelStats with mixed
device tensors (image on CPU and label on CUDA, and the reverse) to prevent
regressions: create batched MetaTensor/Tensor pairs where one is moved to "cuda"
and the other remains on "cpu", call the LabelStats-related entry point used in
analyzer.py (e.g., the LabelStats class or the method that consumes image_tensor
and label_tensor) and assert it completes without error and returns the expected
statistics; include both permutations (image CPU/label CUDA and image CUDA/label
CPU) so the code paths that check device types (using image_tensor,
label_tensor, and LabelStats) are covered.

---

Nitpick comments:
In `@monai/auto3dseg/analyzer.py`:
- Around line 471-476: The isinstance check used when computing using_cuda is
redundant: replace the tuple isinstance(t, (torch.Tensor, MetaTensor)) with
isinstance(t, torch.Tensor) in the comprehension that iterates over image_tensor
and label_tensor (the variable using_cuda) so it correctly detects tensors
without referencing MetaTensor explicitly.

@benediktjohannes (Contributor Author)

Hmmm, @coderabbitai suggests (471-476: Simplify isinstance check, since MetaTensor inherits from torch.Tensor) that the isinstance(t, (torch.Tensor, MetaTensor)) check is redundant because MetaTensor is a subclass of torch.Tensor, and that we should use isinstance(t, torch.Tensor) instead:

    using_cuda = any(
-       isinstance(t, (torch.Tensor, MetaTensor)) and t.device.type == "cuda"
+       isinstance(t, torch.Tensor) and t.device.type == "cuda"
        for t in (image_tensor, label_tensor)
    )

We could change this because MetaTensor doesn't need to be checked here, but @mingxin-zheng suggested that we should add MetaTensor, so I'll prefer to keep the MetaTensor check as well, just for safety. 👍

@ericspod (Member)

> Hi, thank you very much for your answer @ericspod! Unforetunately, I don't really now what you meant with the "isort issue to fix"? So is something broken by my change? Because @CodeRabbit AI only suggests some tests?

I think it's actually only the code formatting complaint. You can run ./runtests.sh --autofix to fix that which will run black and update code.

> And I'm not quite sure about two things, I'd like to ask you, please:
>
> What do you think of the (kind of unrelated) change
>
>     nda_foregrounds = [nda if nda.numel() > 0 else torch.Tensor([0]) for nda in nda_foregrounds]
>
> to
>
>     nda_foregrounds = [nda if nda.numel() > 0 else MetaTensor([0.0]) for nda in nda_foregrounds]
>
> because I guess that this was suggested by the AI and I'm not deep in this, so please let me know, what your opinion on this is, thanks!

I saw this earlier, it will use MetaTensor instances for later operations rather than just Tensor which may be needed for some reason. It's probably safe to do this though, but I'd think you'd only want to if you're getting Tensor instances somehow when expecting MetaTensor instances.

> And one other question:
>
> Tests seem to fail because of some code style, but unforetunately, I don't really know at which point, so could you please assist me with this check because it only tells:
>
>     black, 25.11.0 (compiled: yes) Python (CPython) 3.9.25 would reformat /home/runner/work/MONAI/MONAI/monai/auto3dseg/analyzer.py
>
>     Oh no! 💥 💔 💥 1 file would be reformatted, 1284 files would be left unchanged. Check failed! Please run auto style fixes: ./runtests.sh --autofix Error: Process completed with exit code 1.
>
> without any information on which line and which code style thing is wrong. Thank you very much!

That's related to what I mentioned above, running the autofixes should resolve it.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 22, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find and examine the data_analyzer.py file for empty_cache usage
fd -t f "data_analyzer.py" --exec cat {}

Length of output: 18627


🏁 Script executed:

#!/bin/bash
# Search for empty_cache usage in data_analyzer.py with context
rg -n "empty_cache" monai/auto3dseg/data_analyzer.py -B5 -A5

Length of output: 72


🏁 Script executed:

#!/bin/bash
# Search for device type checks in data_analyzer.py
rg -n "device.type.*cuda" monai/auto3dseg/data_analyzer.py -B3 -A3

Length of output: 72


@benediktjohannes, regarding your question about data_analyzer.py:

Yes, there's a similar but different issue there. In data_analyzer.py around line 301:

if self.device.type == "cuda":
    torch.cuda.empty_cache()

The problem: the code uses a local device variable (which can fall back to CPU) but checks self.device for the empty_cache call. If self.device was set to "cuda" but the actual device variable was changed to "cpu" due to CUDA unavailability, empty_cache() would still be called.

Fix for data_analyzer.py:

if device.type == "cuda":
    torch.cuda.empty_cache()

However, this is a different scenario than your current PR:

  • Your PR (analyzer.py): Individual tensors may be on different devices → check if any tensor is on CUDA
  • data_analyzer.py: The intended device vs. actual device may differ → check the actual device used

Recommendation: Keep your current PR focused on the analyzer.py fix. The data_analyzer.py issue could be addressed separately. Would you like me to open an issue for tracking the data_analyzer.py fix?
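The pitfall described above, checking the configured device while computation may have fallen back to another, can be sketched with plain strings standing in for torch.device; `pick_device` and `maybe_empty_cache` are hypothetical helper names, not MONAI API:

```python
def pick_device(requested, cuda_available):
    # Fall back to "cpu" when "cuda" was requested but is unavailable,
    # mirroring how a local device variable can diverge from self.device.
    return "cpu" if (requested == "cuda" and not cuda_available) else requested

def maybe_empty_cache(requested, cuda_available, empty_cache):
    device = pick_device(requested, cuda_available)
    # Check the device actually used, not the requested one, before
    # releasing cached GPU memory.
    if device == "cuda":
        empty_cache()
    return device
```

With `cuda_available=False`, the requested "cuda" silently becomes "cpu" and `empty_cache` is never invoked, which is the behavior the data_analyzer.py suggestion aims for.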

@ericspod (Member)

> @ericspod I found out about something else which is quite interesting to me:
>
> We do not only use this "using_cuda" here, but also in data_analyzer.py (something different to this)
>
>         if self.device.type == "cuda":
>             # release unreferenced tensors to mitigate OOM
>             # limitation: https://github.com/pytorch/pytorch/issues/12873#issuecomment-482916237
>             torch.cuda.empty_cache()
>
> but it's not called "using_cuda" there, but only this question whether self.device.type == "cuda", but with similar consequences (empty_cache()). Does this have to be changed here as well? Thank you very much for your help!

I think the usages are unrelated: the DataAnalyzer class has a device member which it uses for different things, like moving tensors to that device, rather than working with the device a tensor already had. Perhaps the usage isn't the best or most straightforward choice now that I'm looking at the code, but it works without issue as far as I know, so it's something to possibly update later.
