Fixes #8697 GPU memory leak by checking both image and label tensors for CUDA device #8708

benediktjohannes wants to merge 9 commits into Project-MONAI:dev
Conversation
Modified device detection to check BOTH image and label tensors. torch.cuda.empty_cache() is now called if EITHER tensor is on the GPU. This prevents GPU memory leaks in mixed-device scenarios.

Signed-off-by: benediktjohannes <benedikt.johannes.hofer@gmail.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@monai/auto3dseg/analyzer.py`:
- Around line 471-476: The code uses undefined image_tensor and label_tensor
before they are assigned; extract them from the incoming data dict like
ImageStats and FgImageStats do (use the same keys those classes expect, e.g.,
"image" for the input image and "label" for the segmentation label), then
proceed with the existing logic (compute using_cuda, save restore_grad_state,
set_grad_enabled(False), build ndas from image_tensor and cast label_tensor to
int16 for ndas_label). Ensure the variables are set to the tensor objects (with
correct shape [B,...]) before the lines referencing using_cuda, ndas, and
ndas_label.
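The either-tensor check described in the prompt above can be sketched with lightweight stand-ins, so the logic is visible without torch or a GPU. `FakeDevice`, `FakeTensor`, and `should_empty_cache` are hypothetical names used only for illustration; in the real code `t` is a `torch.Tensor`/`MetaTensor` and `t.device.type` is `"cuda"` or `"cpu"`:

```python
from dataclasses import dataclass


@dataclass
class FakeDevice:
    # Stand-in for torch.device; only the .type attribute matters here.
    type: str


@dataclass
class FakeTensor:
    # Stand-in for torch.Tensor / MetaTensor with a .device attribute.
    device: FakeDevice


def should_empty_cache(image_tensor, label_tensor):
    """Return True if EITHER tensor reports a CUDA device."""
    return any(
        t is not None and t.device.type == "cuda"
        for t in (image_tensor, label_tensor)
    )


cpu = FakeTensor(FakeDevice("cpu"))
gpu = FakeTensor(FakeDevice("cuda"))

print(should_empty_cache(cpu, cpu))  # False: all-CPU batch, no cache flush needed
print(should_empty_cache(cpu, gpu))  # True: mixed devices still trigger the flush
```

The point of the `any(...)` form is exactly the mixed-device case from the PR description: a flush is warranted as soon as one of the two tensors lives on the GPU.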
Signed-off-by: benediktjohannes <benedikt.johannes.hofer@gmail.com>
ericspod left a comment
Hi @benediktjohannes I think this is fine as it is now so we'll run tests again though we'll have to wait for us to resolve a CI issue. @mingxin-zheng was involved in this code so if he can double check for me that would be great.
Hi @ericspod, thanks for the message! To be honest, I am also not quite sure about this PR, so please don't merge it as long as nobody else is sure that this is fine. Thanks!
monai/auto3dseg/analyzer.py (Outdated)

```python
using_cuda = False
image_tensor = d[self.image_key]
label_tensor = d[self.label_key]
using_cuda = any(isinstance(t, torch.Tensor) and t.device.type == "cuda" for t in (image_tensor, label_tensor))
```
Thanks for the PR. MetaTensor could also use cuda device, so I believe we will need to check that as well
Thanks for the answer! 👍
By the way: I'm not quite sure about the other edits of this PR, as they were just recommended by CodeRabbit (and another important thing: I'm not sure whether I made a mistake when pasting them). I'm also not sure in this case whether they should be merged or not, since I just relied on the CodeRabbit review, so please correct me if there is a mistake, and please also review this before merging, thanks!
Signed-off-by: benediktjohannes <benedikt.johannes.hofer@gmail.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@monai/auto3dseg/analyzer.py`:
- Around line 471-478: Add parameterized unit tests that cover mixed-device
scenarios for LabelStats by creating cases where image_tensor is on CPU and
label_tensor is on CUDA and vice versa; instantiate batches similar to how ndas
and ndas_label are derived (use image_tensor, label_tensor and
MetaTensor/torch.Tensor on specific torch.device settings), call the LabelStats
code paths that consume ndas/ndas_label, and assert expected statistics and that
no device-related errors occur. Ensure tests toggle CUDA availability with
torch.cuda.is_available() guards, use explicit .to(device) on tensors, and
verify behavior matches the existing CPU-only and GPU-only assertions so
mixed-device handling is validated.
```diff
  image_tensor = d[self.image_key]
  label_tensor = d[self.label_key]
  using_cuda = any(isinstance(t, (torch.Tensor, MetaTensor)) and t.device.type == "cuda" for t in (image_tensor, label_tensor))
  restore_grad_state = torch.is_grad_enabled()
  torch.set_grad_enabled(False)

- ndas: list[MetaTensor] = [d[self.image_key][i] for i in range(d[self.image_key].shape[0])]  # type: ignore
- ndas_label: MetaTensor = d[self.label_key].astype(torch.int16)  # (H,W,D)
+ ndas: list[MetaTensor] = [image_tensor[i] for i in range(image_tensor.shape[0])]  # type: ignore
+ ndas_label: MetaTensor = label_tensor.astype(torch.int16)  # (H,W,D)
```
Add tests for mixed-device LabelStats handling.
Tests currently cover CPU-only and GPU-only scenarios but not mixed (image on CPU + label on CUDA, or vice versa). Add parameterized test cases to verify LabelStats handles these mixed-device scenarios correctly per the coding guidelines.
Hi @benediktjohannes we have one isort issue to fix, I don't know why this came up. You can see what edits CodeRabbit applied just through the usual "Files changed" section, so there's no need to worry about strange changes.
Hi, thank you very much for your answer @ericspod! Unfortunately, I don't really know what you meant by the "isort issue to fix". Is something broken by my change? Because @CodeRabbit AI only suggests some tests. There are two things I'm not quite sure about that I'd like to ask you, please: What do you think of the (kind of unrelated) change to , because I guess that this was suggested by the AI and I'm not deep in this, so please let me know what your opinion on this is, thanks! And one other question: tests seem to fail because of some code style issue, but unfortunately I don't really know at which point, so could you please assist me with this check? It only says "black, 25.11.0 (compiled: yes) Oh no! 💥 💔 💥" without any information on which line and which code style rule is violated. Thank you very much!
1. Black formatting issue

Line 473 is 133 characters long, exceeding Black's 120-character limit. This line needs to be broken up:

```python
# Current (too long):
using_cuda = any(isinstance(t, (torch.Tensor, MetaTensor)) and t.device.type == "cuda" for t in (image_tensor, label_tensor))

# Fix - break into multiple lines:
using_cuda = any(
    isinstance(t, (torch.Tensor, MetaTensor)) and t.device.type == "cuda"
    for t in (image_tensor, label_tensor)
)
```

2. MetaTensor([0.0]) vs torch.Tensor([0])

This change improves consistency, since the code already uses

3. Fixing the issues

You can either:

The isort issue
Thanks coderabbit! |
Signed-off-by: Benedikt Johannes <benedikt.johannes.hofer@gmail.com>
Head branch was pushed to by a user without write access
Actionable comments posted: 1

🧹 Nitpick comments (1)

monai/auto3dseg/analyzer.py (1)

473-475: `MetaTensor` is redundant in the `isinstance` tuple.

`MetaTensor` inherits from both `MetaObj` and `torch.Tensor`, so `isinstance(t, torch.Tensor)` already matches any `MetaTensor` instance. The explicit `MetaTensor` in the tuple is dead weight.

♻️ Proposed simplification

```diff
- isinstance(t, (torch.Tensor, MetaTensor)) and t.device.type == "cuda"
+ isinstance(t, torch.Tensor) and t.device.type == "cuda"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@monai/auto3dseg/analyzer.py` around lines 473-475: The isinstance check that builds using_cuda redundantly lists MetaTensor even though MetaTensor already subclasses torch.Tensor; update the generator expression that sets using_cuda (which iterates over image_tensor and label_tensor) to only check isinstance(t, torch.Tensor) (or include MetaObj if intentionally needed) and keep the rest of the condition (t.device.type == "cuda") unchanged so MetaTensor instances are still detected via torch.Tensor.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@monai/auto3dseg/analyzer.py`:
- Around line 473-476: The any(...) call assigned to using_cuda has its
generator expression aligned with the assignment which breaks Black; inside the
parentheses of the any call (in the expression using_cuda = any(...)) indent the
generator expression elements (the isinstance(...) check and the for t in
(image_tensor, label_tensor) clause) one additional level (4 spaces) so they sit
inside the parentheses, preserving normal multi-line call formatting for the
any(...) expression referencing torch.Tensor, MetaTensor, image_tensor and
label_tensor.
---
Nitpick comments:
In `@monai/auto3dseg/analyzer.py`:
- Around line 473-475: The isinstance check that builds using_cuda redundantly
lists MetaTensor even though MetaTensor already subclasses torch.Tensor; update
the generator expression that sets using_cuda (which iterates over image_tensor
and label_tensor) to only check isinstance(t, torch.Tensor) (or include MetaObj
if intentionally needed) and keep the rest of the condition (t.device.type ==
"cuda") unchanged so MetaTensor instances are still detected via torch.Tensor.
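For context, this nitpick rests on a general Python rule: `isinstance` already honors subclassing, so if `MetaTensor` subclasses `torch.Tensor`, listing both in the tuple changes nothing. A toy sketch with stand-in classes (these class names are illustration-only stand-ins, not the real MONAI types):

```python
class Tensor:
    # Stand-in for torch.Tensor.
    pass


class MetaTensor(Tensor):
    # Stand-in for MONAI's MetaTensor, which subclasses torch.Tensor.
    pass


t = MetaTensor()

# isinstance walks the inheritance chain, so the base-class check
# already matches the subclass instance:
print(isinstance(t, Tensor))  # True

# Adding the subclass to the tuple is therefore redundant:
print(isinstance(t, (Tensor, MetaTensor)) == isinstance(t, Tensor))  # True
```

Whether to keep the explicit `MetaTensor` anyway (as discussed below, purely for readability or safety) is a style call; behaviorally the two checks are equivalent.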
@ericspod I found out about something else which is quite interesting to me: we do not only use this "using_cuda" here, but also in , but it's not called "using_cuda" there, only the question whether self.device.type == "cuda", with similar consequences (empty_cache()). Does this have to be changed there as well? Thank you very much for your help!
Signed-off-by: Benedikt Johannes <benedikt.johannes.hofer@gmail.com>
🧹 Nitpick comments (1)

monai/auto3dseg/analyzer.py (1)

471-476: Simplify isinstance check: `MetaTensor` inherits from `torch.Tensor`.

The `isinstance(t, (torch.Tensor, MetaTensor))` check is redundant since `MetaTensor` is a subclass of `torch.Tensor`. Use `isinstance(t, torch.Tensor)` instead.

Suggested change

```diff
  using_cuda = any(
-     isinstance(t, (torch.Tensor, MetaTensor)) and t.device.type == "cuda"
+     isinstance(t, torch.Tensor) and t.device.type == "cuda"
      for t in (image_tensor, label_tensor)
  )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@monai/auto3dseg/analyzer.py` around lines 471-476: The isinstance check used when computing using_cuda is redundant; replace the tuple isinstance(t, (torch.Tensor, MetaTensor)) with isinstance(t, torch.Tensor) in the comprehension that iterates over image_tensor and label_tensor (the variable using_cuda) so it correctly detects tensors without referencing MetaTensor explicitly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@monai/auto3dseg/analyzer.py`:
- Around line 471-481: Add a unit test that exercises LabelStats with mixed
device tensors (image on CPU and label on CUDA, and the reverse) to prevent
regressions: create batched MetaTensor/Tensor pairs where one is moved to "cuda"
and the other remains on "cpu", call the LabelStats-related entry point used in
analyzer.py (e.g., the LabelStats class or the method that consumes image_tensor
and label_tensor) and assert it completes without error and returns the expected
statistics; include both permutations (image CPU/label CUDA and image CUDA/label
CPU) so the code paths that check device types (using image_tensor,
label_tensor, and LabelStats) are covered.
---
Nitpick comments:
In `@monai/auto3dseg/analyzer.py`:
- Around line 471-476: The isinstance check used when computing using_cuda is
redundant: replace the tuple isinstance(t, (torch.Tensor, MetaTensor)) with
isinstance(t, torch.Tensor) in the comprehension that iterates over image_tensor
and label_tensor (the variable using_cuda) so it correctly detects tensors
without referencing MetaTensor explicitly.
Hmmm, @coderabbit AI suggests (471-476: "Simplify isinstance check") that we should drop `MetaTensor` from the `isinstance` tuple because `MetaTensor` inherits from `torch.Tensor` and doesn't need to be checked here, but @mingxin-zheng suggested that we should add `MetaTensor`, so I'll prefer to let `MetaTensor` be checked here as well, just for safety. 👍
I think it's actually only the code formatting complaint. You can run
I saw this earlier, it will use MetaTensor instances for later operations rather than just Tensor which may be needed for some reason. It's probably safe to do this though, but I'd think you'd only want to if you're getting Tensor instances somehow when expecting MetaTensor instances.
That's related to what I mentioned above, running the autofixes should resolve it. |
Yes, there's a similar but different issue there. In

```python
if self.device.type == "cuda":
    torch.cuda.empty_cache()
```

The problem: the code uses a local

Fix for data_analyzer.py:

```python
if device.type == "cuda":
    torch.cuda.empty_cache()
```

However, this is a different scenario than your current PR:

Recommendation: keep your current PR focused on the analyzer.py fix. The data_analyzer.py issue could be addressed separately. Would you like me to open an issue for tracking the data_analyzer.py fix?
I think the usages are unrelated, the
- Modified device detection to check BOTH image and label tensors
- torch.cuda.empty_cache() now called if EITHER tensor is on GPU
- Prevents GPU memory leaks in mixed device scenarios