
Docker Slim Image#8640

Merged
ericspod merged 34 commits into Project-MONAI:dev from ericspod:docker_slim
Mar 6, 2026
Conversation

@ericspod
Member

@ericspod ericspod commented Nov 23, 2025

Description

This is an attempt to create a slim Docker image, smaller than the current one, to avoid running out of space during testing. Various fixes have been included to account for test failures within the image. These all appear to be real issues that need to be addressed (e.g. ONNX export) or fixes that should be integrated either way.

This excludes PyTorch 2.9 from the requirements for now to avoid legacy issues with ONNX, TorchScript, and other components. MONAI needs to be updated for PyTorch 2.9 support, specifically by dropping the use of TorchScript in places, as it is becoming obsolete in favour of torch.export.

Some tests fail without enough shared memory; the command I'm using to run tests with GPUs 0 and 1 is docker run -ti --rm --gpus '"device=0,1"' --shm-size=10gb -v $(pwd)/tests:/opt/monai/tests monai_slim /bin/bash.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

ericspod and others added 13 commits July 16, 2025 18:03
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Eric Kerfoot <eric.kerfoot@gmail.com>
…n the slim Docker image, all of which appear to be real errors.

Signed-off-by: Eric Kerfoot <eric.kerfoot@gmail.com>
…ply.github.com>

I, Eric Kerfoot <17726042+ericspod@users.noreply.github.com>, hereby add my Signed-off-by to this commit: 566c2bc

Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
I, Eric Kerfoot <eric.kerfoot@kcl.ac.uk>, hereby add my Signed-off-by to this commit: 510987d

Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
@Project-MONAI Project-MONAI deleted a comment from coderabbitai bot Nov 23, 2025
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
ericspod and others added 4 commits December 4, 2025 23:25
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
@ericspod
Member Author

ericspod commented Dec 6, 2025

Nine tests in the image currently fail. The first four are related to auto3dseg and mention a value "image_stats" being missing from a config file; these tests pass when run in isolation, however. The others relate to the GMM module, which cannot be compiled because nvcc is missing from the image, the CUDA toolkit being omitted for size reasons.

Output of the errors

======================================================================
ERROR: test_ensemble (tests.integration.test_auto3dseg_ensemble.TestEnsembleBuilder.test_ensemble)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/monai/monai/bundle/config_parser.py", line 158, in __getitem__
    look_up_option(k, config, print_all_options=False) if isinstance(config, dict) else config[int(k)]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/utils/module.py", line 141, in look_up_option
    raise ValueError(f"Unsupported option '{opt_str}', " + supported_msg)
ValueError: Unsupported option 'image_stats', 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/monai/tests/integration/test_auto3dseg_ensemble.py", line 155, in test_ensemble
    bundle_generator.generate(self.work_dir, num_fold=1)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 660, in generate
    gen_algo.export_to_disk(output_folder, name, fold=f_id)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 193, in export_to_disk
    self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp324gd5iq/workdir/algorithm_templates/dints/scripts/algo.py", line 79, in fill_template_config
  File "/opt/monai/monai/bundle/config_parser.py", line 161, in __getitem__
    raise KeyError(f"query key: {k}") from e
KeyError: 'query key: image_stats'

======================================================================
ERROR: test_get_history (tests.integration.test_auto3dseg_hpo.TestHPO.test_get_history)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/monai/monai/bundle/config_parser.py", line 158, in __getitem__
    look_up_option(k, config, print_all_options=False) if isinstance(config, dict) else config[int(k)]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/utils/module.py", line 141, in look_up_option
    raise ValueError(f"Unsupported option '{opt_str}', " + supported_msg)
ValueError: Unsupported option 'image_stats', 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/monai/tests/integration/test_auto3dseg_hpo.py", line 129, in setUp
    bundle_generator.generate(work_dir, num_fold=1)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 660, in generate
    gen_algo.export_to_disk(output_folder, name, fold=f_id)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 193, in export_to_disk
    self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp324gd5iq/workdir/algorithm_templates/dints/scripts/algo.py", line 79, in fill_template_config
  File "/opt/monai/monai/bundle/config_parser.py", line 161, in __getitem__
    raise KeyError(f"query key: {k}") from e
KeyError: 'query key: image_stats'

======================================================================
ERROR: test_run_algo (tests.integration.test_auto3dseg_hpo.TestHPO.test_run_algo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/monai/monai/bundle/config_parser.py", line 158, in __getitem__
    look_up_option(k, config, print_all_options=False) if isinstance(config, dict) else config[int(k)]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/utils/module.py", line 141, in look_up_option
    raise ValueError(f"Unsupported option '{opt_str}', " + supported_msg)
ValueError: Unsupported option 'image_stats', 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/monai/tests/integration/test_auto3dseg_hpo.py", line 129, in setUp
    bundle_generator.generate(work_dir, num_fold=1)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 660, in generate
    gen_algo.export_to_disk(output_folder, name, fold=f_id)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 193, in export_to_disk
    self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp324gd5iq/workdir/algorithm_templates/dints/scripts/algo.py", line 79, in fill_template_config
  File "/opt/monai/monai/bundle/config_parser.py", line 161, in __getitem__
    raise KeyError(f"query key: {k}") from e
KeyError: 'query key: image_stats'

======================================================================
ERROR: test_run_optuna (tests.integration.test_auto3dseg_hpo.TestHPO.test_run_optuna)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/monai/monai/bundle/config_parser.py", line 158, in __getitem__
    look_up_option(k, config, print_all_options=False) if isinstance(config, dict) else config[int(k)]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/utils/module.py", line 141, in look_up_option
    raise ValueError(f"Unsupported option '{opt_str}', " + supported_msg)
ValueError: Unsupported option 'image_stats', 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/monai/tests/integration/test_auto3dseg_hpo.py", line 129, in setUp
    bundle_generator.generate(work_dir, num_fold=1)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 660, in generate
    gen_algo.export_to_disk(output_folder, name, fold=f_id)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 193, in export_to_disk
    self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp324gd5iq/workdir/algorithm_templates/dints/scripts/algo.py", line 79, in fill_template_config
  File "/opt/monai/monai/bundle/config_parser.py", line 161, in __getitem__
    raise KeyError(f"query key: {k}") from e
KeyError: 'query key: image_stats'

======================================================================
ERROR: test_cuda_0_2_batches_1_dimensions_1_channels_2_classes_2_mixtures (tests.networks.layers.test_gmm.GMMTestCase.test_cuda_0_2_batches_1_dimensions_1_channels_2_classes_2_mixtures)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/tests/networks/layers/test_gmm.py", line 287, in test_cuda
    gmm = GaussianMixtureModel(features_tensor.size(1), mixture_count, class_count, verbose_build=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/networks/layers/gmm.py", line 44, in __init__
    self.compiled_extension = load_module(
                              ^^^^^^^^^^^^
  File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
    module = load(
             ^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_1_2_1_Linux_3_11_2_28_12_8'

======================================================================
ERROR: test_cuda_1_1_batches_1_dimensions_5_channels_2_classes_1_mixtures (tests.networks.layers.test_gmm.GMMTestCase.test_cuda_1_1_batches_1_dimensions_5_channels_2_classes_1_mixtures)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/tests/networks/layers/test_gmm.py", line 287, in test_cuda
    gmm = GaussianMixtureModel(features_tensor.size(1), mixture_count, class_count, verbose_build=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/networks/layers/gmm.py", line 44, in __init__
    self.compiled_extension = load_module(
                              ^^^^^^^^^^^^
  File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
    module = load(
             ^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_5_2_1_Linux_3_11_2_28_12_8'

======================================================================
ERROR: test_cuda_2_1_batches_2_dimensions_2_channels_4_classes_4_mixtures (tests.networks.layers.test_gmm.GMMTestCase.test_cuda_2_1_batches_2_dimensions_2_channels_4_classes_4_mixtures)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/tests/networks/layers/test_gmm.py", line 287, in test_cuda
    gmm = GaussianMixtureModel(features_tensor.size(1), mixture_count, class_count, verbose_build=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/networks/layers/gmm.py", line 44, in __init__
    self.compiled_extension = load_module(
                              ^^^^^^^^^^^^
  File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
    module = load(
             ^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_2_4_1_Linux_3_11_2_28_12_8'

======================================================================
ERROR: test_cuda_3_1_batches_3_dimensions_1_channels_2_classes_1_mixtures (tests.networks.layers.test_gmm.GMMTestCase.test_cuda_3_1_batches_3_dimensions_1_channels_2_classes_1_mixtures)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/tests/networks/layers/test_gmm.py", line 287, in test_cuda
    gmm = GaussianMixtureModel(features_tensor.size(1), mixture_count, class_count, verbose_build=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/networks/layers/gmm.py", line 44, in __init__
    self.compiled_extension = load_module(
                              ^^^^^^^^^^^^
  File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
    module = load(
             ^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_1_2_1_Linux_3_11_2_28_12_8_v1'

======================================================================
ERROR: test_load (tests.networks.layers.test_gmm.GMMTestCase.test_load)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/monai/tests/networks/layers/test_gmm.py", line 310, in test_load
    load_module("gmm", {"CHANNEL_COUNT": 2, "MIXTURE_COUNT": 2, "MIXTURE_SIZE": 3}, verbose_build=True)
  File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
    module = load(
             ^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_2_2_3_Linux_3_11_2_28_12_8'

It should be a simple matter of forcing the GMM module to build when building the image, but this also fails if used as a RUN command: python -c 'from monai._extensions import load_module;load_module("gmm", {"CHANNEL_COUNT": 2, "MIXTURE_COUNT": 2, "MIXTURE_SIZE": 3}, verbose_build=True)'
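One untested possibility (a sketch, not part of this PR) would be to prebuild the extension in the build stage, where the CUDA toolkit is still present. Since no GPU is visible at image-build time, PyTorch cannot auto-detect a compute capability, so TORCH_CUDA_ARCH_LIST would likely need to be set explicitly; whether this resolves the RUN failure described above is an open question:

```dockerfile
# Hypothetical addition to the build stage of Dockerfile.slim.
# The architecture list is an illustrative assumption: without a visible GPU,
# torch.utils.cpp_extension cannot detect a target arch and needs this hint.
ENV TORCH_CUDA_ARCH_LIST="7.0;8.0;9.0"
# Parameter values mirror the failing test_load case; other CHANNEL/MIXTURE
# combinations used by the tests would each produce their own cached build.
RUN python -c 'from monai._extensions import load_module; load_module("gmm", {"CHANNEL_COUNT": 2, "MIXTURE_COUNT": 2, "MIXTURE_SIZE": 3}, verbose_build=True)'
```

The compiled artifact would then need to be copied into the runtime stage alongside the other build outputs for the cached extension to be found at test time.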

Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
@ericspod ericspod marked this pull request as ready for review December 8, 2025 11:53
@Project-MONAI Project-MONAI deleted a comment from coderabbitai bot Dec 8, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
.github/workflows/pythonapp.yml (1)

31-37: Consolidate repetitive cleanup logic.

The "Clean unused tools" step is identical across three jobs. Consider using a GitHub Actions composite action or reusable workflow to avoid duplication. Additionally, verify whether the duplicate cleanup commands in the subsequent "Install dependencies" steps (e.g., line 56, line 169) are still necessary now that cleanup runs upfront.

Also applies to: 140-146, 232-238

Dockerfile.slim (1)

88-88: CUDA_HOME set in non-CUDA runtime image.

Line 88 sets ENV CUDA_HOME=/usr/local/cuda, but the CUDA toolkit is not present in the final image (intentionally omitted for size). While MONAI can degrade gracefully, this may cause confusing behavior or spurious build attempts. Consider removing this env var or documenting why it's needed.

requirements.txt (1)

1-2: Add inline comments explaining version constraints.

The <2.9 upper bound (ONNX/TorchScript incompatibility) and Windows !=2.7.0 exclusion (known CUDA/XPU wheel issues) should be documented with comments to help future maintainers understand these constraints.

tests/networks/test_convert_to_onnx.py (1)

73-104: Fix minor docstring typo in SegResNet test

Docstring says “SetResNet” while the model is SegResNet. Consider aligning the wording.

-        """Test converting SetResNet to ONNX."""
+        """Test converting SegResNet to ONNX."""
📜 Review details

Configuration used: .coderabbit.yaml (review profile: CHILL)

📥 Commits

Reviewing files that changed from the base of the PR and between 15fd428 and 39ecb09.

📒 Files selected for processing (12)
  • .dockerignore (1 hunks)
  • .github/workflows/pythonapp.yml (3 hunks)
  • Dockerfile.slim (1 hunks)
  • monai/apps/vista3d/inferer.py (1 hunks)
  • monai/networks/nets/vista3d.py (1 hunks)
  • monai/networks/utils.py (2 hunks)
  • requirements-dev.txt (2 hunks)
  • requirements.txt (1 hunks)
  • tests/bundle/test_bundle_download.py (2 hunks)
  • tests/data/meta_tensor/test_meta_tensor.py (1 hunks)
  • tests/losses/test_multi_scale.py (1 hunks)
  • tests/networks/test_convert_to_onnx.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • tests/losses/test_multi_scale.py
  • monai/networks/nets/vista3d.py
  • tests/data/meta_tensor/test_meta_tensor.py
  • monai/apps/vista3d/inferer.py
  • monai/networks/utils.py
  • tests/bundle/test_bundle_download.py
  • tests/networks/test_convert_to_onnx.py
🧬 Code graph analysis (3)
tests/losses/test_multi_scale.py (1)
tests/test_utils.py (1)
  • assert_allclose (119-159)
monai/networks/nets/vista3d.py (1)
monai/data/meta_tensor.py (1)
  • astype (434-461)
tests/data/meta_tensor/test_meta_tensor.py (2)
monai/bundle/scripts.py (1)
  • load (630-770)
monai/networks/nets/transchex.py (1)
  • load (96-103)
🔇 Additional comments (15)
tests/bundle/test_bundle_download.py (2)

18-18: Importing skipIf alongside skipUnless is appropriate

Matches existing usage pattern and keeps decorators local; no issues here.


222-222: NGC private test skip condition is correct

Skipping when NGC_API_KEY is not set is a precise guard for this private-source test and avoids spurious failures in environments without credentials.

tests/data/meta_tensor/test_meta_tensor.py (1)

248-248: Verify that production code loading MetaTensor objects accounts for custom class requirements.

The weights_only=False parameter at line 248 is necessary for MetaTensor to preserve its metadata and custom attributes during deserialization. However, ensure that any production code paths deserializing MetaTensor objects also use weights_only=False, as weights_only=True restricts loading to basic types and will fail for custom classes. For untrusted external model files, consider using safer formats like safetensors or ensure PyTorch ≥ 2.6.0 is used to mitigate known deserialization vulnerabilities.

.dockerignore (1)

6-10: LGTM.

The additions align with Docker build context optimization and complement the Dockerfile.slim copying strategy.

Dockerfile.slim (3)

56-56: Confirm extension build succeeds despite known GMM failures.

Per PR notes, GMM extension build fails in the slim image due to missing CUDA compiler. This build command (line 56) succeeds in the build stage, but GMM tests fail later when run inside the slim image. Clarify whether this is expected behavior or whether the slim image is intended to support GMM-dependent tests.

If the slim image is not meant to support GMM tests, document this limitation or skip GMM tests conditionally in the image.


16-89: Multi-stage build is well-designed for size optimization.

The separation of build (with CUDA) and runtime (without CUDA) stages correctly achieves the slim image goal. Artifact cleanup (line 75-76) appropriately reduces bloat.


38-42: NGC CLI validation is thorough; verify it's used.

The NGC CLI is downloaded and validated via md5sum in the build, but there's no evidence it's invoked in MONAI workflows. If unused, consider removing to reduce image bloat. Verify NGC CLI is actually used by searching the codebase for invocations (e.g., rg -i 'ngc\s+' in monai/ and tests/ directories).

tests/losses/test_multi_scale.py (1)

58-58: Justify the tolerance relaxation.

Loosening rtol from 1e-5 to 1e-4 may mask numerical regressions. Add a comment explaining the reason (e.g., slim image, different CUDA runtime).

monai/apps/vista3d/inferer.py (1)

89-95: LGTM!

Using tuple for immutable slice specification is idiomatic.

requirements-dev.txt (2)

3-3: Unpinning pytorch-ignite may introduce breaking changes.

Consider adding a minimum version constraint if specific features are relied upon.


55-55: LGTM!

Making onnxruntime unconditional is appropriate since it now supports Python 3.11+.

monai/networks/nets/vista3d.py (1)

246-249: LGTM!

Explicit int() cast avoids NumPy bool typing issues. Logic is clearer with intermediate variables.

monai/networks/utils.py (1)

714-745: ONNX export target variable rename is correct and consistent

model_to_export is now set appropriately in both the trace and script branches and passed to torch.onnx.export, resolving the previous naming inconsistency without changing external behavior. Looks good.

tests/networks/test_convert_to_onnx.py (2)

25-32: CPU‑only device options and ONNX import handling are reasonable

Using onnx, _ = optional_import("onnx") together with @SkipIfNoModule("onnx") is safe, and constraining TORCH_DEVICE_OPTIONS to ["cpu"] with a clear FIXME matches the current nondeterministic CUDA behavior and slim-image constraints.


48-70: UNet ONNX export test now robustly exercises both trace and script paths

The refactored test_unet always calls convert_to_onnx, parameterizes use_trace and use_ort, and validates the return type against onnx.ModelProto, giving good coverage of the updated convert_to_onnx behavior on CPU.

@Project-MONAI Project-MONAI deleted a comment from coderabbitai bot Dec 8, 2025
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
@coderabbitai
Contributor

coderabbitai bot commented Dec 8, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds/updates CI and build infra: expanded .dockerignore and a new multi-stage Dockerfile.slim for CUDA builds with NGC CLI and layered runtime. Code changes: unravel_slice built as a tuple; connected_components_combine now appends int(np.any(...)) instead of bool; convert_to_onnx uses model_to_export consistently and adjusts verify handling when tracing with dynamo. Dependency change: unpinned pytorch-ignite. Tests: gate NGC test with skipIf, load MetaTensor with weights_only=False, relax numeric tolerance, and simplify ONNX conversion tests to CPU paths and always run export.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: 1 passed, 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 62.50%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): the title "Docker Slim Image" is vague and doesn't convey the specific changes made (fixes for ONNX export, test tolerance adjustments, and various other modifications beyond just the Docker image). Resolution: revise the title to be more descriptive, e.g. "Create slim Docker image with ONNX and test fixes".

✅ Passed checks (1 passed)

  • Description check (✅ Passed): the description covers the main objectives and provides context about the slim image and test issues, but the checklist is incomplete: docstring updates and documentation checks were not performed.



Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (4)
tests/losses/test_multi_scale.py (1)

57-58: Relaxed tolerance is reasonable for numeric stability

Bumping rtol to 1e-4 here is still strict for a scalar loss value and should help avoid backend‑dependent noise causing spurious failures. No functional concerns from this change alone. You might optionally consider using the shared assert_allclose helper from tests.test_utils for consistency across tests, but it’s not required for correctness here.
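As a minimal sketch of what the relaxed tolerance permits (the values here are illustrative, not taken from the test):

```python
import numpy as np

# Illustrative values only: with rtol=1e-4, a scalar loss that differs from the
# expected value by up to one part in ten thousand still passes the check.
expected = 1.0
actual = 1.00005  # |actual - expected| = 5e-5, within rtol * |expected| = 1e-4
np.testing.assert_allclose(actual, expected, rtol=1e-4)
```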

.github/workflows/pythonapp.yml (1)

31-37: Cleanup step placement is ineffectual; consider moving or consolidating.

Running this cleanup before checkout provides minimal value since no artifacts exist yet. Cleanup is most useful after build steps produce large intermediate files. Additionally, the same 5-line block repeats across three jobs—consider extracting to a reusable workflow or composite action to reduce duplication.

Also applies to: 140-146, 232-238

Dockerfile.slim (2)

90-90: BUILD_MONAI flag is build-time only; unnecessary in runtime image.

Line 90 sets BUILD_MONAI=1 in the final image, but this flag is only meaningful during setup.py develop (line 56 of the build stage). Carrying it into the runtime image serves no purpose and may confuse future maintainers.

Apply this diff to remove the unnecessary flag:

 ENV PATH=${PATH}:/opt/tools:/opt/tools/ngc-cli
 ENV POLYGRAPHY_AUTOINSTALL_DEPS=1
-ENV BUILD_MONAI=1
 ENV CUDA_HOME=/usr/local/cuda

76-77: Restrict pycache cleanup to MONAI-specific paths for efficiency.

Line 77 uses find / to remove all __pycache__ directories, which is inefficient and targets too broadly (system libraries, /proc, /sys, etc.). Scope the cleanup to MONAI artifacts and Python site-packages.

Apply this diff:

 RUN rm -rf /opt/monai/build /opt/monai/monai.egg-info && \
-    find / -name __pycache__ | xargs rm -rf
+    find /opt/monai /usr/local/lib -name __pycache__ -type d -exec rm -rf {} + 2>/dev/null || true
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 15fd428 and ee7afe8.

📒 Files selected for processing (12)
  • .dockerignore (1 hunks)
  • .github/workflows/pythonapp.yml (4 hunks)
  • Dockerfile.slim (1 hunks)
  • monai/apps/vista3d/inferer.py (1 hunks)
  • monai/networks/nets/vista3d.py (1 hunks)
  • monai/networks/utils.py (2 hunks)
  • requirements-dev.txt (2 hunks)
  • requirements.txt (1 hunks)
  • tests/bundle/test_bundle_download.py (2 hunks)
  • tests/data/meta_tensor/test_meta_tensor.py (1 hunks)
  • tests/losses/test_multi_scale.py (1 hunks)
  • tests/networks/test_convert_to_onnx.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definitions which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical errors or inconsistencies, and suggest what may be changed to address these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • monai/networks/nets/vista3d.py
  • monai/networks/utils.py
  • tests/networks/test_convert_to_onnx.py
  • tests/data/meta_tensor/test_meta_tensor.py
  • tests/bundle/test_bundle_download.py
  • tests/losses/test_multi_scale.py
  • monai/apps/vista3d/inferer.py
🧬 Code graph analysis (2)
tests/networks/test_convert_to_onnx.py (3)
monai/utils/module.py (1)
  • optional_import (315-445)
monai/networks/nets/unet.py (1)
  • UNet (28-299)
monai/networks/utils.py (1)
  • convert_to_onnx (661-785)
tests/losses/test_multi_scale.py (1)
tests/test_utils.py (1)
  • assert_allclose (119-159)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: packaging
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: quick-py3 (windows-latest)
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: min-dep-os (macOS-latest)
🔇 Additional comments (14)
tests/bundle/test_bundle_download.py (2)

18-18: Importing skipIf is appropriate and consistent

Adding skipIf here is correct for the new test gating and keeps skip-related utilities together with skipUnless. No issues.


220-223: Environment‑gated NGC private test looks correct

Skipping test_ngc_private_source_download_bundle when NGC_API_KEY is unset is a sensible way to avoid hard failures in environments without credentials, while still running the test when properly configured.
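The gating pattern under review can be sketched as follows; the class and test names are hypothetical stand-ins, not MONAI code:

```python
import os
import unittest
from unittest import skipIf

# Skip the whole case when the credential is absent, mirroring the
# environment-gated NGC test; TestNgcPrivateDownload is a hypothetical name.
@skipIf("NGC_API_KEY" not in os.environ, "NGC_API_KEY not set; skipping private NGC download test")
class TestNgcPrivateDownload(unittest.TestCase):
    def test_download(self):
        self.assertTrue(True)

# Running the suite skips the test cleanly rather than failing it.
result = unittest.TextTestRunner(stream=open(os.devnull, "w")).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestNgcPrivateDownload)
)
```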

monai/networks/utils.py (1)

716-736: LGTM - Typo fix in variable naming.

Variable renamed from mode_to_export to model_to_export in both tracing and scripting branches. Semantically correct and consistent.

monai/apps/vista3d/inferer.py (1)

89-95: LGTM - List to tuple for slice construction.

Changed from list to tuple for the slice indexing object. Both work equivalently for tensor indexing; tuple is more idiomatic for immutable sequences.
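To illustrate the indexing form in question (NumPy is used here for brevity; indexing a tensor with a tuple of slices behaves the same way):

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
# A tuple of slice objects is the idiomatic, immutable index form.
unravel_slice = (slice(None), slice(0, 2), slice(1, 3))
sub = a[unravel_slice]  # equivalent to a[:, 0:2, 1:3]
```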

.dockerignore (1)

6-10: LGTM - Added development artifact ignores.

Good practice to exclude .vscode, .git, and various tool caches from Docker builds. Reduces image size.

monai/networks/nets/vista3d.py (1)

246-249: LGTM - Explicit type conversion for numpy compatibility.

Refactored to build per-point boolean list, apply np.any(), and explicitly cast to int. Comment clarifies this avoids numpy typing issues. Functionally equivalent to previous logic.
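A small sketch of the cast being discussed (the flag values are illustrative):

```python
import numpy as np

point_flags = [True, False, False]  # illustrative per-point booleans
# np.any returns a numpy bool_ scalar; wrapping it in int() yields a plain
# Python int, sidestepping numpy scalar-typing issues downstream.
value = int(np.any(point_flags))
```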

tests/networks/test_convert_to_onnx.py (3)

25-32: LGTM - Simplified to CPU-only ONNX testing.

Removed CUDA device options, restricting tests to CPU. Aligns with slim Docker image without CUDA toolkit. Comment at line 29-31 notes CUDA produces different outputs during testing.


48-69: LGTM - Added docstring and type assertion.

Added docstring for test clarity. Now always calls convert_to_onnx and asserts return type is onnx.ModelProto, improving test coverage.


73-104: LGTM - Added docstring for SegResNet test.

Documents that this tests SegResNet ONNX conversion with use_trace=True. Existing tolerance and runtime checks preserved.

tests/data/meta_tensor/test_meta_tensor.py (1)

248-248: Verify necessity of weights_only=False change.

Changed from weights_only=True to weights_only=False when loading the pickled MetaTensor. This allows loading arbitrary Python objects, which is less secure. Ensure this is required for proper MetaTensor deserialization with metadata.

requirements.txt (1)

1-2: Reconsider Windows exclusion of PyTorch 2.7.0 — documented issues are environmental, not version-critical.

PyTorch 2.7.0 (latest stable as of December 2025) has known Windows complications: CUDA/toolkit mismatches (especially CUDA 12.x builds), XPU wheel failures, and dependency conflicts with torchvision/torchaudio. However, these are configuration and environment issues, not version-specific breakages. Excluding 2.7.0 entirely on Windows prevents users from installing the latest stable release. Instead, document the required environment setup (CUDA version matching, wheel source verification, system DLL/Windows 10+ requirement) or keep 2.7.0 available if your tested environment passes.

The upper bound <2.9 is appropriate; MONAI 1.5.x supports PyTorch up to 2.8, and PyTorch 2.9 is not yet released.

Dockerfile.slim (3)

72-73: ✓ Python version handling addressed prior feedback.

The use of ARG PYTHON_VERSION=3.11 with variable substitution in the COPY command (line 73) makes the build maintainable and configurable across Python versions. This directly addresses the prior review concern about hardcoded paths.


38-42: Verify NGC CLI MD5 validation file exists in build context.

Line 41 attempts to validate the NGC CLI download using ngc-cli.md5, but this file must be available in the build context when the RUN command executes. Confirm that ngc-cli.md5 is either:

  • Present in the repository root, or
  • Copied into the Docker image before line 41 via a COPY or ADD statement

89-89: CUDA_HOME points to non-existent directory in runtime image.

Line 89 sets CUDA_HOME=/usr/local/cuda, but the build2 stage (lines 64–68) explicitly does not install the CUDA toolkit. This means the directory doesn't exist at runtime, which could cause failures if code or precompiled extensions try to resolve CUDA libraries or executables.

Either remove this environment variable from the runtime image or install minimal CUDA runtime libraries in the build2 stage if they're required for the precompiled extensions.

@ericspod
Member Author

ericspod commented Dec 8, 2025

Hi @KumoLiu, I think we should push this one through to solve some of the issues we're seeing with Actions running out of space. Some other PRs are stuck with failing checks, but I think the changes I have here will fix them.

Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/networks/test_convert_to_onnx.py (1)

29-31: FIXME left without a tracking issue.

The CUDA path is silently disabled with no linked issue. This means CUDA ONNX export goes untested indefinitely.

Want me to draft a GitHub issue to track the CUDA vs. ONNX output discrepancy so this isn't forgotten?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/networks/test_convert_to_onnx.py` around lines 29 - 31, Replace the
naked FIXME comment that disables CUDA in the test with an explicit tracking
reference: open a GitHub issue describing the CUDA vs ONNX output discrepancy,
then update the commented block around TORCH_DEVICE_OPTIONS (the conditional
that would append "cuda") to include the issue number/URL and a short TODO note,
or alternatively add a pytest.skip/marker linked to that issue; ensure the
comment clearly states the issue ID so the CUDA path isn't silently untested.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/networks/test_convert_to_onnx.py`:
- Line 73: The docstring contains a typo: replace the text "SetResNet" with
"SegResNet" in the test docstring that currently reads "Test converting
SetResNet to ONNX." so the string becomes "Test converting SegResNet to ONNX.";
update the triple-quoted docstring in the test function (the one with that exact
text) to correct the model name.

---

Nitpick comments:
In `@tests/networks/test_convert_to_onnx.py`:
- Around line 29-31: Replace the naked FIXME comment that disables CUDA in the
test with an explicit tracking reference: open a GitHub issue describing the
CUDA vs ONNX output discrepancy, then update the commented block around
TORCH_DEVICE_OPTIONS (the conditional that would append "cuda") to include the
issue number/URL and a short TODO note, or alternatively add a
pytest.skip/marker linked to that issue; ensure the comment clearly states the
issue ID so the CUDA path isn't silently untested.

Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@requirements-dev.txt`:
- Around line 52-54: The requirements list contains a duplicate onnxruntime
entry that nullifies the python_version guard: remove the unconditional
"onnxruntime" line and keep only the guarded entry 'onnxruntime; python_version
< "3.12"' so the package is installed conditionally (preserving the onnxscript
line as-is); ensure no other duplicate onnxruntime entries remain in
requirements-dev.txt.

---

Duplicate comments:
In `@requirements-dev.txt`:
- Line 3: Unpinning pytorch-ignite can break code that imports ignite.contrib;
inspect the codebase (and MONAI-related modules) for any imports of
ignite.contrib and either remove/replace those imports or pin pytorch-ignite
back to a pre-0.5.0 release (e.g., <=0.4.x) in requirements-dev.txt;
specifically search for "ignite.contrib" and references to MONAI integration
points and update the dependency or the import usage accordingly before merging.

Contributor

@garciadias garciadias left a comment


Excellent, Eric. I am still running the tests. I will let you know when they are done and provide some feedback on that as well.

Thank you.

@garciadias
Contributor

Is this supposed to be a temporary image to then replace the current Docker image, or will we maintain both?

If we are keeping both, it may be worth adding some documentation here.

ericspod and others added 2 commits March 6, 2026 20:19
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
@ericspod
Member Author

ericspod commented Mar 6, 2026

Is this supposed to be a temporary image to then replace the current Docker image, or will we maintain both?

If we are keeping both, it may be worth adding some documentation here.

It's meant to be kept as a slimmed image for anyone that wants it. We had originally thought about replacing the existing image, which was encountering memory issues, but this was solved in the Actions themselves by caching in a different way. There wasn't really documentation on this, and it made more sense to add something to the README.md file, which I have now done.

ericspod added 2 commits March 6, 2026 20:27
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
@garciadias garciadias self-requested a review March 6, 2026 20:32
@ericspod ericspod merged commit daaedaa into Project-MONAI:dev Mar 6, 2026
26 checks passed
