Skip to content

HDDS-15300. Handle zero-capacity volumes in DiskBalancer policy.#10294

Open
slfan1989 wants to merge 2 commits into
apache:masterfrom
slfan1989:HDDS-15300
Open

HDDS-15300. Handle zero-capacity volumes in DiskBalancer policy.#10294
slfan1989 wants to merge 2 commits into
apache:masterfrom
slfan1989:HDDS-15300

Conversation

@slfan1989
Copy link
Copy Markdown
Contributor

@slfan1989 slfan1989 commented May 17, 2026

What changes were proposed in this pull request?

This PR makes DiskBalancer skip zero-capacity volumes during volume candidate selection.

DefaultContainerChoosingPolicy sorts volumes by VolumeFixedUsage#getUtilization. However, VolumeFixedUsage stores utilization as null when the volume capacity is 0. If such a volume is included in the volume set, sorting can trigger a NullPointerException before DiskBalancer chooses source/destination volumes.

This change filters out volumes whose capacity is less than or equal to 0 before sorting by utilization. If fewer than two usable volumes remain after filtering, the policy returns no candidate instead of failing.

Detailed analysis of the execution process

  1. When the volume capacity is 0, VolumeFixedUsage sets utilization to null.

private VolumeFixedUsage(HddsVolume volume, long delta) {
this.volume = volume;
this.usage = volume.getCurrentUsage();
this.effectiveUsed = computeEffectiveUsage(usage, volume.getCommittedBytes(), delta);
this.utilization = usage.getCapacity() > 0 ? computeUtilization(usage, volume.getCommittedBytes(), delta) : null;

This means:

  • if capacity > 0, utilization is calculated;
  • if capacity == 0, utilization is not calculated and remains null.
  1. However, getUtilization() does not allow null.

public double getUtilization() {
return Objects.requireNonNull(utilization, "utilization == null");
}

So if utilization is null, calling this method throws:

NullPointerException: utilization == null
  1. DefaultContainerChoosingPolicy calls getUtilization() while sorting volumes.

final List<VolumeFixedUsage> volumeUsages = allVolumes.stream()
.map(v -> newVolumeFixedUsage(v, deltaMap))
.sorted(Comparator.comparingDouble(VolumeFixedUsage::getUtilization)
.thenComparing(v -> v.getVolume().getStorageID()))
.collect(Collectors.toList());

During sorting, Java calls getUtilization() for each VolumeFixedUsage. If any volume has capacity == 0, its utilization is null, and sorting fails with an NPE before DiskBalancer can choose source/destination volumes.

What is the link to the Apache JIRA

JIRA: HDDS-15300. Handle zero-capacity volumes in DiskBalancer policy.

How was this patch tested?

Add Junit Test

CI: https://github.com/slfan1989/ozone/actions/runs/25983890878

@adoroszlai adoroszlai requested a review from Gargi-jais11 May 17, 2026 08:58
Copy link
Copy Markdown
Contributor

@Gargi-jais11 Gargi-jais11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @slfan1989 for the fix.

Please do changes for calculateVolumeDataDensity — zero-capacity volumes

Filtering to getUsage().getCapacity() > 0 before calling getIdealUsage / getUtilization() is the right fix: VolumeFixedUsage leaves utilization == null when capacity is 0, so the previous loop could only “handle” that path by throwing and falling through to catch, which turned a common edge case into -1.0 and hid the real issue.

  public static double calculateVolumeDataDensity(List<VolumeFixedUsage> volumeSet) {
    if (volumeSet == null) {
      LOG.warn("VolumeSet is null, returning 0.0 for VolumeDataDensity");
      return 0.0;
    }

    try {
      final List<VolumeFixedUsage> usableVolumes = volumeSet.stream()
          .filter(v -> v.getUsage().getCapacity() > 0)
          .collect(Collectors.toList());

      if (usableVolumes.size() <= 1) {
        return 0.0;
      }

      // Calculate ideal usage using the same immutable volume snapshot
      final double idealUsage = getIdealUsage(usableVolumes);
      double volumeDensitySum = 0.0;

      // Calculate density for each volume using the same snapshot
      for (VolumeFixedUsage volumeUsage : usableVolumes) {
        final double currentUsage = volumeUsage.getUtilization();

        // Calculate density as absolute difference from ideal usage
        double volumeDensity = Math.abs(currentUsage - idealUsage);
        volumeDensitySum += volumeDensity;
      }
      return volumeDensitySum;
    } catch (Exception e) {
      LOG.error("Error calculating VolumeDataDensity", e);
      return -1.0;
    }
  }

With this change everywhere for diskbalancer if a volume is of 0 capacity that will be excluded.

@slfan1989
Copy link
Copy Markdown
Contributor Author

Thanks @slfan1989 for the fix.

Please do changes for calculateVolumeDataDensity — zero-capacity volumes

Filtering to getUsage().getCapacity() > 0 before calling getIdealUsage / getUtilization() is the right fix: VolumeFixedUsage leaves utilization == null when capacity is 0, so the previous loop could only “handle” that path by throwing and falling through to catch, which turned a common edge case into -1.0 and hid the real issue.

  public static double calculateVolumeDataDensity(List<VolumeFixedUsage> volumeSet) {
    if (volumeSet == null) {
      LOG.warn("VolumeSet is null, returning 0.0 for VolumeDataDensity");
      return 0.0;
    }

    try {
      final List<VolumeFixedUsage> usableVolumes = volumeSet.stream()
          .filter(v -> v.getUsage().getCapacity() > 0)
          .collect(Collectors.toList());

      if (usableVolumes.size() <= 1) {
        return 0.0;
      }

      // Calculate ideal usage using the same immutable volume snapshot
      final double idealUsage = getIdealUsage(usableVolumes);
      double volumeDensitySum = 0.0;

      // Calculate density for each volume using the same snapshot
      for (VolumeFixedUsage volumeUsage : usableVolumes) {
        final double currentUsage = volumeUsage.getUtilization();

        // Calculate density as absolute difference from ideal usage
        double volumeDensity = Math.abs(currentUsage - idealUsage);
        volumeDensitySum += volumeDensity;
      }
      return volumeDensitySum;
    } catch (Exception e) {
      LOG.error("Error calculating VolumeDataDensity", e);
      return -1.0;
    }
  }

With this change everywhere for diskbalancer if a volume is of 0 capacity that will be excluded.

@Gargi-jais11 Thanks for the suggestion! Updated the test to compare the density with and without the zero-capacity volume, which makes the expected behavior clearer and avoids the hard-coded density value.

Copy link
Copy Markdown
Contributor

@Gargi-jais11 Gargi-jais11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, Last patch LGTM!

@Gargi-jais11 Gargi-jais11 requested a review from ChenSammi May 18, 2026 08:53
@Gargi-jais11
Copy link
Copy Markdown
Contributor

@ChenSammi Do you wish to take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants