Skip to content

HDDS-15290. DiskBalancer should write info file atomically.#10284

Open
slfan1989 wants to merge 5 commits into
apache:masterfrom
slfan1989:HDDS-15290
Open

HDDS-15290. DiskBalancer should write info file atomically.#10284
slfan1989 wants to merge 5 commits into
apache:masterfrom
slfan1989:HDDS-15290

Conversation

@slfan1989
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

This PR updates DiskBalancer to persist diskBalancer.info atomically.

Previously, DiskBalancer overwrote diskBalancer.info by deleting the existing file and creating a new one. If the write failed between these steps, the existing configuration could be lost or left incomplete.

This patch changes the persistence flow to:

  • create the parent directory if needed;
  • write the new DiskBalancerInfo to a temporary file in the same directory;
  • atomically move the temporary file to diskBalancer.info;
  • preserve the existing file if the temporary write fails;
  • provide clear IOException messages for directory creation, temporary file creation, and overwrite failures.

This makes DiskBalancer info persistence more robust and avoids losing the previous valid configuration on write failure.

What is the link to the Apache JIRA

JIRA: HDDS-15290. DiskBalancer should write info file atomically.

How was this patch tested?

Added unit tests.

Copy link
Copy Markdown
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @slfan1989 for the patch.

Comment on lines 356 to 373
final Path tempFile;
try {
tempFile = Files.createTempFile(parent, path.getName(), ".tmp");
} catch (IOException e) {
throw new IOException(
"Unable to create temporary DiskBalancerInfo file under: "
+ parent, e);
}
boolean moved = false;
try {
writer.write(diskBalancerInfo, tempFile.toFile());
try {
Files.move(tempFile, target, StandardCopyOption.ATOMIC_MOVE,
StandardCopyOption.REPLACE_EXISTING);
} catch (IOException e) {
throw new IOException(
"Unable to overwrite the DiskBalancerInfo file: " + target, e);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temp file and atomic move to final location is not needed, because YamlUtils uses AtomicFileOutputStream internally, which performs the same.

final DiskBalancerInfoYaml data = getDiskBalancerInfoYaml(diskBalancerInfo);
YamlUtils.dump(yaml, data, path, LOG);

public static void dump(Yaml yaml, Object data, File file, Logger log) throws IOException {
try (OutputStream out = new AtomicFileOutputStream(file);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Removed the redundant temp-file and atomic move wrapper. DiskBalancerService now only creates the parent directory and relies on DiskBalancerYaml/YamlUtils, which already uses AtomicFileOutputStream for atomic writes.

Comment on lines +432 to +433
assertTrue(exception.getMessage()
.startsWith("Unable to overwrite the DiskBalancerInfo file: "));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please replace with assertThat(exception).hasMessageStartingWith(...) (see HDDS-9951)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Replaced the message prefix assertions with assertThat(exception).hasMessageStartingWith(...).

}

@VisibleForTesting
static void writeDiskBalancerInfoFile(DiskBalancerInfo diskBalancerInfo,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • write the new DiskBalancerInfo to a temporary file in the same directory;
  • atomically move the temporary file to diskBalancer.info;

Curious where the two atomic write steps are implemented in this change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two steps are implemented by AtomicFileOutputStream used in YamlUtils.dump():

DiskBalancerService.writeDiskBalancerInfoFile()
-> DiskBalancerYaml.createDiskBalancerInfoFile()
-> YamlUtils.dump()
-> new AtomicFileOutputStream(file)

So DiskBalancerService no longer creates the temporary file or moves it directly. It relies on the existing YamlUtils atomic write path and avoids the previous non-atomic delete/createNewFile before writing.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this pr is only for improving the exception handling for disk balancer info file atomic write?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only exception handling. The main purpose is to remove the previous non-atomic delete/createNewFile step before writing diskBalancer.info.

The actual atomic write is already handled by DiskBalancerYaml -> YamlUtils.dump() -> AtomicFileOutputStream. This PR keeps that path, creates the parent directory when needed, and wraps directory/write failures with clearer error messages.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the important part is removal of explicit delete and createNewFile from here:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I got it, then how about just remove that branch and add some exception handling around those called functions that might throw exception? looks like we dont need to add an additional static func for this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense.

Since YamlUtils already provides the atomic write behavior, a separate static helper is not strictly needed.

I will simplify the change by keeping the logic inside writeDiskBalancerInfoTo(), remove the extra static method, and only add parent directory creation plus clearer exception wrapping around the existing DiskBalancerYaml.createDiskBalancerInfoFile() call.

slfan1989 and others added 2 commits May 16, 2026 17:38
…ozone/container/diskbalancer/DiskBalancerService.java

Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com>
@adoroszlai adoroszlai requested a review from peterxcli May 17, 2026 05:36
Copy link
Copy Markdown
Member

@peterxcli peterxcli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1

@adoroszlai
Copy link
Copy Markdown
Contributor

@Gargi-jais11 would you like to take a look?

@Gargi-jais11
Copy link
Copy Markdown
Contributor

@Gargi-jais11 would you like to take a look?

@adoroszlai Yes will take a look. Give me sometime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants