HDDS-15171. Add available space check on follower during bootstrap. #10185

Draft
sadanand48 wants to merge 5 commits into apache:master from sadanand48:HDDS-15171

@sadanand48
Contributor

What changes were proposed in this pull request?

Currently, if a follower doesn't have enough space to accommodate the tarball from the leader, the attempt fails, but the leader keeps trying to install the snapshot.
This PR adds a space check before starting the transfer. The check compares the available space against a predefined config whose default size is 5GB.
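
For illustration, a minimal sketch of such a pre-transfer check. This is not the code in this PR: the class name, method names, and the constant are hypothetical, and the sketch simply compares the usable space reported by the file store against a configured minimum.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Ozone.
final class BootstrapSpaceCheck {
  // Assumed default, mirroring the 5GB mentioned in the description.
  static final long DEFAULT_MIN_SPACE_BYTES = 5L * 1024 * 1024 * 1024;

  private BootstrapSpaceCheck() { }

  // Pure comparison, kept separate so it can be tested without a real disk.
  static boolean hasEnoughSpace(long usableBytes, long minRequiredBytes) {
    return usableBytes >= minRequiredBytes;
  }

  // Fails fast before the transfer starts if the directory's file store
  // reports less usable space than the configured minimum.
  static void checkAvailableSpace(Path dir, long minRequiredBytes)
      throws IOException {
    FileStore store = Files.getFileStore(dir);
    long usable = store.getUsableSpace();
    if (!hasEnoughSpace(usable, minRequiredBytes)) {
      throw new IOException("Insufficient space in " + dir
          + ": usable=" + usable + " bytes, required="
          + minRequiredBytes + " bytes");
    }
  }
}
```

A caller would invoke `checkAvailableSpace` on the directory that receives the tarball before requesting the download, so that a shortfall is reported up front rather than partway through the transfer.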

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15171

How was this patch tested?

unit tests

@sadanand48 sadanand48 changed the title HDDS-15171. Add available space check on follower during bootstrap. HDDS-15171. Add available space check on follower during bootstrap. May 5, 2026
@jojochuang jojochuang added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label May 7, 2026
@smengcl
Contributor

smengcl commented May 7, 2026

Thanks @sadanand48. CI is failing in checkstyle:

Comment on lines +2383 to +2384
<name>ozone.om.bootstrap.min.space</name>
<value>5GB</value>

This is a good first step, but the best approach is to estimate how much space the follower would actually need to download and unpack the tarball, because that could well exceed 5GB. CMIIW

+1. Maybe reuse the estimates from org.apache.hadoop.ozone.om.snapshot.logEstimatedTarballSize in the preemptive space check before the transfer.
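
A rough sketch of what an estimate-based threshold could look like. Everything here is an assumption for illustration: the helper name is hypothetical, the estimate is taken as a plain byte count supplied by the caller, and the 2x factor (room for the downloaded tarball plus its unpacked contents) is a guess rather than a measured ratio.

```java
// Hypothetical helper, not part of Ozone.
final class BootstrapSpaceEstimate {
  private BootstrapSpaceEstimate() { }

  // Returns the space to require before the transfer: the larger of the
  // configured floor and twice the estimated tarball size. The doubling
  // is an assumed allowance for holding the tarball and its unpacked copy.
  static long requiredBytes(long estimatedTarballBytes,
      long configuredMinBytes) {
    if (estimatedTarballBytes < 0) {
      throw new IllegalArgumentException("estimate must be non-negative");
    }
    // Saturate instead of overflowing for very large estimates.
    long doubled = estimatedTarballBytes > Long.MAX_VALUE / 2
        ? Long.MAX_VALUE
        : estimatedTarballBytes * 2;
    return Math.max(configuredMinBytes, doubled);
  }
}
```

With this shape the configured minimum still acts as a floor, while a large checkpoint raises the requirement beyond it.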

for (Throwable t = ioe; t != null; t = t.getCause()) {
if (t instanceof FileSystemException) {
FileSystemException fse = (FileSystemException) t;
String reason = fse.getReason();

Can do the error message string matching once

Suggested change
String reason = fse.getReason();
String reason = (t instanceof FileSystemException fse && fse.getReason() != null)
? fse.getReason() : t.getMessage();
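
For context, a self-contained sketch of the pattern under discussion: walk the cause chain once, pick the reason (or the message as a fallback) for each throwable, and do a single string match on it. The class name and the two matched phrases are assumptions for illustration and are not taken from this PR. Classic `instanceof` with a cast is used here so the sketch compiles on older Java versions.

```java
import java.nio.file.FileSystemException;
import java.util.Locale;

// Hypothetical helper, not part of Ozone.
final class DiskFullDetector {
  private DiskFullDetector() { }

  // Returns true if any throwable in the cause chain carries a reason or
  // message that looks like a disk-full or quota error.
  static boolean looksLikeDiskFull(Throwable error) {
    for (Throwable t = error; t != null; t = t.getCause()) {
      String text = null;
      if (t instanceof FileSystemException) {
        text = ((FileSystemException) t).getReason();
      }
      if (text == null) {
        text = t.getMessage();
      }
      if (text == null) {
        continue;
      }
      // Assumed phrases; real OS messages may differ by platform and locale.
      String lower = text.toLowerCase(Locale.ROOT);
      if (lower.contains("no space left on device")
          || lower.contains("disk quota exceeded")) {
        return true;
      }
    }
    return false;
  }
}
```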

@@ -4101,6 +4052,13 @@ public synchronized TermIndex installSnapshotFromLeader(String leaderId) throws
omDBCheckpoint = omRatisSnapshotProvider.
downloadDBSnapshotFromLeader(leaderId);
} catch (IOException ex) {
if (OmRatisSnapshotProvider.isDiskFullOrQuotaIOException(ex)) {

Isn't an error already logged in downloadDBSnapshotFromLeader? Why do we need to log it again here?

4 participants