Skip to content

Fixed TTL problems#17735

Open
Caideyipi wants to merge 3 commits into
masterfrom
ttl-fix
Open

Fixed TTL problems#17735
Caideyipi wants to merge 3 commits into
masterfrom
ttl-fix

Conversation

@Caideyipi
Copy link
Copy Markdown
Collaborator

Description

This PR fixes two correctness issues in ConfigNode TTL handling.

  1. TTL rule capacity checking incorrectly treated updates to existing TTL rules as new rules.
  2. SetTTLProcedure did not rollback TTL state when ConfigNode metadata had been updated but DataNode TTL cache
    update failed.

Details

1. Fix TTL rule capacity accounting

TTLInfo now checks the number of newly introduced TTL rules instead of simply checking the current total rule count.

This also fixes the database TTL case: setting TTL on a database effectively writes both the database path and its
** wildcard path, so capacity validation now accounts for both entries.

As a result:

  • updating an existing TTL rule no longer fails when the capacity limit is already reached
  • database-level TTL no longer bypasses the real capacity requirement

2. Add rollback for SetTTLProcedure

SetTTLProcedure now captures the previous TTL state before writing the new value to ConfigNode.

If the procedure fails while updating DataNode TTL cache, it will rollback:

  • the TTL metadata on ConfigNode
  • the TTL cache on DataNodes

For database TTL, the rollback also restores the wildcard TTL entry (db.**) separately.

The rollback state is serialized with the procedure so that recovery/replay can still restore the previous TTL state
correctly.

Tests

Added/updated tests for:

  • updating existing TTL when capacity is reached
  • updating existing TTL when current state is already oversize
  • database TTL capacity accounting
  • rollback when previous TTL does not exist
  • rollback of database wildcard TTL
  • procedure serialization with captured rollback state

This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever would not be obvious
    for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold
    for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR

@sonarqubecloud
Copy link
Copy Markdown

@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 71.79487% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.58%. Comparing base (7563ac8) to head (629f319).

Files with missing lines Patch % Lines
...fignode/procedure/impl/schema/SetTTLProcedure.java 69.44% 33 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17735      +/-   ##
============================================
+ Coverage     40.55%   40.58%   +0.03%     
  Complexity     2574     2574              
============================================
  Files          5179     5179              
  Lines        349896   349989      +93     
  Branches      44727    44741      +14     
============================================
+ Hits         141890   142041     +151     
+ Misses       208006   207948      -58     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@CRZbulabula CRZbulabula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Review superseded by the English version below.)

Copy link
Copy Markdown
Contributor

@CRZbulabula CRZbulabula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the capacity-accounting and the rollback-on-DataNode-failure fixes look correct. Comments below are organized by code location and focus on the timing of state capture in SetTTLProcedure, the cost of the consensus read used to capture it, and a handful of readability/test improvements.

final int tTlRuleCapacity = CommonDescriptor.getInstance().getConfig().getTTlRuleCapacity();
if (getTTLCount() >= tTlRuleCapacity) {
final int newTTLRuleCount = calculateNewTTLRuleCount(plan);
if (newTTLRuleCount > 0 && ttlCache.getTtlCount() + newTTLRuleCount > tTlRuleCapacity) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capacity check looks correct. Short-circuiting on newTTLRuleCount > 0 is what lets testUpdateExistingTTLWhenCurrentStateIsAlreadyOversize pass — an update of an existing rule no longer trips the limit even when the current count is already over capacity.

One diagnostic suggestion: when the limit is hit, include both tTlRuleCapacity and ttlCache.getTtlCount() + newTTLRuleCount in the error message. It makes it much easier to tell at a glance whether the cluster genuinely filled the table, or whether the capacity was tightened by a config change.

}

private int getNewTTLRuleCount(String[] pathNodes) {
return ttlCache.getLastNodeTTL(pathNodes) == TTLCache.NULL_TTL ? 1 : 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getNewTTLRuleCount is slightly mis-named — it returns a 0/1 indicator, not a count. Consider renaming to isNewTTLRule (return boolean) and accumulating boolean ? 1 : 0 in calculateNewTTLRuleCount. Reads more naturally at the call site.

private static final long TTL_NOT_EXIST = Long.MIN_VALUE;

private SetTTLPlan plan;
private long previousTTL = TTL_NOT_EXIST;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a one-line comment explaining why this sentinel is Long.MIN_VALUE rather than TTLCache.NULL_TTL (-1). They have different meanings on the rollback path: TTL_NOT_EXIST means "no TTL was set before this procedure", while NULL_TTL is the explicit "unset" marker that ConfigPlanExecutor interprets to route to unsetTTL. Conflating the two would corrupt rollback behavior.

}
protected void setConfigNodeTTL(final ConfigNodeProcedureEnv env) {
capturePreviousTTLState(env);
final TSStatus res = writeConfigNodePlan(env, plan);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(medium) previousTTL can be overwritten with the new value after a crash.

capturePreviousTTLState and writeConfigNodePlan both run inside one executeFromState(SET_CONFIGNODE_TTL) invocation, but the procedure framework only persists previousTTLStateCaptured / previousTTL on state transitions. If the ConfigNode crashes after the consensus write succeeds but before setNextState(UPDATE_DATANODE_CACHE) is durable, recovery replays SET_CONFIGNODE_TTL: previousTTLStateCaptured is still false, so we re-run capture — and getAllTTL() now returns the new value, which gets recorded as the "previous" TTL. If the DataNode update then fails, the rollback restores the new value over itself, losing the original.

Suggestion: split capture into its own pre-state (e.g. CAPTURE_PREVIOUS_TTLSET_CONFIGNODE_TTLUPDATE_DATANODE_CACHE) so the captured snapshot is persisted with procedure state before the consensus write.

res = new TSStatus(TSStatusCode.EXECUTE_STATEMENT_ERROR.getStatusCode());
res.setMessage(e.getMessage());
}
protected void setConfigNodeTTL(final ConfigNodeProcedureEnv env) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods (setConfigNodeTTL, updateDataNodeTTL, writeConfigNodePlan, sendTTLRequest) are made protected only so that TestingSetTTLProcedure in the test file can override them — no production subclass needs them. Consider making them package-private and putting the test class in the same package; that keeps the public/protected surface narrower.

rollbackFailure = e;
}
try {
rollbackDataNodeTTL(env);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two suggestions on the rollback error-aggregation strategy:

  1. Only the first exception survives; the second is logged but lost. Use rollbackFailure.addSuppressed(e) so a postmortem can still see both.
  2. Add a method-level comment clarifying the policy: "best-effort — if one side fails, the other is still attempted; the earliest exception is what propagates." Otherwise readers might expect fail-fast semantics.

final int length = ReadWriteIOUtils.readInt(byteBuffer);
final int position = byteBuffer.position();
this.plan = (SetTTLPlan) ConfigPhysicalPlan.Factory.create(byteBuffer);
byteBuffer.position(position + length);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This byteBuffer.position(position + length) is necessary (not redundant) — worth a comment so a future reader doesn't "clean it up".

Reason: serializeToByteBuffer() returns ByteBuffer.wrap(publicBAOS.getBuf(), 0, size()). ReadWriteIOUtils.write(ByteBuffer, OutputStream) then writes byteBuffer.capacity() (the full backing array length, which can be greater than the actual data size) followed by the whole array, padding included. Without forcing the position back to position + length, the new boolean + 2*long trailing fields would be read from those padding zero bytes.

final int position = byteBuffer.position();
this.plan = (SetTTLPlan) ConfigPhysicalPlan.Factory.create(byteBuffer);
byteBuffer.position(position + length);
if (byteBuffer.remaining() >= 17) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

17 is a magic number (1 byte boolean + 2 * 8 byte long). Extract a constant:

private static final int ROLLBACK_STATE_BYTES = Byte.BYTES + Long.BYTES * 2;
...
if (byteBuffer.remaining() >= ROLLBACK_STATE_BYTES) { ... }

Makes it survive any future addition of rollback fields without forgetting to update the guard.

return this.isGeneratedByPipe == that.isGeneratedByPipe
&& this.previousTTL == that.previousTTL
&& this.previousDatabaseWildcardTTL == that.previousDatabaseWildcardTTL
&& this.previousTTLStateCaptured == that.previousTTLStateCaptured
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including previousTTL / previousDatabaseWildcardTTL / previousTTLStateCaptured in equals/hashCode is consistent with the serialization change, but please confirm no caller dedups procedures by equals on the wait queue. If something does, two procedures with the same plan but different captured state would now be considered distinct, which could change idempotency behavior. If there's no such caller, ignore this comment.

(SetTTLProcedure) ProcedureFactory.getInstance().create(buffer);
assertSerializedProcedure(
deserializedProcedure, "root.db", 2000L, true, true, 500L, 600L, false);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a backward-compatibility deserialization test: construct an "old format" byte stream by hand — [procedureType:short][super state...][length:int][plan bytes] with no trailing 17 bytes — feed it through deserialize, and assert previousTTLStateCaptured == false, previousTTL == Long.MIN_VALUE, previousDatabaseWildcardTTL == Long.MIN_VALUE. The current round-trip test only exercises new-write/new-read; without an old-format fixture, a rolling upgrade where an in-flight pre-PR procedure is persisted and then resumed on a post-PR ConfigNode is not covered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants