Include failure context in splice events#4514
Include failure context in splice events#4514jkczyz wants to merge 19 commits intolightningdevkit:mainfrom
Conversation
|
🎉 This PR is now ready for review! |
333980d to
c2cad50
Compare
| /// sent an invalid message). Retry by calling [`ChannelManager::splice_channel`]. | ||
| /// | ||
| /// [`ChannelManager::splice_channel`]: crate::ln::channelmanager::ChannelManager::splice_channel | ||
| NegotiationError, |
There was a problem hiding this comment.
Note we could have this wrap AbortReason -- we had an internal NegotiationError struct that already wrapped this along with contributions -- but it would require making AbortReason public. Not sure if we want to expose all its variants.
| fn with_negotiation_failure_reason(mut self, reason: NegotiationFailureReason) -> Self { | ||
| match self { | ||
| QuiescentError::FailSplice(_, ref mut r) => *r = reason, | ||
| _ => debug_assert!(false, "Expected FailSplice variant"), |
There was a problem hiding this comment.
Little unfortunate we need to do this, but I couldn't find a better way other than passing NegotiationFailureReason to abandon_quiescent_action and quiescent_action_into_error, which doesn't seem right since they could be for future QuiescentAction variants.
| /// The outpoint of the channel's splice funding transaction, if one was created. | ||
| abandoned_funding_txo: Option<OutPoint>, | ||
| /// The features that this channel will operate with, if available. | ||
| channel_type: Option<ChannelTypeFeatures>, |
There was a problem hiding this comment.
Didn't feel like there was much value to these.
abandoned_funding_txowould be set only if there's a failure after the transaction was negotiated but before signed.- In the future, when
channel_typecan change, maybe it is just part ofFundingContribution? Could also keep it if you prefer.
|
🔔 1st Reminder Hey @TheBlueMatt @wpaulino! This PR has been waiting for your review. |
1 similar comment
|
🔔 1st Reminder Hey @TheBlueMatt @wpaulino! This PR has been waiting for your review. |
|
🔔 2nd Reminder Hey @TheBlueMatt @wpaulino! This PR has been waiting for your review. |
1 similar comment
|
🔔 2nd Reminder Hey @TheBlueMatt @wpaulino! This PR has been waiting for your review. |
| impl QuiescentError { | ||
| fn with_negotiation_failure_reason(mut self, reason: NegotiationFailureReason) -> Self { | ||
| match self { | ||
| QuiescentError::FailSplice(_, ref mut r) => *r = reason, | ||
| _ => debug_assert!(false, "Expected FailSplice variant"), | ||
| } | ||
| self | ||
| } | ||
| } |
There was a problem hiding this comment.
Minor: The debug_assert!(false) path is only reachable in non-production builds if a caller passes a non-FailSplice variant. Currently the only caller passes through quiescent_action_into_error which always produces FailSplice for production code (only DoNothing in test builds). But debug_assert!(false) will silently return self unchanged in release builds — consider using debug_assert! with a more specific message about which variant was unexpected, or restructuring so the method only accepts FailSplice to make this impossible.
| let is_initiator = pending_splice | ||
| .funding_negotiation | ||
| .take() | ||
| .map(|negotiation| negotiation.is_initiator()) | ||
| .unwrap_or(false); |
There was a problem hiding this comment.
The is_initiator fallback when funding_negotiation is None defaults to false, which means splice_funding_failed_for! will return None for non-initiators (since the macro has None if !$is_initiator => None). This means if the funding negotiation was already taken before reset_pending_splice_state is called and the node was actually the initiator, the splice failure event would be silently dropped.
Is funding_negotiation ever None when reset_pending_splice_state is called? If so, defaulting to false may suppress failure events for the initiator. If not, this should be a debug_assert or .expect() to catch violations of that invariant.
There was a problem hiding this comment.
Added debug assertions.
| let is_initiator = pending_splice | ||
| .funding_negotiation | ||
| .as_ref() | ||
| .map(|negotiation| negotiation.is_initiator()) | ||
| .unwrap_or(false); |
There was a problem hiding this comment.
Same concern here: funding_negotiation.as_ref() being None causes is_initiator to default to false, which will cause the splice_funding_failed_for! macro to return None for non-initiators. Since maybe_splice_funding_failed is the read-only version used during serialization, silently returning None means the splice failure events won't be persisted and the user won't learn about the failure after restart.
Consider whether funding_negotiation can actually be None in this path, and if not, using expect to enforce the invariant.
There was a problem hiding this comment.
Added debug assertions.
|
I've completed a thorough re-review of the entire PR diff. All previously flagged issues have been verified and no significant new bugs were found. Review SummaryNo new bugs or security issues found beyond what was already flagged in the prior review pass. Previously flagged issues (still applicable):
Previously flagged issues (now resolved):
Cross-cutting concerns:
|
Each splice negotiation round can fail for different reasons, but Event::SpliceFailed previously gave no indication of what went wrong. Add a NegotiationFailureReason enum so users can distinguish failures and take appropriate action (e.g., retry with a higher feerate vs. wait for the channel to become usable). The reason is determined at each channelmanager emission site based on context rather than threaded through channel.rs internals, since the channelmanager knows the triggering context (disconnect, tx_abort, shutdown, etc.) while channel.rs functions like abandon_quiescent_action handle both splice and non-splice quiescent actions. The one exception is QuiescentError::FailSplice, which carries a reason alongside the SpliceFundingFailed. This is appropriate because FailSplice is already splice-specific, and the channel.rs code that constructs it (e.g., contribution validation, feerate checks) knows the specific failure cause. A with_negotiation_failure_reason method on QuiescentError allows callers to override the default when needed. Older serializations that lack the reason field default to Unknown via default_value in deserialization. The persistence reload path uses PeerDisconnected since a reload implies the peer connection was lost. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
c2cad50 to
c469423
Compare
|
Rebased |
lightning/src/ln/channelmanager.rs
Outdated
| @@ -4128,6 +4118,16 @@ impl< | |||
| }, | |||
There was a problem hiding this comment.
nit: Unlike handle_quiescent_error (line ~6619), this site unconditionally emits DiscardFunding even when contributed_inputs and contributed_outputs are both empty (which happens for the initiator when all inputs are committed to prior rounds). The same inconsistency exists at lines ~4417, ~4920, ~11853, ~12012, ~12180, ~12329, ~15448, and ~18122.
Consider adding the same guard here:
if !splice_funding_failed.contributed_inputs.is_empty() || !splice_funding_failed.contributed_outputs.is_empty() {
// emit DiscardFunding
}Or, equivalently, have the macro return None for the initiator when there are no unique contributions (matching the non-initiator behavior) and emit SpliceNegotiationFailed without DiscardFunding in those cases.
There was a problem hiding this comment.
Fixed by making SpliceFundingFailed consumable into Option-al parts rather than exposing fields. The into_parts method now checks for empty contributions.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4514 +/- ##
=======================================
Coverage 87.09% 87.10%
=======================================
Files 163 163
Lines 108856 108776 -80
Branches 108856 108776 -80
=======================================
- Hits 94808 94744 -64
+ Misses 11563 11545 -18
- Partials 2485 2487 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cence Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
c469423 to
4f6b1a6
Compare
Unknown future variants (using odd IDs) will deserialize as Unknown instead of blocking ChannelManager loading. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the abandoned_funding_txo and channel_type fields on Event::SpliceFailed with an Option<FundingContribution> from the failed round. Users can feed this back to funding_contributed to retry or use it to inform a fresh attempt via splice_channel. Also makes FundingContribution::feerate() public so users can inspect the feerate when deciding whether to retry or bump. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4f6b1a6 to
0609565
Compare
Changed to use |
… events Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverse the event ordering at all emission sites so that Event::DiscardFunding is emitted before Event::SpliceFailed. If the user retries the splice when handling SpliceFailed, the contributed inputs would still be locked. A subsequent DiscardFunding would then incorrectly unlock inputs that are now committed to the new attempt. Emitting DiscardFunding first avoids this by ensuring inputs are unlocked before any retry occurs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename Event::SplicePending to Event::SpliceNegotiated and Event::SpliceFailed to Event::SpliceNegotiationFailed. These names better reflect the per-round semantics: each negotiation attempt resolves to one of these two outcomes, independent of the overall splice lifecycle. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The was_negotiated check is unnecessary because reset_pending_splice_state only runs when funding_negotiation is present, meaning on_tx_signatures_exchange hasn't been called yet. Since the feerate is only recorded in last_funding_feerate_sat_per_1000_weight during on_tx_signatures_exchange, the current round's feerate can never match it. So the contribution can always be unconditionally popped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Filter outputs by script_pubkey rather than full TxOut equality. Outputs reusing the same address as a prior round are still considered committed even if the value differs (e.g., different change amounts across RBF rounds with different feerates). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the maybe_create_splice_funding_failed! macro and splice_funding_failed_for method with a unified splice_funding_failed_for! macro that derives contributed inputs and outputs from the FundingContribution rather than extracting them from the negotiation state. Callers pass ident parameters for which PendingSplice filtering methods to use: contributed_inputs/contributed_outputs when the current round's contribution has been popped or was never pushed, and prior_contributed_inputs/prior_contributed_outputs for the read-only persistence path where the contribution is cloned instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…gotiationContext Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…hods Now that splice_funding_failed_for! derives inputs and outputs from FundingContribution directly, remove the unused NegotiationError struct and into_negotiation_error methods from the interactive tx types, along with the into/to_contributed_inputs_and_outputs methods on ConstructedTransaction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0609565 to
44ecc06
Compare
|
🔔 3rd Reminder Hey @TheBlueMatt @wpaulino! This PR has been waiting for your review. |
1 similar comment
|
🔔 3rd Reminder Hey @TheBlueMatt @wpaulino! This PR has been waiting for your review. |
|
🔔 4th Reminder Hey @TheBlueMatt @wpaulino! This PR has been waiting for your review. |
1 similar comment
|
🔔 4th Reminder Hey @TheBlueMatt @wpaulino! This PR has been waiting for your review. |
| /// [`ChannelManager::splice_channel`], or wait for the counterparty to initiate. | ||
| /// | ||
| /// [`ChannelManager::splice_channel`]: crate::ln::channelmanager::ChannelManager::splice_channel | ||
| CounterpartyAborted, |
There was a problem hiding this comment.
Is the "Message data" in tx_abort not a string that we can include here?
| /// | ||
| /// [`ChannelManager::splice_channel`]: crate::ln::channelmanager::ChannelManager::splice_channel | ||
| PeerDisconnected, | ||
| /// The counterparty aborted the negotiation by sending `tx_abort`. Retry by calling |
There was a problem hiding this comment.
Almost each docstring says "Retry by calling splice_channel". That doesn't seem like particularly helpful advice especially in cases where the counterparty rejected implying just trying again is unlikely to fix it?
| /// Adjust the contribution and retry via [`ChannelManager::splice_channel`]. | ||
| /// | ||
| /// [`ChannelManager::splice_channel`]: crate::ln::channelmanager::ChannelManager::splice_channel | ||
| ContributionInvalid, |
There was a problem hiding this comment.
It does seem like some of these could use more details (error message string?). As-is given the recommendation for ~all enum variants is "retry by calling splice_channel again" its not really clear to me why we should bother with an enum - an enum is useful if the downstream code has different handling for different cases, but here that doesn't really seem likely/possible. If we can't expose more details that are useful for automated action, we could probably do with an enum with three variants and a developer-readable string.
| if let Some(splice_funding_failed) = splice_funding_failed { | ||
| let (funding_info, contribution) = splice_funding_failed.into_parts(); | ||
| let mut pending_events = self.pending_events.lock().unwrap(); | ||
| if let Some(funding_info) = funding_info { |
There was a problem hiding this comment.
We should document which comes first and maybe test it more exhaustively - eg in get_and_clear_pending_events assert the order if both events appear for the same channel?
When a splice negotiation round fails, the user needs to know why it failed and what they can do about it. Previously,
SpliceFailedgave no indication of the failure cause and didn't return the contribution, making it difficult to retry.This PR adds failure context to the splice events so users can make informed retry decisions:
NegotiationFailureReasonindicates what went wrong — peer disconnect, counterparty abort, invalid contribution, insufficient feerate, channel closing, etc. Each variant documents how to resolve it.FundingContributionfrom the failed round is returned in the event. Users can feed it back tofunding_contributedto retry as-is, or inspect it to understand which inputs, outputs, and feerate were used when constructing a new contribution.SplicePendingandSpliceFailedare renamed toSpliceNegotiatedandSpliceNegotiationFailedto reflect that each negotiation round (initial or RBF) independently resolves to one of these two outcomes.DiscardFundingis now emitted beforeSpliceNegotiationFailedso wallet inputs are unlocked before the failure handler runs. Otherwise, a retry duringSpliceNegotiationFailedhandling would leave inputs locked, and the subsequentDiscardFundingwould incorrectly unlock inputs committed to the new attempt.Additionally,
SpliceFundingFailedinternals are simplified:funding_txoandchannel_typefields are removed.was_negotiatedcheck is removed from the contribution pop.FundingContributionvia a unifiedsplice_funding_failed_for!macro, replacing the old approach of extracting them fromFundingNegotiationvariants. Unused conversion methods are removed.Also fixes a bug in
into_unique_contributionswhere outputs were compared by fullTxOutequality instead ofscript_pubkey. This could fail to filter change outputs that reused the same address but had different values across RBF rounds.