Skip to content

fix: prevent dead letters from Write commands racing with TCP connection close#32905

Draft
sebastian-alfers wants to merge 1 commit intomainfrom
tcp-dead-letters
Draft

fix: prevent dead letters from Write commands racing with TCP connection close#32905
sebastian-alfers wants to merge 1 commit intomainfrom
tcp-dead-letters

Conversation

@sebastian-alfers
Copy link
Copy Markdown
Contributor

When a peer closes a TCP connection (e.g. an HAProxy SSL health check that opens and immediately closes the socket), the connection actor's close notification was sent from postStop — after the actor was fully terminated.

Any WriteCommand sent by a higher-level stream (e.g. Akka HTTP) between the peer close and the actor's death would land in dead letters with no way to intercept them.

Fix:

  • Eagerly notify interested parties (handler, close commander) inside stopWith
    before entering the unregistering state, so they receive the close event
    while the actor is still alive.
  • After NIO deregistration completes (Unregistered), transition to a new
    draining state instead of stopping immediately. The draining state answers
    any in-flight WriteCommands with CommandFailed and stops after a short
    idle timeout (100 ms).
  • Guard the postStop notification to skip actors already notified eagerly,
    while still delivering the close event to a pending write's commander if
    it differs from the connection handler.

Copy link
Copy Markdown
Contributor

@johanandren johanandren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try it out with a reproducer and saw it silence the logs?

case Unregistered =>
// NIO cleanup done; linger briefly so any in-flight WriteCommands that race with
// the connection close (e.g. from an Akka HTTP stream) are answered with CommandFailed
// instead of silently landing in dead letters.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// instead of silently landing in dead letters.
// instead of ending up as dead letters.

The problem you are trying to solve is that it is not silent, but noisy in logs, no?

val doNothing: () => Unit = () => ()

/** How long the connection actor lingers after close to absorb in-flight writes. */
val DrainTimeout: FiniteDuration = 100.milliseconds
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not be hardcoded but come from config

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With 0 as complete opt out the lingering check, and probably the default value.

def draining: Receive = {
case write: WriteCommand =>
if (TraceLogging) log.debug("Dropping write to already-closed connection")
sender() ! CommandFailed(write)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this not just move the noisy logging to the writer?

@johanandren
Copy link
Copy Markdown
Contributor

Not convinced on the benefit vs risk/performance overhead of this, but maybe if it can be opt in through config.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants