bug: WSS handler socket leak #3634

@mendelskiv93

Description

Summary:
WSS handler fails to reject malformed connections gracefully, causing TCP socket exhaustion and service degradation.

Incident Details:

  • When: 2025-10-24 02:04 UTC
  • Duration: ~5 hours (self-resolved)
  • Impact: All WSS endpoints unresponsive, TLS handshakes timing out
  • Scope: Multiple production nodes simultaneously affected

Evidence:

  • TCP socket count: 200 → 1,600+ (8x increase)
  • Log volume: 11K → 127K lines/hour (11x increase)
  • Message throughput: Unchanged (~25K/hour)
  • WSS errors: 1,733 occurrences of AsyncStream Error: "Incomplete data sent or received" in one hour
  • P2P functionality: Unaffected

Root Cause:
Malformed/incomplete WSS connection attempts trigger errors in wstransport.nim (lines 294, 296):

Http Error: "Timed out parsing request"
AsyncStream Error: "Incomplete data sent or received"

Connections are accepted but never properly closed, causing a socket leak and Recv-Q buildup.
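To make the leak easier to confirm on an affected node, the socket states could be tallied from `ss -tan` output. The following is a minimal sketch, not part of nwaku: the function name `summarize_sockets` and the returned keys are hypothetical, and the WSS listen port is passed in because it is not stated in this report.

```python
from collections import Counter


def summarize_sockets(ss_output: str, local_port: int) -> dict:
    """Tally TCP sockets on one local port from the text of `ss -tan`.

    Hypothetical helper for illustration only.
    """
    states = Counter()
    recv_q_nonzero = 0   # connected sockets with unread bytes queued
    listen_backlog = 0   # for LISTEN sockets, Recv-Q is the accept-queue length
    for line in ss_output.splitlines():
        fields = line.split()
        # Expected columns: State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) < 5 or not fields[1].isdigit():
            continue  # header or unrecognised line
        state, recv_q, local = fields[0], int(fields[1]), fields[3]
        if local.rsplit(":", 1)[-1] != str(local_port):
            continue  # socket belongs to another service
        states[state] += 1
        if state == "LISTEN":
            listen_backlog += recv_q
        elif recv_q > 0:
            recv_q_nonzero += 1
    return {
        "by_state": dict(states),
        "recv_q_nonzero": recv_q_nonzero,
        "listen_backlog": listen_backlog,
    }
```

A steadily growing count of sockets in one state on the WSS port, with Recv-Q above zero and flat message throughput, would be consistent with connections that are accepted but never closed.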

Expected Behavior:
WSS handler should reject invalid connections quickly and close sockets properly, preventing resource exhaustion.
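The expected behavior can be illustrated with a small Python `asyncio` sketch. This is not the nwaku implementation (which is Nim) and does not mirror `wstransport.nim`; the name `handle_connection`, the 5-second timeout, and the plain substring check for the upgrade header are assumptions made for illustration. The point is the shape: the handshake read has a deadline, and the socket is closed in a `finally` block so every failure path releases it.

```python
import asyncio

HANDSHAKE_TIMEOUT = 5.0  # seconds; illustrative value, not taken from nwaku


async def handle_connection(reader, writer, timeout=HANDSHAKE_TIMEOUT):
    """Read the HTTP upgrade request with a deadline; always close the socket."""
    try:
        # Wait for the end of the request headers, but never indefinitely.
        request = await asyncio.wait_for(reader.readuntil(b"\r\n\r\n"), timeout)
        if b"upgrade: websocket" not in request.lower():
            # Well-formed HTTP but not a WebSocket upgrade: reject explicitly.
            writer.write(b"HTTP/1.1 400 Bad Request\r\nConnection: close\r\n\r\n")
            await writer.drain()
            return
        # A real handler would complete the WebSocket handshake here.
        # This sketch stops after validation and closes the connection.
    except (asyncio.TimeoutError, asyncio.IncompleteReadError,
            asyncio.LimitOverrunError, ConnectionError):
        # Stalled, truncated, oversized, or reset request: drop it silently.
        pass
    finally:
        # Runs on every path, so a failed handshake cannot leak the socket.
        writer.close()
        try:
            await writer.wait_closed()
        except ConnectionError:
            pass
```

It could be served with `asyncio.start_server(handle_connection, host, port)`. A client that sends a partial request and then goes quiet is disconnected once the timeout elapses instead of holding a socket open.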

Version:

  • Image: harbor.status.im/wakuorg/nwaku:deploy-status-prod (2 months old)

Metadata

  • Labels: bug
  • Status: In Progress
  • Milestone: none
  • Linked branches or pull requests: none