Skip to content

Don't leak unconsumed deferreds in error state in HTTP client#767

Merged
DerGuteMoritz merged 13 commits intomasterfrom
fix-766
Jan 20, 2026
Merged

Don't leak unconsumed deferreds in error state in HTTP client#767
DerGuteMoritz merged 13 commits intomasterfrom
fix-766

Conversation

@DerGuteMoritz
Copy link
Copy Markdown
Collaborator

@DerGuteMoritz DerGuteMoritz commented Sep 16, 2025

Fixes #766

See individual commits for details.

@DerGuteMoritz DerGuteMoritz marked this pull request as draft September 16, 2025 15:39
@DerGuteMoritz DerGuteMoritz changed the title Draft: Make wrap-exceptions middleware robust against nil response body Don't leak unconsumed deferreds in error state in HTTP client Sep 18, 2025
@DerGuteMoritz DerGuteMoritz force-pushed the fix-766 branch 7 times, most recently from c87dd2e to 6d642dc Compare January 5, 2026 19:05
@DerGuteMoritz DerGuteMoritz marked this pull request as ready for review January 5, 2026 19:06
@DerGuteMoritz DerGuteMoritz requested a review from Copilot January 11, 2026 15:54
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes issue #766 by preventing leaked unconsumed deferreds in error state in the HTTP client. The fix refactors exception handling to move error status processing from middleware wrapping to explicit handling in the response chain, and distinguishes between error-status exceptions (HTTP 4xx/5xx with :throw-exceptions? true) and actual request failures to prevent disposing connections for expected error responses.

Changes:

  • Refactored exception handling by moving wrap-exceptions middleware logic into handle-error-status function that's called in the response chain
  • Updated error handling in HTTP client to avoid disposing connections for HTTP error status responses (when :throw-exceptions? is true)
  • Added comprehensive test infrastructure for detecting dropped error deferreds via manifold 0.5.0
  • Restructured CI configuration to run separate test jobs for leak detection and dropped error deferred detection

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/aleph/http/client_middleware.clj Refactored wrap-exceptions into handle-error-status function with explicit error deferred creation and added nil body handling
src/aleph/http.clj Moved error status handling to response chain and updated error handler to skip connection disposal for error-status exceptions
src/aleph/http/client.clj Changed attach-on-close-handler from d/chain' to d/on-realized with explicit error callback to avoid leaking error deferreds
test/aleph/testutils.clj Added conditional instrumentation function for dropped error deferred detection using manifold.test
test/aleph/http_test.clj Added test for error status exception handling and fixed bulk request test syntax
test/aleph/http/client_middleware_test.clj Added comprehensive tests for handle-error-status with different body types
project.clj Updated manifold dependency to 0.5.0 and added profile for dropped error deferred detection
deps.edn Updated manifold dependency to 0.5.0
.circleci/config.yml Restructured CI to use workflows with separate jobs for different test modes and parallel execution
test/*.clj (multiple files) Added instrumentation calls for dropped error deferred detection across all test namespaces

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/aleph/http/client.clj Outdated
Comment thread src/aleph/http.clj Outdated
Comment thread test/aleph/http_test.clj Outdated
@DerGuteMoritz
Copy link
Copy Markdown
Collaborator Author

FTR: The checks are currently failing quite consistently with #772 while they don't on master. This indicates that something about the change here causes it. Investigation is underway.

To that end, turn `wrap-exceptions` from a wrapping middleware into an "after handler" (like
`handle-redirects`) called `handle-error-status`.

This also happens to improve a suboptimal behavior: When a request with `:throw-exceptions? true`
was made and an error response was received which the middleware would turn into an exception, the
underlying connection would be closed regardless of whether the response indicated keep-alive or not.
@DerGuteMoritz
Copy link
Copy Markdown
Collaborator Author

OK I have since convinced myself that the SETTINGS+ACK frame error is due to a race condition in Netty itself (see netty/netty#16142). After some experimentation, I found that running the tests with dropped error deferred detection has a higher chance of provoking the SETTINGS frame error. Furthermore, when running the original test job (test_with_leak_detection) in parallel, it too has a higher chance of provoking the error (I suspect CircleCI colocates the jobs on the same runner, resulting in some noisy neighbor interactions). Thus, I now opted to turn the dropped error deferred detection test job (test_with_dropped_error_deferred_detection) into a manually triggered one which reduces the chance of running into this particular flake to the original level. The idea then is to manually trigger this job on PRs which touch the code in ways that could result in new dropped error deferreds. In addition, I added a weekly scheduled run of test_with_dropped_error_deferred_detection to account for cases where we forget to manually trigger it. I think in this form, it's good to go now.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/aleph/http.clj Outdated
Comment thread src/aleph/http.clj
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/aleph/http_test.clj
Comment thread test/aleph/http_test.clj
Comment thread src/aleph/http.clj
Comment thread src/aleph/http/client_middleware.clj
Comment thread .circleci/config.yml
Comment thread .circleci/config.yml
@DerGuteMoritz DerGuteMoritz force-pushed the fix-766 branch 5 times, most recently from dc9269f to 0981e0c Compare January 19, 2026 20:15
Turns out the custom pool wasn't actually used!
We breached the default of 10 minutes a few times now
Enabling both Netty's resource leak detection and Manifold's dropped error deferred detection at the
same time makes the test suite so slow that we breach CircleCI's timeout of 60 minutes.
Turns out CircleCI's 60 minutes timeout applies to the overall job, not just per step. Thus, split
the test runs up into separate jobs so that they both get 60 minutes. This has the additional
advantage that they will run in parallel.
Since that job is currently very prone to flakes (due to #772) and it's also quite slow, manually
triggering it on demand is preferable.
As a last resort in case we merge something which introduces a new dropped error deferred case and
we forgot to run the test manually.
Turns out that `handle-redirects` returns a deferred when it follows a redirect which then would
lead to failure in `handle-error-status` behind it.
…hind `test_with_leak_detection`

This way, when `test_with_leak_detection` flakes, we can use "rerun from failed" in CircleCI without
having to first cancel the approval job.
@DerGuteMoritz DerGuteMoritz merged commit 0468eba into master Jan 20, 2026
4 checks passed
@DerGuteMoritz DerGuteMoritz deleted the fix-766 branch January 20, 2026 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NPE in wrap-exceptions HTTP client middleware

2 participants