Don't leak unconsumed deferreds in error state in HTTP client#767
Don't leak unconsumed deferreds in error state in HTTP client#767DerGuteMoritz merged 13 commits intomasterfrom
Conversation
69bc640 to
b4d0991
Compare
b4d0991 to
2affe03
Compare
c87dd2e to
6d642dc
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes issue #766 by preventing leaked unconsumed deferreds in error state in the HTTP client. The fix refactors exception handling to move error status processing from middleware wrapping to explicit handling in the response chain, and distinguishes between error-status exceptions (HTTP 4xx/5xx with :throw-exceptions? true) and actual request failures to prevent disposing connections for expected error responses.
Changes:
- Refactored exception handling by moving
wrap-exceptionsmiddleware logic intohandle-error-statusfunction that's called in the response chain - Updated error handling in HTTP client to avoid disposing connections for HTTP error status responses (when
:throw-exceptions?is true) - Added comprehensive test infrastructure for detecting dropped error deferreds via manifold 0.5.0
- Restructured CI configuration to run separate test jobs for leak detection and dropped error deferred detection
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/aleph/http/client_middleware.clj | Refactored wrap-exceptions into handle-error-status function with explicit error deferred creation and added nil body handling |
| src/aleph/http.clj | Moved error status handling to response chain and updated error handler to skip connection disposal for error-status exceptions |
| src/aleph/http/client.clj | Changed attach-on-close-handler from d/chain' to d/on-realized with explicit error callback to avoid leaking error deferreds |
| test/aleph/testutils.clj | Added conditional instrumentation function for dropped error deferred detection using manifold.test |
| test/aleph/http_test.clj | Added test for error status exception handling and fixed bulk request test syntax |
| test/aleph/http/client_middleware_test.clj | Added comprehensive tests for handle-error-status with different body types |
| project.clj | Updated manifold dependency to 0.5.0 and added profile for dropped error deferred detection |
| deps.edn | Updated manifold dependency to 0.5.0 |
| .circleci/config.yml | Restructured CI to use workflows with separate jobs for different test modes and parallel execution |
| test/*.clj (multiple files) | Added instrumentation calls for dropped error deferred detection across all test namespaces |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
6d642dc to
051b781
Compare
|
FTR: The checks are currently failing quite consistently with #772 while they don't on |
To that end, turn `wrap-exceptions` from a wrapping middleware into an "after handler" (like `handle-redirects`) called `handle-error-status`. This also happens to improve a suboptimal behavior: When a request with `:throw-exceptions? true` was made and an error response was received which the middleware would turn into an exception, the underlying connection would be closed regardless of whether the response indicated keep-alive or not.
051b781 to
33d1e28
Compare
0967512 to
56d13a4
Compare
|
OK I have since convinced myself that the SETTINGS+ACK frame error is due to a race condition in Netty itself (see netty/netty#16142). After some experimentation, I found that running the tests with dropped error deferred detection has a higher chance of provoking the SETTINGS frame error. Furthermore, when running the original test job ( |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
dc9269f to
0981e0c
Compare
Turns out the custom pool wasn't actually used!
We breached the default of 10 minutes a few times now
Enabling both Netty's resource leak detection and Manifold's dropped error deferred detection at the same time makes the test suite so slow that we breach CircleCI's timeout of 60 minutes.
Turns out CircleCI's 60 minutes timeout applies to the overall job, not just per step. Thus, split the test runs up into separate jobs so that they both get 60 minutes. This has the additional advantage that they will run in parallel.
Since that job is currently very prone to flakes (due to #772) and it's also quite slow, manually triggering it on demand is preferable.
As a last resort in case we merge something which introduces a new dropped error deferred case and we forgot to run the test manually.
Turns out that `handle-redirects` returns a deferred when it follows a redirect which then would lead to failure in `handle-error-status` behind it.
…hind `test_with_leak_detection` This way, when `test_with_leak_detection` flakes, we can use "rerun from failed" in CircleCI without having to first cancel the approval job.
48bdf93 to
31c21e1
Compare
Fixes #766
See individual commits for details.