💥 Use Temporal Failures for Nexus Error Serialization#2773
💥 Use Temporal Failures for Nexus Error Serialization#2773Quinn-With-Two-Ns wants to merge 8 commits intotemporalio:masterfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b2a0c78ca
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
temporal-sdk/src/main/java/io/temporal/internal/worker/NexusWorker.java
Outdated
Show resolved
Hide resolved
| } | ||
| return Failure.newBuilder() | ||
| .setMessage(failure.getMessage()) | ||
| .setMessage(temporalFailure.getMessage()) |
There was a problem hiding this comment.
Duplicated failure serialization logic in NexusUtil
Low Severity
temporalFailureToNexusFailure and temporalFailureToNexusFailureInfo share nearly identical logic for serializing a Temporal failure to JSON. Both call PROTO_JSON_PRINTER.print() with the same arguments, handle InvalidProtocolBufferException identically, and set the same metadata. Only the return type differs (Failure vs FailureInfo). The JSON serialization logic could be extracted to a shared helper.
d69b39f to
3495885
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 349588549b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
temporal-sdk/src/main/java/io/temporal/internal/worker/NexusWorker.java
Outdated
Show resolved
Hide resolved
| return new HandlerException(info.getType(), cause, retryBehavior); | ||
| if (failure | ||
| .getMessage() | ||
| .startsWith(String.format("handler error (%s)", info.getType()))) { |
There was a problem hiding this comment.
I understand this is here to allow proper decoding of legacy errors, but for which languages exactly? If we mean to support legacy errors coming from Go or others, are we sure the casing returned by .getType() here would match the error type of other SDKs?, or should we make this condition a little bit more resilient?
There was a problem hiding this comment.
Any old failure serialized by Go or Java. This is the handler error type so that should match the Nexus spec and consistent across Go and Java
temporal-sdk/src/main/java/io/temporal/internal/worker/NexusWorker.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @SuppressWarnings("deprecation") | ||
| private void sendReply( |
There was a problem hiding this comment.
The structure of this function is very hard to reason about.
There was a problem hiding this comment.
If we do this, are you still concerned about the structure?
There was a problem hiding this comment.
That one is certainly the most patent culprit, but I'd also try to restructure this to avoid mixing happy paths and failure cases.
Like, instead of:
if (taskResponse != null) {
if (!supportTemporalFailure && taskResponse.getStartOperation().hasFailure()) {
// some failure case on old server
reuturn
}
// happy path AND some failure case on newer servers
}
// other failure cases
I'd suggest restructuring the function along the line of:
if (taskResponse != null && !taskResponse.getStartOperation().hasFailure()) {
// Operation succeeded
} else if (taskResponse != null) {
// Operation failed - OperationError
} else if (response.getHandlerException()) {
// Operation failed - HandlerError
} else {
// throw ...
}
There was a problem hiding this comment.
I'm not sure that will really be clearer since we use the same gRPC method to respond to the server ion the first and second case here.
There was a problem hiding this comment.
I have an attempt to refactor to make it clearer let me push and we can discuss
temporal-sdk/src/main/java/io/temporal/internal/worker/NexusWorker.java
Outdated
Show resolved
Hide resolved
|
|
||
| allprojects { | ||
| repositories { | ||
| mavenLocal() |
There was a problem hiding this comment.
SNAPSHOT dependency and mavenLocal not suitable for merge
Medium Severity
mavenLocal() was added to the allprojects repositories block alongside nexusVersion = '0.5.0-SNAPSHOT'. Both are development-time configurations — mavenLocal() can cause non-reproducible builds by resolving artifacts from the local cache, and SNAPSHOT dependencies are inherently unstable. These need to be replaced with a released version before merging.
Additional Locations (1)
VegetarianOrc
left a comment
There was a problem hiding this comment.
LGTM, just a couple of minor things
| String details; | ||
| try { | ||
| details = JSON_PRINTER.print(failure.toBuilder().setMessage("").build()); | ||
| details = PROTO_JSON_PRINTER.print(temporalFailure.toBuilder().setMessage("").build()); |
There was a problem hiding this comment.
In Go & Python, we're clearing the stack trace out of the details as well as the message. Seems like it'd make sense to do it here as well.
| HandlerError.newBuilder() | ||
| .setErrorType(e.getErrorType().toString()) | ||
| .setRetryBehavior(mapRetryBehavior(e.getRetryBehavior())); | ||
| // TODO: check if this works on old server |
There was a problem hiding this comment.
Have we checked this? If so, let's remove this todo


💥 Use Temporal Failures for Nexus Error Serialization. The Java SDK now responds to Nexus operation failures and task failures with plain Temporal Failures. This change also allows
OperationExceptionandHandlerExceptionto have their own message and stacktrace independent of their cause. If the Server does not support the new format the SDK converts back to the old format before sending the response.Requires:
Note
Medium Risk
Touches Nexus error/failure serialization and server reply paths, which can affect cross-version interoperability and how failures are surfaced/attributed. Backward-compat fallback and expanded tests reduce risk but edge cases could still change failure shapes and metrics tags.
Overview
Switches Nexus task failure reporting to the new API format by sending plain Temporal
Failureprotos for both operation failures (StartOperationResponse.failure) and handler failures (RespondNexusTaskFailedRequest.failure), enabling independent message/stacktrace onOperationException/HandlerException.Adds compatibility logic to downgrade to the old
HandlerError/UnsuccessfulOperationErrorformat when the server doesn’t advertiseRequest.Capabilities.temporal_failure_responses(or when forced viatemporal.nexus.forceOldFailureFormat). UpdatesNexusUtilandDefaultFailureConverterto round-trip Temporal failures through Nexus metadata/details and preserve stack traces.Test server and SDK tests are updated/expanded to handle both formats (including forced-old-format suites) and validate metrics tagging for failed vs canceled operations; CI bumps the Temporal CLI version and build config moves
nexusVersionto0.5.0-SNAPSHOT(plusmavenLocal()for resolution).Written by Cursor Bugbot for commit 9545fd2. This will update automatically on new commits. Configure here.