fix: add CONTROL_FLOW_EXCEPTION_TYPES check to on_tool_error, on_retriever_error, and on_llm_error#1514
Conversation
…iever_error, and on_llm_error LangGraph uses exceptions inheriting from GraphBubbleUp (GraphInterrupt, NodeInterrupt, ParentCommand) for control flow, not actual errors. on_chain_error already filters these via CONTROL_FLOW_EXCEPTION_TYPES, but the other three error handlers did not, causing interrupt() and Command() calls in tools to be incorrectly marked as ERROR in traces. Apply the same guard pattern to on_tool_error, on_retriever_error, and on_llm_error so control-flow exceptions are not reported as errors. Fixes langfuse/langfuse#10962 Fixes langfuse/langfuse#5035
| if any(isinstance(error, t) for t in CONTROL_FLOW_EXCEPTION_TYPES): | ||
| level = None | ||
| else: | ||
| level = "ERROR" |
There was a problem hiding this comment.
Control-flow errors lose status
When error is a LangGraph control-flow exception, this sets level=None and also clears status_message (status_message=str(error) if level else None). That removes any indication in the observation that the run ended due to an interrupt/command, which makes these observations look “successfully ended” but empty. If the goal is only to avoid ERROR, consider keeping a non-error status message (or setting a neutral level like "DEFAULT") so users can still see why the run stopped.
Also appears in on_tool_error and on_llm_error (CallbackHandler.py:814-830, CallbackHandler.py:1016-1032).
There was a problem hiding this comment.
A user looking at traces can't distinguish between "tool completed normally" and tool was interrupted for human-in-the-loop
However, this is the exact same pattern on_chain_error already uses (line 583-596), which has been shipping and working. So the behavior is consistent across all four handlers
There was a problem hiding this comment.
You're right that this is a real UX problem. When we set level=None and status_message=None, control-flow interruptions become invisible in traces.
Looking at the code, I can see that on_chain_error (lines ~583-605) sets both level=None AND clears the status message for control-flow exceptions:
status_message=str(error) if level else None # ← This clears the message when level=None!This PR applies the same pattern to on_tool_error, on_retriever_error, and on_llm_error, which means control-flow interruptions will have no visible status indication in traces.
The problem: Users can't distinguish between:
- Tool completed successfully →
level=None,status_message=None - Tool interrupted for human-in-the-loop →
level=None,status_message=None(same!)
Suggested fix: Keep the status message but use a neutral level like "DEFAULT" or "WARNING" instead of None, so the interruption reason is visible but not marked as an error:
if any(isinstance(error, t) for t in CONTROL_FLOW_EXCEPTION_TYPES):
level = "DEFAULT"
status_message = f"Control flow: {str(error)}"
else:
level = "ERROR"
status_message = str(error)
observation = self._detach_observation(run_id)
if observation is not None:
observation.update(
level=cast(
Optional[Literal["DEBUG", "DEFAULT", "WARNING", "ERROR"]],
level,
),
status_message=status_message,
...
)This preserves visibility of interrupts/handoffs while avoiding the red ERROR badge. The same pattern should be applied to on_chain_error for consistency across all error handlers.
There was a problem hiding this comment.
I have updated the PR, with the required changes.
…xceptions Instead of setting level=None and clearing status_message for LangGraph control-flow exceptions, use level="DEFAULT" and keep the status message. This preserves visibility of interrupt/command events in traces without marking them as errors. Also applies the same improvement to on_chain_error for consistency across all four error handlers.
|
Can we merge this PR? |
Summary
CONTROL_FLOW_EXCEPTION_TYPESguard toon_tool_error,on_retriever_error, andon_llm_errorin the Langchain callback handleron_chain_error(line 583)GraphInterrupt,NodeInterrupt,ParentCommand) are no longer incorrectly marked asERRORin Langfuse tracesProblem
LangGraph uses exceptions inheriting from
GraphBubbleUpfor control flow — not actual errors:CONTROL_FLOW_EXCEPTION_TYPESis already defined and populated withGraphBubbleUp, andon_chain_errorcorrectly checks it. However,on_tool_error,on_retriever_error, andon_llm_errordid not, causing any LangGraph tool usinginterrupt()orCommand()to show as a red ERROR in traces.on_chain_erroron_tool_erroron_retriever_erroron_llm_errorTest plan
interrupt()in a LangGraph tool no longer produces ERROR-level observationsCommand()handoffs in tools no longer produce ERROR-level observationsFixes langfuse/langfuse#10962
Fixes langfuse/langfuse#5035
Important
Adds
CONTROL_FLOW_EXCEPTION_TYPEScheck toon_tool_error,on_retriever_error, andon_llm_errorinCallbackHandler.pyto prevent control-flow exceptions from being marked as errors.CONTROL_FLOW_EXCEPTION_TYPEScheck toon_tool_error,on_retriever_error, andon_llm_errorinCallbackHandler.py.GraphInterrupt,NodeInterrupt,ParentCommand) from being marked asERRORin Langfuse traces.on_tool_error,on_retriever_error, andon_llm_errorwith existingon_chain_errorbehavior.This description was created by
for c445a75. You can customize this summary. It will automatically update as commits are pushed.
Disclaimer: Experimental PR review
Greptile Summary
This PR adds
CONTROL_FLOW_EXCEPTION_TYPESguard checks toon_tool_error,on_retriever_error, andon_llm_errorin the LangchainCallbackHandler, matching the existing pattern inon_chain_error. LangGraph control-flow exceptions (GraphInterrupt,NodeInterrupt,ParentCommand) that inherit fromGraphBubbleUpwill now be marked withlevel="DEFAULT"and preserve theirstatus_messagerather than being flagged asERRORin traces. The previous thread concern aboutstatus_messagebeing conditionally cleared (str(error) if level else None) has been addressed — all four handlers now unconditionally setstatus_message=str(error).Confidence Score: 5/5
Safe to merge — the fix is minimal, consistent with the existing on_chain_error pattern, and the previous concern about status_message being cleared has been resolved.
All four error handlers are now consistent. The control-flow guard correctly uses level="DEFAULT" with status_message always preserved so interruptions remain visible in traces. No new pre-existing issues were introduced. No P0/P1 findings.
No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[LangGraph Tool / Retriever / LLM raises exception] --> B{isinstance error in CONTROL_FLOW_EXCEPTION_TYPES?} B -- Yes\nGraphInterrupt / NodeInterrupt / ParentCommand --> C[level = DEFAULT] B -- No\nActual error --> D[level = ERROR] C --> E[observation.update level=DEFAULT status_message=str error .end] D --> F[observation.update level=ERROR status_message=str error .end] E --> G[Trace shows neutral DEFAULT level with interrupt reason visible] F --> H[Trace shows ERROR level]Reviews (2): Last reviewed commit: "Merge branch 'main' into fix/control-flo..." | Re-trigger Greptile