
fix: add CONTROL_FLOW_EXCEPTION_TYPES check to on_tool_error, on_retriever_error, and on_llm_error #1514

Open
vishnumishra wants to merge 9 commits into langfuse:main from vishnumishra:fix/control-flow-exception-check-in-error-handlers

@vishnumishra vishnumishra commented Feb 9, 2026

Summary

  • Add CONTROL_FLOW_EXCEPTION_TYPES guard to on_tool_error, on_retriever_error, and on_llm_error in the Langchain callback handler
  • Matches the existing pattern already used by on_chain_error (line 583)
  • LangGraph control-flow exceptions (GraphInterrupt, NodeInterrupt, ParentCommand) are no longer incorrectly marked as ERROR in Langfuse traces

Problem

LangGraph uses exceptions inheriting from GraphBubbleUp for control flow — not actual errors:

GraphBubbleUp (base)
├── GraphInterrupt    — raised by interrupt() for human-in-the-loop
│   └── NodeInterrupt — deprecated, same purpose
└── ParentCommand     — raised by Command() for agent handoffs

CONTROL_FLOW_EXCEPTION_TYPES is already defined and populated with GraphBubbleUp, and on_chain_error correctly checks it. However, on_tool_error, on_retriever_error, and on_llm_error did not, causing any LangGraph tool using interrupt() or Command() to show as a red ERROR in traces.

| Handler | Had check? | After this PR |
| --- | --- | --- |
| on_chain_error | Yes | Yes (unchanged) |
| on_tool_error | No | Yes |
| on_retriever_error | No | Yes |
| on_llm_error | No | Yes |
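
As a minimal sketch of why a single base-class entry is enough to cover every handler, the check can be reproduced with stand-in classes. The exception classes below only mirror the hierarchy described above and are not imported from LangGraph; `is_control_flow` is a hypothetical helper name, and the set is assumed to hold only `GraphBubbleUp`, as the description states.

```python
# Stand-in classes mirroring the hierarchy in the PR description.
# The real classes live in LangGraph and are NOT imported here.
class GraphBubbleUp(Exception):
    pass


class GraphInterrupt(GraphBubbleUp):
    pass


class NodeInterrupt(GraphInterrupt):
    pass


class ParentCommand(GraphBubbleUp):
    pass


# Assumption: the collection holds only the base class.
CONTROL_FLOW_EXCEPTION_TYPES = {GraphBubbleUp}


def is_control_flow(error: BaseException) -> bool:
    """Hypothetical helper wrapping the inline check the PR adds to each handler."""
    return any(isinstance(error, t) for t in CONTROL_FLOW_EXCEPTION_TYPES)


if __name__ == "__main__":
    samples = (
        GraphInterrupt("a"),
        NodeInterrupt("b"),
        ParentCommand("c"),
        ValueError("d"),
    )
    for exc in samples:
        print(type(exc).__name__, is_control_flow(exc))
```

Because `isinstance` follows inheritance, all three subclasses match the base class, while an unrelated `ValueError` does not.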

Test plan

  • Verify interrupt() in a LangGraph tool no longer produces ERROR-level observations
  • Verify Command() handoffs in tools no longer produce ERROR-level observations
  • Verify actual tool/retriever/LLM errors still correctly show as ERROR
  • Existing tests pass
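
The first three checks could be automated along the following lines. This is only a sketch: `FakeObservation` and `handle_error` are hypothetical stand-ins for the Langfuse observation and the handler body, `GraphBubbleUp` is a local stand-in rather than the LangGraph import, and the `level="DEFAULT"` behavior reflects the later revision of this PR.

```python
from typing import Any, Dict


class GraphBubbleUp(Exception):
    """Stand-in for LangGraph's control-flow base class (not the real import)."""


CONTROL_FLOW_EXCEPTION_TYPES = {GraphBubbleUp}


class FakeObservation:
    """Hypothetical test double that records what update() received."""

    def __init__(self) -> None:
        self.updated: Dict[str, Any] = {}
        self.ended = False

    def update(self, **kwargs: Any) -> "FakeObservation":
        self.updated.update(kwargs)
        return self

    def end(self) -> None:
        self.ended = True


def handle_error(observation: FakeObservation, error: BaseException) -> None:
    """Simplified stand-in for on_tool_error / on_retriever_error / on_llm_error."""
    if any(isinstance(error, t) for t in CONTROL_FLOW_EXCEPTION_TYPES):
        level = "DEFAULT"
    else:
        level = "ERROR"
    observation.update(level=level, status_message=str(error)).end()


def test_control_flow_is_not_error() -> None:
    obs = FakeObservation()
    handle_error(obs, GraphBubbleUp("waiting for human approval"))
    assert obs.updated["level"] == "DEFAULT"
    assert obs.updated["status_message"] == "waiting for human approval"
    assert obs.ended


def test_real_error_is_still_error() -> None:
    obs = FakeObservation()
    handle_error(obs, ValueError("tool crashed"))
    assert obs.updated["level"] == "ERROR"
    assert obs.updated["status_message"] == "tool crashed"
    assert obs.ended
```

A test against the real handler would need the actual CallbackHandler and a LangGraph run, which this sketch deliberately avoids.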

Fixes langfuse/langfuse#10962
Fixes langfuse/langfuse#5035


Important

Adds CONTROL_FLOW_EXCEPTION_TYPES check to on_tool_error, on_retriever_error, and on_llm_error in CallbackHandler.py to prevent control-flow exceptions from being marked as errors.

  • Behavior:
    • Add CONTROL_FLOW_EXCEPTION_TYPES check to on_tool_error, on_retriever_error, and on_llm_error in CallbackHandler.py.
    • Prevents control-flow exceptions (GraphInterrupt, NodeInterrupt, ParentCommand) from being marked as ERROR in Langfuse traces.
  • Consistency:
    • Aligns on_tool_error, on_retriever_error, and on_llm_error with existing on_chain_error behavior.
  • Misc:
    • Fixes issues #10962 and #5035.

This description was created by Ellipsis for c445a75.

Disclaimer: Experimental PR review

Greptile Summary

This PR adds CONTROL_FLOW_EXCEPTION_TYPES guard checks to on_tool_error, on_retriever_error, and on_llm_error in the Langchain CallbackHandler, matching the existing pattern in on_chain_error. LangGraph control-flow exceptions (GraphInterrupt, NodeInterrupt, ParentCommand) that inherit from GraphBubbleUp will now be marked with level="DEFAULT" and preserve their status_message rather than being flagged as ERROR in traces. The previous thread concern about status_message being conditionally cleared (str(error) if level else None) has been addressed — all four handlers now unconditionally set status_message=str(error).

Confidence Score: 5/5

Safe to merge — the fix is minimal, consistent with the existing on_chain_error pattern, and the previous concern about status_message being cleared has been resolved.

All four error handlers are now consistent. The control-flow guard correctly uses level="DEFAULT" with status_message always preserved so interruptions remain visible in traces. No new issues were introduced. No P0/P1 findings.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| langfuse/langchain/CallbackHandler.py | Adds CONTROL_FLOW_EXCEPTION_TYPES check to on_retriever_error (line 256), on_tool_error (line 826), and on_llm_error (line 1028), setting level="DEFAULT" with status_message preserved for control-flow exceptions, consistent with on_chain_error. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LangGraph Tool / Retriever / LLM raises exception] --> B{isinstance error in CONTROL_FLOW_EXCEPTION_TYPES?}
    B -- Yes\nGraphInterrupt / NodeInterrupt / ParentCommand --> C[level = DEFAULT]
    B -- No\nActual error --> D[level = ERROR]
    C --> E[observation.update level=DEFAULT status_message=str error .end]
    D --> F[observation.update level=ERROR status_message=str error .end]
    E --> G[Trace shows neutral DEFAULT level with interrupt reason visible]
    F --> H[Trace shows ERROR level]

Reviews (2): Last reviewed commit: "Merge branch 'main' into fix/control-flo..."

…iever_error, and on_llm_error

LangGraph uses exceptions inheriting from GraphBubbleUp (GraphInterrupt,
NodeInterrupt, ParentCommand) for control flow, not actual errors.
on_chain_error already filters these via CONTROL_FLOW_EXCEPTION_TYPES,
but the other three error handlers did not, causing interrupt() and
Command() calls in tools to be incorrectly marked as ERROR in traces.

Apply the same guard pattern to on_tool_error, on_retriever_error, and
on_llm_error so control-flow exceptions are not reported as errors.

Fixes langfuse/langfuse#10962
Fixes langfuse/langfuse#5035

CLAassistant commented Feb 9, 2026

CLA assistant check
All committers have signed the CLA.

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

Comment on lines +258 to +261
if any(isinstance(error, t) for t in CONTROL_FLOW_EXCEPTION_TYPES):
    level = None
else:
    level = "ERROR"

Control-flow errors lose status

When error is a LangGraph control-flow exception, this sets level=None and also clears status_message (status_message=str(error) if level else None). That removes any indication in the observation that the run ended due to an interrupt/command, which makes these observations look “successfully ended” but empty. If the goal is only to avoid ERROR, consider keeping a non-error status message (or setting a neutral level like "DEFAULT") so users can still see why the run stopped.

Also appears in on_tool_error and on_llm_error (CallbackHandler.py:814-830, CallbackHandler.py:1016-1032).
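
The interaction described in this comment can be reproduced in isolation. The sketch below is an illustration only: `first_iteration_fields` is a hypothetical function name, and it models just the two expressions quoted above rather than the real handler.

```python
from typing import Optional, Tuple


def first_iteration_fields(
    is_control_flow: bool, error: BaseException
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical model of the first revision's level/status_message logic."""
    level = None if is_control_flow else "ERROR"
    # A level of None is falsy, so the message is dropped along with the level.
    status_message = str(error) if level else None
    return level, status_message


if __name__ == "__main__":
    print(first_iteration_fields(True, RuntimeError("interrupt: need approval")))
    print(first_iteration_fields(False, ValueError("boom")))
```

With the control-flow flag set, both fields come back as `None`, which is why the observation carries no trace of the interrupt.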

@vishnumishra vishnumishra Feb 9, 2026

A user looking at traces can't distinguish between "tool completed normally" and "tool was interrupted for human-in-the-loop".

However, this is the exact same pattern on_chain_error already uses (lines 583-596), which has been shipping and working. So the behavior is consistent across all four handlers.

Contributor

You're right that this is a real UX problem. When we set level=None and status_message=None, control-flow interruptions become invisible in traces.

Looking at the code, I can see that on_chain_error (lines ~583-605) sets both level=None AND clears the status message for control-flow exceptions:

status_message=str(error) if level else None  # ← This clears the message when level=None!

This PR applies the same pattern to on_tool_error, on_retriever_error, and on_llm_error, which means control-flow interruptions will have no visible status indication in traces.

The problem: Users can't distinguish between:

  1. Tool completed successfully → level=None, status_message=None
  2. Tool interrupted for human-in-the-loop → level=None, status_message=None (same!)

Suggested fix: Keep the status message but use a neutral level like "DEFAULT" or "WARNING" instead of None, so the interruption reason is visible but not marked as an error:

if any(isinstance(error, t) for t in CONTROL_FLOW_EXCEPTION_TYPES):
    level = "DEFAULT"
    status_message = f"Control flow: {str(error)}"
else:
    level = "ERROR"
    status_message = str(error)

observation = self._detach_observation(run_id)

if observation is not None:
    observation.update(
        level=cast(
            Optional[Literal["DEBUG", "DEFAULT", "WARNING", "ERROR"]],
            level,
        ),
        status_message=status_message,
        ...
    )

This preserves visibility of interrupts/handoffs while avoiding the red ERROR badge. The same pattern should be applied to on_chain_error for consistency across all error handlers.

@vishnumishra vishnumishra Feb 9, 2026

I have updated the PR with the requested changes.

…xceptions

Instead of setting level=None and clearing status_message for LangGraph
control-flow exceptions, use level="DEFAULT" and keep the status message.
This preserves visibility of interrupt/command events in traces without
marking them as errors. Also applies the same improvement to on_chain_error
for consistency across all four error handlers.
@vishnumishra

Can we merge this PR?

@vishnumishra vishnumishra reopened this Apr 16, 2026