fix: use output for python faithfulness statements#201

Open
Janhavi Sathe (invocan-jsathe) wants to merge 1 commit into
braintrustdata:mainfrom
invocan-jsathe:fix/python-faithfulness-output-statements

@invocan-jsathe

Summary

  • Fix a bug in Python Faithfulness where statements were extracted from expected instead of output, causing the scorer to evaluate the wrong text.
  • Update both sync and async Faithfulness paths in py/autoevals/ragas.py to route statement extraction through output.
  • Add a regression test in py/autoevals/test_ragas.py that uses mismatched output/expected and input-driven mocks so the final score depends on correct routing.

Bug

Faithfulness should score whether claims in the generated answer are supported by context.
In Python, it incorrectly passed expected to statement extraction, so claims were taken from ground truth rather than model output.

Fix

  • In Faithfulness._run_eval_async(...): change answer=expected -> answer=output.
  • In Faithfulness._run_eval_sync(...): change answer=expected -> answer=output.
  • Align sync required-field validation with async by requiring output as well.
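
To illustrate why the routing matters, here is a minimal, self-contained sketch. It is not the code in py/autoevals/ragas.py: the helper names, the sentence-splitting extraction, and the substring-based verdicts are all simplifying assumptions that stand in for the LLM-backed steps.

```python
def extract_statements(answer):
    # Hypothetical stand-in for the LLM extraction step: one statement per sentence.
    return [s.strip() for s in answer.split(".") if s.strip()]


def score_statements(statements, context):
    # Hypothetical stand-in for the verdict step: a statement counts as
    # supported only if it appears verbatim in the context.
    if not statements:
        return 0.0
    supported = sum(1 for s in statements if s in context)
    return supported / len(statements)


def faithfulness_buggy(output, expected, context):
    # Old behaviour: statements come from the ground truth.
    return score_statements(extract_statements(expected), context)


def faithfulness_fixed(output, expected, context):
    # Fixed behaviour: statements come from the generated answer.
    return score_statements(extract_statements(output), context)


context = "Paris is the capital of France. The Seine flows through Paris."
expected = "Paris is the capital of France."
output = "Lyon is the capital of France."

buggy = faithfulness_buggy(output, expected, context)
fixed = faithfulness_fixed(output, expected, context)
print(buggy, fixed)
```

In this toy setup the buggy path rates an unsupported answer as fully faithful, because it never looks at `output`; the fixed path scores the claim the model actually made.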

Test

Added test_faithfulness_extracts_statements_from_output:

  • Uses different output and expected.
  • Mocks extract_statements to derive statements from passed answer.
  • Mocks extract_faithfulness to derive verdicts from context containment.
  • Ensures score behavior reflects correct routing (would fail under old bug).
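
The shape of such a test can be sketched as follows. This is an illustration only: `ToyFaithfulness` and its method signatures are hypothetical and do not mirror the real scorer or the actual test in py/autoevals/test_ragas.py. The point is that both mocks derive their results from the arguments they receive, so the final score changes if the wrong text is routed in.

```python
from unittest.mock import patch


class ToyFaithfulness:
    """Hypothetical stand-in for the scorer; not the autoevals class."""

    def extract_statements(self, question, answer):
        raise NotImplementedError("would call an LLM")

    def extract_faithfulness(self, context, statements):
        raise NotImplementedError("would call an LLM")

    def run(self, question, output, expected, context):
        # Correct routing: statements are extracted from the output.
        statements = self.extract_statements(question=question, answer=output)
        verdicts = self.extract_faithfulness(context=context, statements=statements)
        return sum(verdicts) / len(verdicts) if verdicts else 0.0


def fake_extract_statements(question, answer):
    # Input-driven mock: statements depend on whichever answer was passed in.
    return [s.strip() for s in answer.split(".") if s.strip()]


def fake_extract_faithfulness(context, statements):
    # Input-driven mock: verdict is 1 only when the context contains the statement.
    return [1 if s in context else 0 for s in statements]


CONTEXT = "The Eiffel Tower is in Paris."
EXPECTED = "The Eiffel Tower is in Paris."
OUTPUT = "The Eiffel Tower is in Berlin."


def run_regression_check():
    scorer = ToyFaithfulness()
    with patch.object(
        scorer, "extract_statements", side_effect=fake_extract_statements
    ) as stmts, patch.object(
        scorer, "extract_faithfulness", side_effect=fake_extract_faithfulness
    ):
        score = scorer.run(
            question="Where is the Eiffel Tower?",
            output=OUTPUT,
            expected=EXPECTED,
            context=CONTEXT,
        )
    # Return the score and the text that reached statement extraction.
    return score, stmts.call_args.kwargs["answer"]
```

Because `OUTPUT` and `EXPECTED` disagree, a scorer that extracted statements from `expected` would return a perfect score here, while correct routing returns zero.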

Validation

  • uv run --extra dev --extra scipy pytest py/autoevals/test_ragas.py -k faithfulness_extracts_statements_from_output passed.
  • Pre-commit hooks pass on commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
@barrettpyke

Abhijeet Prasad (@AbhiPrasad) - Took a look at this one and it lgtm, but wanted to run it past you.
