Parallelize LLM as Judge #3958

@allenli873

Description

Is your feature request related to a problem? Please describe.
Right now, LLM-as-judge runs in series: N * M LLM calls one after another, where N is the number of samples and M is the number of eval cases. This part can take a long time.

Describe the solution you'd like
I would like it to be possible for these to run in parallel, either by default or via a flag that can be passed into the agent evaluator.

Additional context
I've monkey-patched this in my own project. Before, it took close to 5 minutes to eval one test case with 5 samples and 2 different rubrics with Gemini 3 Pro. After the patch it was down to 1 minute.
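For reference, a minimal sketch of the kind of change I mean, assuming the judge call can be made async. The function names (`judge_sample`, `judge_all`) and the semaphore-based concurrency cap are illustrative, not the actual evaluator API:

```python
import asyncio


async def judge_sample(sample: str, rubric: str) -> str:
    """Placeholder for a single LLM-as-judge call (hypothetical)."""
    await asyncio.sleep(0.01)  # stand-in for model latency
    return f"score({sample}, {rubric})"


async def judge_all(samples, rubrics, max_concurrency: int = 8):
    """Issue all N * M judge calls concurrently instead of in series."""
    # Bound concurrency so we don't blow past provider rate limits.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(s, r):
        async with sem:
            return await judge_sample(s, r)

    return await asyncio.gather(
        *(bounded(s, r) for s in samples for r in rubrics)
    )


results = asyncio.run(judge_all(["s1", "s2"], ["r1", "r2"]))
```

With serial execution the wall-clock time is roughly N * M * latency; with `asyncio.gather` it approaches max(latency) per batch, which matches the ~5x speedup I saw. `max_concurrency` could be the flag passed into the agent evaluator.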

Metadata

Labels

eval [Component] This issue is related to evaluation
needs review [Status] The PR/issue is awaiting review from the maintainer
