CPOTrainer - Incorrect handling of different length chosen/rejected p… #4639

davmels · 2025-12-08T02:34:16Z

…rompts.

On line 495, `zip()` is called with `strict=True`, and `num_diff_len` is computed only after that call. When the chosen and rejected prompt lengths differ by 1, `zip()` raises an exception first, so the `num_diff_len` computation is never reached. Therefore, we should compare only the first `prompt_len_input_ids` tokens common to both prompts, and compute the length difference between the prompts afterwards.

What does this PR do?

Fixes # issue with handling chosen/rejected prompts of different lengths.

