After using SimPO, the model's generations contain many repetition loops #80

Hi, here are my training settings: 

- I used SimPO on the Llama-3.1-8B-Instruct model with the recommended Llama3-8B-Instruct-V2 settings: beta=10, gamma_beta_ratio=0.3, lr=1e-6
- I trained the model on an instruction dataset (Long-Alpaca)
- At inference time, I gave the model a randomly chosen prompt whose ground-truth answer is `Garden`
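For reference, here is a minimal sketch of how `beta` and `gamma_beta_ratio` enter the SimPO objective for a single preference pair. It assumes the standard formulation: the implicit reward is β times the *average* per-token log-probability of a response, and γ (the target reward margin) is expressed as `gamma_beta_ratio` = γ/β. The function names, the per-token log-prob inputs, and the default values are illustrative assumptions, not the SimPO repository's code.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    # For x < 0, log(sigmoid(x)) = x - log(1 + e^x), which avoids overflow in exp(-x).
    return x - math.log1p(math.exp(x))


def avg_logp(token_logps: Sequence[float]) -> float:
    """Length-normalized term: mean per-token log-probability of one response."""
    return sum(token_logps) / len(token_logps)


def simpo_loss(chosen_token_logps: Sequence[float],
               rejected_token_logps: Sequence[float],
               beta: float = 10.0,
               gamma_beta_ratio: float = 0.3) -> float:
    """SimPO loss for one (chosen, rejected) pair, without label smoothing.

    Equivalent to -log sigmoid(beta * avg_logp(y_w) - beta * avg_logp(y_l) - gamma)
    with gamma = beta * gamma_beta_ratio.
    """
    margin = avg_logp(chosen_token_logps) - avg_logp(rejected_token_logps)
    return -log_sigmoid(beta * (margin - gamma_beta_ratio))


# Hypothetical per-token log-probs for one chosen and one rejected response.
chosen = [-0.2, -0.4, -0.3]   # mean -0.3
rejected = [-0.9, -1.1]       # mean -1.0
print(simpo_loss(chosen, rejected))  # margin of 0.7 exceeds gamma/beta = 0.3
```

Because both responses are scored by their average log-probability, no reference-model log-probs appear anywhere in the loss.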


Here is the inference result of the vanilla Llama-3.1-8B-Instruct model: `The milk is in the garden.`

However, after fine-tuning with the SimPO loss, the inference results look like this:

```json
"pred": " \n\nThe football is in the hallway.  The apple is in the hallway.  The apple is in the garden. The apple is in the hallway. The football is in the hallway. The apple is in the garden.  The apple is in the hallway. The football is in the kitchen. The football is in the hallway. The apple is in the garden. The apple is in the hallway. The football is in the hallway. The football is in the kitchen. The football is in the"}
```
 
**The output gets stuck in heavy repetition loops.**
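When comparing hyperparameter settings, a simple number for "how loopy" a generation is can help. The sketch below is a generic diagnostic, not part of SimPO: it reports the fraction of word n-grams that repeat an earlier n-gram. The function name `repeated_ngram_ratio` and the choice of n=4 are assumptions for illustration.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram.

    0.0 means every n-gram is unique; values close to 1.0 mean the text
    keeps cycling through the same phrases.
    """
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)


vanilla = "The milk is in the garden."
looped = (
    "The football is in the hallway. The apple is in the hallway. "
    "The apple is in the garden. The apple is in the hallway. "
    "The football is in the hallway. The apple is in the garden."
)
print(repeated_ngram_ratio(vanilla))  # 0.0
print(repeated_ngram_ratio(looped))
```

Tracking this ratio over a fixed set of prompts for each checkpoint gives a rough, comparable signal, though it does not distinguish legitimate repetition from degenerate loops.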

Based on your experience, how can I modify my hyperparameters to avoid this situation?