
Diffusion Reward Modeling for text rendering on images ✨

🌟 Main Contributions

1) Text Rendering Dataset

🔗 Diffusion-Reward-Modeling-for-Text-Rendering-Dataset
📊 14,000 curated prompts for text-on-image generation

2) 6 Alignment Pipelines

  • Supervised Fine-Tuning (SFT)
  • Reward Weighted Regression (RWR)
  • Direct Preference Optimization (DPO)
  • DRaFT
  • ReFL
  • GRPO
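For intuition, the preference-based pipeline (DPO) can be sketched as below, assuming the standard Diffusion-DPO formulation: compare how much the policy improves the denoising error over a frozen reference model on a preferred ("win") versus a rejected ("lose") image. The function, variable names, and `beta` default are illustrative, not this repo's actual code:

```python
import math

def diffusion_dpo_loss(win_policy, lose_policy, win_ref, lose_ref, beta=1.0):
    """Hedged sketch of a Diffusion-DPO-style loss on one preference pair.

    Inputs are scalar denoising errors (e.g. MSE of the noise prediction)
    for the preferred and rejected image under the policy and reference.
    """
    # how much more the policy improves on the winner than on the loser,
    # relative to the frozen reference model
    delta = (win_policy - win_ref) - (lose_policy - lose_ref)
    # -log sigmoid(-beta * delta): small when the policy favors the winner
    return -math.log(1.0 / (1.0 + math.exp(beta * delta)))

# toy numbers: policy fits the winner better than the loser -> small loss
loss_good = diffusion_dpo_loss(0.1, 0.5, 0.3, 0.3)
loss_bad = diffusion_dpo_loss(0.5, 0.1, 0.3, 0.3)
print(loss_good, loss_bad)
```

In this formulation a lower loss means the policy has shifted toward the preferred sample more than the reference has, which is the signal all six pipelines exploit in different ways.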

3) 2 Text Rendering Quality Metrics

  • OCR Accuracy Metric
  • Reward Model Score
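A minimal sketch of how an OCR accuracy metric of this kind is typically computed: run OCR on the generated image, then score the recognized string against the target text with a normalized Levenshtein (edit) distance. The OCR step itself is omitted and all names here are illustrative, not this repo's API:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def ocr_accuracy(target: str, recognized: str) -> float:
    """1.0 = text rendered perfectly, 0.0 = completely wrong."""
    if not target:
        return 1.0 if not recognized else 0.0
    d = levenshtein(target.lower(), recognized.lower())
    return max(0.0, 1.0 - d / max(len(target), len(recognized)))

print(ocr_accuracy("I Love SMILES 2025", "I Love SMILES 2025"))  # 1.0
```

The reward model score, by contrast, is a learned scalar and is not reproducible from a short snippet; it is computed by the scripts in the Usage section below.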

📊 Visualizations

SD3.5 Medium base

A superrealistic panda holding a sign that says "I Love SMILES 2025"
  
An asian dragon holding a sign with "Summer of Machine Learning by Skoltech 2025 !"
  
"I love Harbin Institute of Technology" written on a chinese office building

SFT Results

OCR Metric

Reward Metric


RWR Results

OCR Metric

Reward Metric


DPO Results

OCR Metric

Reward Metric


SFT + DPO Results

OCR Metric

Reward Metric


DRaFT Results (Reward only)


ReFL Results (Reward only)


GRPO Results (Reward only)

Reward Metric

🔤 Text Rendering Quality Metrics

📊 Text Rendering Quality Metric Distributions

📈 Method Comparison

Method comparison metrics

🚀 Usage

SFT:

sh run_train_sd3_sft.sh

RWR:

sh run_train_sd3_rwr.sh

DPO:

sh run_train_sd3_dpo.sh

ReFL:

sh run_train_sd3_refl.sh

DRaFT:

sh run_train_sd3_draft.sh

GRPO:

sh run_train_sd3_grpo.sh
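The step at the core of GRPO can be sketched as follows: sample a group of images per prompt, score each with the reward model, and normalize the rewards within the group to obtain advantages. This is a hedged illustration of the general technique; the function name and epsilon are assumptions, not this repo's code:

```python
def group_advantages(rewards):
    """Group-relative advantages: standardize rewards within one group."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # small epsilon avoids division by zero when all rewards are equal
    return [(r - mean) / (std + 1e-8) for r in rewards]

# three samples for one prompt, scored by the reward model
adv = group_advantages([1.0, 2.0, 3.0])
print(adv)  # centered around 0; the best sample gets the largest advantage
```

Because the baseline is the group mean, no separate value network is needed, which is what makes GRPO comparatively cheap to run.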

Generate Latents:

sh generate_visuals_sd3_480p.sh

Encode text prompts into embeddings:

sh generate_text_embeds_sd3.sh

Calculate OCR + Levenshtein metric:

sh calculate_levenstein_metric.sh

Calculate Reward metric:

sh calculate_reward_metric.sh

⚠️ Warning

In the examples we provide, DRaFT, ReFL, and GRPO produce lower-quality results than DPO. This is because they were trained with a smaller batch size than DPO; their quality could be improved by introducing an exponential moving average (EMA) of the model weights.
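The EMA mentioned above can be sketched in a few lines: keep a shadow copy of the weights and blend in the current weights after each optimizer step, then use the shadow weights for evaluation. The class name, API, and decay value are illustrative assumptions, not part of this repo:

```python
class EMA:
    """Hedged sketch of an exponential moving average over model weights."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        # keep an independent copy of each parameter value
        self.shadow = [float(p) for p in params]

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current
        for i, p in enumerate(params):
            self.shadow[i] = self.decay * self.shadow[i] + (1 - self.decay) * float(p)

# toy usage with plain floats standing in for weights
ema = EMA([0.0], decay=0.9)
ema.update([1.0])
print(ema.shadow[0])  # 0.1 after one update
```

Averaging smooths out the high-variance updates that small-batch RL-style training (DRaFT, ReFL, GRPO) produces, which is why it is a natural remedy here.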

📜 Citation

@article{liu2025improving,
  title={Improving video generation with human feedback},
  author={Liu, Jie and Liu, Gongye and Liang, Jiajun and Yuan, Ziyang and Liu, Xiaokun and Zheng, Mingwu and Wu, Xiele and Wang, Qiulin and Qin, Wenyu and Xia, Menghan and others},
  journal={arXiv preprint arXiv:2501.13918},
  year={2025}
}
@article{clark2023directly,
  title={Directly fine-tuning diffusion models on differentiable rewards},
  author={Clark, Kevin and Vicol, Paul and Swersky, Kevin and Fleet, David J},
  journal={arXiv preprint arXiv:2309.17400},
  year={2023}
}
@article{xu2023imagereward,
  title={ImageReward: Learning and evaluating human preferences for text-to-image generation},
  author={Xu, Jiazheng and Liu, Xiao and Wu, Yuchen and Tong, Yuxuan and Li, Qinkai and Ding, Ming and Tang, Jie and Dong, Yuxiao},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  pages={15903--15935},
  year={2023}
}
@article{liu2025flow,
  title={Flow-GRPO: Training flow matching models via online RL},
  author={Liu, Jie and Liu, Gongye and Liang, Jiajun and Li, Yangguang and Liu, Jiaheng and Wang, Xintao and Wan, Pengfei and Zhang, Di and Ouyang, Wanli},
  journal={arXiv preprint arXiv:2505.05470},
  year={2025}
}
@article{gao2025seedream,
  title={Seedream 3.0 technical report},
  author={Gao, Yu and Gong, Lixue and Guo, Qiushan and Hou, Xiaoxia and Lai, Zhichao and Li, Fanshi and Li, Liang and Lian, Xiaochen and Liao, Chao and Liu, Liyang and others},
  journal={arXiv preprint arXiv:2504.11346},
  year={2025}
}

📧 Contact

Supported by: