Description
Is your feature request related to a problem? Please describe.
Add this dataset to PyRIT - it is not currently a part of PyRIT.
https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0
NVIDIA used this dataset as a content moderation benchmark when evaluating models for content safety.
Describe the solution you'd like
The Hugging Face dataset: https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0
The associated paper: https://openreview.net/pdf?id=0MvGCv35wi
Additional context
Similar to previous dataset contributions, this should live in pyrit.datasets as a "fetch" function. Also, the harm_categories property should be set on each prompt; this Hugging Face dataset lists each harm category under the "violated_categories" column.
There are examples of how PyRIT interacts with other datasets here: https://github.com/search?q=repo%3AAzure%2FPyRIT%20%23%20The%20dataset%20sources%20can%20be%20found%20at%3A&type=code
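A minimal sketch of what such a fetch helper might look like. The function name, the assumption that "violated_categories" holds comma-separated category names, and the "prompt" column name are all assumptions here, not confirmed PyRIT or dataset APIs; the real implementation should mirror the existing fetch functions in pyrit.datasets and return a SeedPromptDataset.

```python
# Sketch of a fetch helper for the Aegis 2.0 dataset (assumptions noted below).

def parse_harm_categories(value):
    """Split a violated_categories cell into a clean list of category names.

    Assumes categories are comma-separated; empty or None cells yield [].
    """
    if not value:
        return []
    return [part.strip() for part in str(value).split(",") if part.strip()]


def fetch_aegis_2_examples(split="train"):
    """Hypothetical fetch function mirroring existing pyrit.datasets helpers.

    Requires the `datasets` package and likely Hugging Face authentication,
    since NVIDIA datasets are often gated. Returns a list of dicts pairing
    each prompt with its harm categories; a real PyRIT contribution would
    wrap these in SeedPrompt objects instead.
    """
    from datasets import load_dataset  # deferred import: optional dependency

    data = load_dataset("nvidia/Aegis-AI-Content-Safety-Dataset-2.0", split=split)
    return [
        {
            "prompt": row["prompt"],  # assumed column name
            "harm_categories": parse_harm_categories(row.get("violated_categories")),
        }
        for row in data
    ]
```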
[Content Warning: Prompts are aimed at provoking the model and may contain offensive content.]
Additional Disclaimer: Given the content of these prompts, you may want to check with your relevant legal department before running them against LLMs.