
FEAT Add NVIDIA AI Content Safety Dataset #1057

@chenss3

Description


Is your feature request related to a problem? Please describe.

Add this dataset to PyRIT - it is not currently a part of PyRIT.
https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0
NVIDIA evaluated its content-safety models using the content moderation benchmarks from this test set.

Describe the solution you'd like

The Hugging Face dataset: https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0
The associated paper: https://openreview.net/pdf?id=0MvGCv35wi

Additional context

Similar to previous dataset contributions, this should live in pyrit.datasets as a "fetch" function. Also, the harm_categories property should be set on each prompt. This Hugging Face dataset lists each harm category under the "violated_categories" column.
There are examples of how PyRIT interacts with other datasets here: https://github.com/search?q=repo%3AAzure%2FPyRIT%20%23%20The%20dataset%20sources%20can%20be%20found%20at%3A&type=code

[[Content Warning: Prompts are aimed at provoking the model and may contain offensive content.]]
Additional Disclaimer: Given the content of these prompts, you may want to check with your relevant legal department before trying them against LLMs.
