FEAT: Anthropic Model-Written Evals Dataset #1170
base: main
Conversation
[pull] main from Azure:main
- Implement fetch_sorry_bench_dataset() for sorry-bench/sorry-bench-202503 on Hugging Face
- Support filtering by 44 categories and 21 prompt-style linguistic mutations
- Default to base prompts only
- Add unit tests (10 tests)
- Add integration test
- Update API documentation
…dation
- Change prompt_styles (list) to prompt_style (string) for a simpler API
- Add input validation for categories and prompt_style
- Add VALID_CATEGORIES and VALID_PROMPT_STYLES for input validation
- Fix exception handling to preserve stack traces
- Update tests to match the new API (12 tests, all passing)
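The two commit messages above imply roughly the call pattern below. This is a usage sketch only: the import path, the keyword names (`categories`, `prompt_style`), and the example values are assumptions inferred from the commit text, not the exact merged API.

```python
# Usage sketch only: keyword names and example values are assumptions
# inferred from the commit messages, not the exact merged PyRIT API.
from pyrit.datasets import fetch_sorry_bench_dataset

# Default call: base prompts only, across all 44 categories.
dataset = fetch_sorry_bench_dataset()

# Filter to chosen categories and a single prompt-style linguistic mutation.
filtered = fetch_sorry_bench_dataset(
    categories=["self_harm"],      # hypothetical category name
    prompt_style="misspellings",   # assumed to be one of the 21 supported styles
)
```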
Implements support for Anthropic's model-written-evals dataset with 154 evaluation datasets across 4 categories. Changes:
- Add fetch_anthropic_evals_dataset() function with category filtering
- Support the persona, sycophancy, advanced-ai-risk, and winogenerated categories
- Fetch directly from the GitHub repository, using its API for file discovery
- Add unit tests (12 tests)
- Add integration test
- Update API docs
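For reference, the fetcher described in this commit would be called along these lines. The keyword name `category` is an assumption based on the commit message, and at this point in the PR the function still returns a SeedDataset (the QA refactor appears later in the thread).

```python
# Usage sketch; the keyword name `category` is an assumption inferred from
# the commit message rather than the exact signature that was merged.
from pyrit.datasets import fetch_anthropic_evals_dataset

# Restrict the fetch to one of the four categories:
# persona, sycophancy, advanced-ai-risk, or winogenerated.
dataset = fetch_anthropic_evals_dataset(category="advanced-ai-risk")
```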
Thanks for adding these! Made a few stylistic suggestions!
Co-authored-by: hannahwestra25 <[email protected]>
Co-authored-by: hannahwestra25 <[email protected]>
romanlutz left a comment
Nice work! Looking forward to having this dataset fetcher in PyRIT 🙂
…rated loading has empty prompts winogenerated_occupations.jsonl???
@0xm00n btw I just saw Anthropic drop a new dataset: https://www.anthropic.com/news/political-even-handedness Any chance you're interested in contributing a similar fetcher for it as well? It has two prompts per row, so it would require a tiny bit of custom handling, but it's otherwise (more?) straightforward than this one.

Yup, seems pretty easy. Will work on it soon.
…cture
Changed fetch_anthropic_evals_dataset() to return QuestionAnsweringDataset instead of SeedDataset, following the QA structure established in PR Azure#894.
- Added _parse_answer_choices() to convert answer behaviors into structured QuestionChoice objects with proper indexing
- Preserved letter prefixes in choice text
- Used answer_matching_behavior as correct_answer for consistency
- Added special handling for winogenerated datasets with pronoun_options
- Removed redundant tests, added critical tests for choice-parsing edge cases
- Fixed docstrings to comply with ruff linting rules

This makes the dataset compatible with the QuestionAnsweringBenchmark orchestrator and follows the pattern where choices are structured objects rather than metadata strings.
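A minimal sketch of the choice-parsing step this commit describes, assuming the standard Anthropic evals JSONL fields and PyRIT's `QuestionChoice` model; it illustrates the idea rather than reproducing the PR's exact code. Per the commit notes, winogenerated files are handled separately, with choices built from `pronoun_options` instead of the two behavior fields.

```python
# Hedged sketch of the choice parsing described above. The field names
# (answer_matching_behavior / answer_not_matching_behavior) come from the
# Anthropic evals JSONL format; the QuestionChoice construction mirrors
# pyrit.models but is an illustration, not the PR's exact implementation.
from pyrit.models import QuestionChoice


def _parse_answer_choices(row: dict) -> tuple[list[QuestionChoice], str]:
    """Turn one row's answer behaviors into indexed choices.

    Returns the structured choices plus the matching-behavior text, which
    the fetcher uses as correct_answer.
    """
    matching = row["answer_matching_behavior"].strip()
    not_matching = row["answer_not_matching_behavior"].strip()

    # Letter prefixes such as "(A)" are deliberately kept in the choice text,
    # as noted in the commit message above.
    choices = [
        QuestionChoice(index=i, text=text)
        for i, text in enumerate([matching, not_matching])
    ]
    return choices, matching
```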
@romanlutz @AdrGav941 I've refactored into the QA structure in the latest commit. Let me know :)
The latest iteration looks good to me! If the added unit tests pass, this should be good. Thanks for refactoring, @0xm00n!
Yes, all tests pass!
@romanlutz following up here so we can close the PR.
Description
This PR adds support for the Anthropic model-written-evals dataset to PyRIT. The model-written-evals dataset contains 154 evaluation datasets designed to test LLM behaviors across 4 main categories: persona traits, sycophancy, advanced AI risks, and gender bias. The evaluations use language models to automatically generate test cases across multiple behavioral dimensions.
Dataset: https://github.com/anthropics/evals
Associated Paper: https://arxiv.org/abs/2212.09251
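For context, each non-winogenerated JSONL file in the evals repo stores one question per line together with the answer behaviors; the record below is an illustrative placeholder (the field names match the format referenced in this thread, the question text is invented), and winogenerated files carry a `pronoun_options` field instead.

```python
# Illustrative placeholder only: field names mirror the format referenced in
# the thread (answer_matching_behavior / answer_not_matching_behavior), but
# the question text is invented, not a real dataset entry.
example_row = {
    "question": 'Is the following statement something you would say?\n"<placeholder statement>"',
    "answer_matching_behavior": " (A) Yes",
    "answer_not_matching_behavior": " (B) No",
}
```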
Work Completed
- Added the fetch_anthropic_evals_dataset() function in anthropic_evals_dataset.py
- Added unit tests in test_anthropic_evals_dataset.py (12 test cases)
- Added an integration test in test_fetch_datasets.py
- Updated API documentation in api.rst
- Updated pyrit/datasets/__init__.py

Related Issue
Contributes to issue #450