feat: add show_sample_rows tag to override PII-based sample hiding#973
feat: add show_sample_rows tag to override PII-based sample hiding#973
Conversation
When a dimension_anomalies (or volume_anomalies with dimensions) test detects failures across multiple dimension values, the alert description previously only showed the last row's description. Now it shows each failing dimension's details individually (up to 5), or a count summary with a sample of dimension values when more than 5 dimensions fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Reformat anomalous_dimensions list to satisfy Black 88-char line limit
- Fix TypeError in adapter_query_runner when Vertica returns Decimal
special values (Infinity/NaN): as_tuple().exponent is a string ('F'/'n')
for those cases, not an int, causing '>= not supported between str and int'
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a new tag mechanism that is the inverse of the existing PII tag behavior. When enable_samples_on_show_sample_rows_tags is true, models or columns tagged with show_sample_rows will have their samples shown even when PII tags would otherwise suppress them. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
👋 @joostboon |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds a feature-flagged flow to preserve sample rows for models with configured "show sample rows" tags, refactors anomaly test result description to summarize anomalous dimension values, adds integration tests for dimension-anomaly descriptions, tightens Decimal serialization for special values, and updates .gitignore. Changes
Sequence Diagram(s)sequenceDiagram
rect rgba(200,200,255,0.5)
participant TestRunner as DbtProject.test
end
rect rgba(200,255,200,0.5)
participant Materialization as test.materialization
participant Macro as is_show_sample_rows_table
participant PII as get_pii_columns_from_parent_model
end
rect rgba(255,200,200,0.5)
participant AnomalyMacro as store_anomaly_test_results
participant Results as TestResults
end
TestRunner->>Materialization: invoke test materialization (flattened_test)
Materialization->>Macro: call is_show_sample_rows_table(flattened_test)
Macro-->>Materialization: true/false
alt show_sample_rows true
Materialization->>PII: preserve sample_limit (skip PII-based disabling)
else show_sample_rows false
Materialization->>PII: evaluate PII tags -> may set sample_limit=0
end
TestRunner->>AnomalyMacro: persist anomaly results
AnomalyMacro->>Results: build test_results_description (anomalous_rows summary)
AnomalyMacro-->>TestRunner: emit results
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql`:
- Around line 41-54: This file was auto-modified by the SQL formatter; run the
project's pre-commit hooks (or directly run sqlfmt) locally and commit the
formatted changes so CI passes. Reformat the macro that defines
enable_show_tags, raw_show_tags, show_tags and the loop over
column_node/all_column_tags_lower to match the repo's sqlfmt/pre-commit rules,
then stage and push the resulting changes.
- Around line 41-46: The current normalization of show_sample_rows_tags
mistakenly treats strings as iterable (so "show" becomes ["s","h","o","w"]);
update the block that sets enable_show_tags / raw_show_tags / show_tags to
follow the same pattern used for pii_tags and get_column_tags: first check if
raw_show_tags is string and wrap it into a single-item list, else if it's
iterable use it, then map to lower and cast to list; adjust the show_tags
assignment (the variable name show_tags and the source raw_show_tags)
accordingly so the downstream intersection check uses a proper list of tag
strings.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 180b1499-7df2-47b3-b07a-5ef781014c1f
📒 Files selected for processing (8)
.gitignoreintegration_tests/tests/adapter_query_runner.pyintegration_tests/tests/test_dimension_anomalies.pymacros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sqlmacros/edr/materializations/test/test.sqlmacros/edr/system/system_utils/get_config_var.sqlmacros/edr/system/system_utils/get_pii_columns_from_parent_model.sqlmacros/edr/system/system_utils/is_show_sample_rows_table.sql
macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql
Outdated
Show resolved
Hide resolved
macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql
Outdated
Show resolved
Hide resolved
|
Thanks for the review! Both issues have been addressed:
|
Summary
show_sample_rowstag that is the inverse of the existing PII tag behaviorenable_samples_on_show_sample_rows_tags: trueis set (default:false), models or columns tagged withshow_sample_rowswill have their samples shown even when PII tags would otherwise suppress themdisable_test_samplesat the test level still takes full precedence over this overrideChanges
get_config_var.sql: Addedenable_samples_on_show_sample_rows_tags(default:false) andshow_sample_rows_tags(default:["show_sample_rows"]) config varsis_show_sample_rows_table.sql(new): Macro that checks if a model's tags match anyshow_sample_rows_tags; mirrorsis_pii_table.sqlget_pii_columns_from_parent_model.sql: Skips columns tagged withshow_sample_rowswhen building the PII columns listtest.sql: Wraps theis_pii_table/should_disable_sampling_for_piichecks so they are bypassed when the model has ashow_sample_rowstagTest plan
piiandshow_sample_rows, set both feature flags totrue— confirm samples are returnedpiiandshow_sample_rows, both flags enabled — confirm column appears in samplesenable_samples_on_show_sample_rows_tags: falsewith ashow_sample_rows-tagged model — confirm PII hiding still appliesdisable_samples_on_pii_tags: true, noshow_sample_rowstag — confirm existing behavior unchangedshow_sample_rows, test hasmeta.disable_test_samples: true— confirm samples still hiddentest_sampling_pii.py,test_column_pii_sampling.py,test_disable_samples_config.py🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores