Skip to content

feat: add show_sample_rows tag to override PII-based sample hiding#973

Open
joostboon wants to merge 7 commits intomasterfrom
feat/add-show-sample-rows-tag
Open

feat: add show_sample_rows tag to override PII-based sample hiding#973
joostboon wants to merge 7 commits intomasterfrom
feat/add-show-sample-rows-tag

Conversation

@joostboon
Copy link
Contributor

@joostboon joostboon commented Mar 22, 2026

Summary

  • Adds a new show_sample_rows tag that is the inverse of the existing PII tag behavior
  • When enable_samples_on_show_sample_rows_tags: true is set (default: false), models or columns tagged with show_sample_rows will have their samples shown even when PII tags would otherwise suppress them
  • disable_test_samples at the test level still takes full precedence over this override

Changes

  • get_config_var.sql: Added enable_samples_on_show_sample_rows_tags (default: false) and show_sample_rows_tags (default: ["show_sample_rows"]) config vars
  • is_show_sample_rows_table.sql (new): Macro that checks if a model's tags match any show_sample_rows_tags; mirrors is_pii_table.sql
  • get_pii_columns_from_parent_model.sql: Skips columns tagged with show_sample_rows when building the PII columns list
  • test.sql: Wraps the is_pii_table / should_disable_sampling_for_pii checks so they are bypassed when the model has a show_sample_rows tag

Test plan

  • Tag a model with both pii and show_sample_rows, set both feature flags to true — confirm samples are returned
  • Tag a column with both pii and show_sample_rows, both flags enabled — confirm column appears in samples
  • Set enable_samples_on_show_sample_rows_tags: false with a show_sample_rows-tagged model — confirm PII hiding still applies
  • Only disable_samples_on_pii_tags: true, no show_sample_rows tag — confirm existing behavior unchanged
  • Both flags enabled, model tagged show_sample_rows, test has meta.disable_test_samples: true — confirm samples still hidden
  • Run existing PII sampling tests: test_sampling_pii.py, test_column_pii_sampling.py, test_disable_samples_config.py

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Configurable tags and a visibility check to allow overriding PII-based sampling restrictions.
  • Bug Fixes

    • Anomaly descriptions now list anomalous dimension values and summarize large groups.
    • Decimal serialization handles special numeric values (Infinity/NaN) correctly.
  • Tests

    • Added integration tests validating anomaly description behavior for few vs many failures.
  • Chores

    • Added .claude/ to ignore patterns.

joostboon and others added 4 commits March 13, 2026 09:07
When a dimension_anomalies (or volume_anomalies with dimensions) test
detects failures across multiple dimension values, the alert description
previously only showed the last row's description. Now it shows each
failing dimension's details individually (up to 5), or a count summary
with a sample of dimension values when more than 5 dimensions fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Reformat anomalous_dimensions list to satisfy Black 88-char line limit
- Fix TypeError in adapter_query_runner when Vertica returns Decimal
  special values (Infinity/NaN): as_tuple().exponent is a string ('F'/'n')
  for those cases, not an int, causing '>= not supported between str and int'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a new tag mechanism that is the inverse of the existing PII tag
behavior. When enable_samples_on_show_sample_rows_tags is true, models
or columns tagged with show_sample_rows will have their samples shown
even when PII tags would otherwise suppress them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

👋 @joostboon
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai
Copy link

coderabbitai bot commented Mar 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bdd75e18-3a14-42dc-9c1b-7d2783b63688

📥 Commits

Reviewing files that changed from the base of the PR and between 8ad4467 and 37c9701.

📒 Files selected for processing (2)
  • macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql
  • macros/edr/system/system_utils/is_show_sample_rows_table.sql
🚧 Files skipped from review as they are similar to previous changes (2)
  • macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql
  • macros/edr/system/system_utils/is_show_sample_rows_table.sql

📝 Walkthrough

Walkthrough

Adds a feature-flagged flow to preserve sample rows for models with configured "show sample rows" tags, refactors anomaly test result description to summarize anomalous dimension values, adds integration tests for dimension-anomaly descriptions, tightens Decimal serialization for special values, and updates .gitignore.

Changes

Cohort / File(s) Summary
Configuration
macros/edr/system/system_utils/get_config_var.sql
Adds enable_samples_on_show_sample_rows_tags (default false) and show_sample_rows_tags (default ["show_sample_rows"]).
Show-sample-rows feature
macros/edr/system/system_utils/is_show_sample_rows_table.sql, macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql, macros/edr/materializations/test/test.sql
New macro is_show_sample_rows_table(flattened_test); PII column detection respects the show-sample-tags feature (skips columns when matched); test materialization gate preserves sample_limit when is_show_sample_rows_table is true, avoiding PII-based sample disabling.
Anomaly description logic
macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql
Rebuilds test_results_description from collected anomalous_rows, summarizing up to 5 anomalous dimension_values (with remaining count) or concatenating per-row descriptions when ≤5; falls back to last scored row or default message when appropriate.
Tests & serialization
integration_tests/tests/test_dimension_anomalies.py, integration_tests/tests/adapter_query_runner.py
Adds two integration tests checking alert descriptions for dimension anomalies (≤5 vs >5); updates _serialize_value() to convert Decimal special normalized exponents (Infinity/NaN) to float and preserve previous int/float mapping otherwise.
Misc
.gitignore
Adds .claude/ to ignore patterns.

Sequence Diagram(s)

sequenceDiagram
  rect rgba(200,200,255,0.5)
    participant TestRunner as DbtProject.test
  end
  rect rgba(200,255,200,0.5)
    participant Materialization as test.materialization
    participant Macro as is_show_sample_rows_table
    participant PII as get_pii_columns_from_parent_model
  end
  rect rgba(255,200,200,0.5)
    participant AnomalyMacro as store_anomaly_test_results
    participant Results as TestResults
  end

  TestRunner->>Materialization: invoke test materialization (flattened_test)
  Materialization->>Macro: call is_show_sample_rows_table(flattened_test)
  Macro-->>Materialization: true/false
  alt show_sample_rows true
    Materialization->>PII: preserve sample_limit (skip PII-based disabling)
  else show_sample_rows false
    Materialization->>PII: evaluate PII tags -> may set sample_limit=0
  end
  TestRunner->>AnomalyMacro: persist anomaly results
  AnomalyMacro->>Results: build test_results_description (anomalous_rows summary)
  AnomalyMacro-->>TestRunner: emit results
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through tags both bright and small,
I kept sample rows for those who call,
Counted anomalies, five then more,
Wrote tests and tidy decimal lore,
A little rabbit cheers these changes — cottontail encore! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the primary feature being added: a new 'show_sample_rows' tag mechanism that can override PII-based sample hiding, which is the central change across all modified files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/add-show-sample-rows-tag

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql`:
- Around line 41-54: This file was auto-modified by the SQL formatter; run the
project's pre-commit hooks (or directly run sqlfmt) locally and commit the
formatted changes so CI passes. Reformat the macro that defines
enable_show_tags, raw_show_tags, show_tags and the loop over
column_node/all_column_tags_lower to match the repo's sqlfmt/pre-commit rules,
then stage and push the resulting changes.
- Around line 41-46: The current normalization of show_sample_rows_tags
mistakenly treats strings as iterable (so "show" becomes ["s","h","o","w"]);
update the block that sets enable_show_tags / raw_show_tags / show_tags to
follow the same pattern used for pii_tags and get_column_tags: first check if
raw_show_tags is string and wrap it into a single-item list, else if it's
iterable use it, then map to lower and cast to list; adjust the show_tags
assignment (the variable name show_tags and the source raw_show_tags)
accordingly so the downstream intersection check uses a proper list of tag
strings.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 180b1499-7df2-47b3-b07a-5ef781014c1f

📥 Commits

Reviewing files that changed from the base of the PR and between 7a2b542 and 5b68070.

📒 Files selected for processing (8)
  • .gitignore
  • integration_tests/tests/adapter_query_runner.py
  • integration_tests/tests/test_dimension_anomalies.py
  • macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql
  • macros/edr/materializations/test/test.sql
  • macros/edr/system/system_utils/get_config_var.sql
  • macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql
  • macros/edr/system/system_utils/is_show_sample_rows_table.sql

@joostboon
Copy link
Contributor Author

Thanks for the review! Both issues have been addressed:

  • sqlfmt formatting: applied and committed
  • String iterable bug: fixed in both is_show_sample_rows_table.sql and get_pii_columns_from_parent_model.sql — now using the same is string guard pattern as the existing pii_tags normalization to avoid strings being split into individual characters

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant