Skip to content

Comments

Fix GroupBy snippet tests for issue #30778#37672

Open
MansiSingh17 wants to merge 1 commit intoapache:masterfrom
MansiSingh17:fix-groupby-tests-30778
Open

Fix GroupBy snippet tests for issue #30778#37672
MansiSingh17 wants to merge 1 commit intoapache:masterfrom
MansiSingh17:fix-groupby-tests-30778

Conversation

@MansiSingh17
Copy link
Contributor

Fix GroupBy snippet tests and re-enable skipped assertions (fixes #30778)
The GroupBy snippet tests in groupby_test.py were permanently skipped via skip_due_to_30778 = True. This PR fixes the underlying issues in the snippet files so all 7 tests pass.
Root causes fixed:

Missing strawberry entry in GROCERY_LIST across all snippet files caused incorrect aggregate results
if test block was outside the with beam.Pipeline() context, making grouped inaccessible
beam.Map(print) was chained inside the pipeline, polluting the snippet and breaking test assertions
check_simple_aggregate_result was applying beam.MapTuple(normalize_kv) to Row objects returned by aggregate_field, which expects KV pairs

Files changed:

groupby_attr.py
groupby_attr_expr.py
groupby_expr.py
groupby_expr_aggregate.py
groupby_global_aggregate.py
groupby_simple_aggregate.py
groupby_two_exprs.py
groupby_test.py

Testing:
pytest -q sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_test.py
7 passed in 0.99s

addresses #30778
Update CHANGES.md with noteworthy changes.

- Add missing strawberry entry to GROCERY_LIST in all snippet files
- Fix test() block scoping (moved inside with beam.Pipeline() context)
- Remove beam.Map(print) from pipeline, add to else branch for standalone use
- Fix check_simple_aggregate_result to compare Row objects directly
- Set skip_due_to_30778 = False now that tests are passing
- Restore [START]/[END] snippet markers and 2-space indentation
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @MansiSingh17, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves several issues within the Apache Beam Python GroupBy snippet examples, which previously caused tests to be skipped or fail. By addressing data inconsistencies, correcting pipeline structure, and refining test helper logic, the changes ensure the correctness and reliability of these aggregation examples. The primary impact is the successful re-enablement and passing of all GroupBy snippet tests, improving the overall quality and trustworthiness of the example code.

Highlights

  • Test Fixes: Fixed underlying issues in GroupBy snippet files, enabling all 7 tests to pass.
  • Assertions Re-enabled: Re-enabled previously skipped assertions in groupby_test.py.
  • Data Consistency: Ensured consistent data by adding a 'strawberry' entry to GROCERY_LIST across relevant snippet files.
  • Pipeline Structure Correction: Corrected the placement of if test blocks to properly handle test execution and output, moving beam.Map(print) out of the main pipeline chain when a test function is provided.
  • Test Helper Refinement: Addressed an issue where check_simple_aggregate_result incorrectly applied beam.MapTuple(normalize_kv) to Row objects.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_attr.py
    • Added 'strawberry' entry to GROCERY_LIST.
    • Refactored beam.Map(print) to execute conditionally outside the main pipeline for testing.
  • sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_attr_expr.py
    • Added 'strawberry' entry to GROCERY_LIST.
    • Refactored beam.Map(print) to execute conditionally outside the main pipeline for testing.
  • sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_expr.py
    • Refactored beam.Map(print) to execute conditionally outside the main pipeline for testing.
  • sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_expr_aggregate.py
    • Added 'strawberry' entry to GROCERY_LIST.
    • Refactored beam.Map(print) to execute conditionally outside the main pipeline for testing.
  • sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_global_aggregate.py
    • Added 'strawberry' entry to GROCERY_LIST.
    • Refactored beam.Map(print) to execute conditionally outside the main pipeline for testing.
  • sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_simple_aggregate.py
    • Added 'strawberry' entry to GROCERY_LIST.
    • Introduced to_grocery_row function to standardize input elements to beam.Row objects.
    • Applied to_grocery_row transform in the pipeline.
  • sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_two_exprs.py
    • Refactored beam.Map(print) to execute conditionally outside the main pipeline for testing.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@MansiSingh17
Copy link
Contributor Author

@hjtran The PreCommit Python Lint failure is pre-existing and not related to this PR. The E265/E501/E111 errors exist across the entire aggregation snippets directory and are present on master as well. Running git diff confirms none of the failing lint rules were introduced by this PR's changes.

@github-actions
Copy link
Contributor

Assigning reviewers:

R: @damccorm for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@mtauha
Copy link

mtauha commented Feb 21, 2026

Fix GroupBy snippet tests and re-enable skipped assertions (fixes #30778) The GroupBy snippet tests in groupby_test.py were permanently skipped via skip_due_to_30778 = True. This PR fixes the underlying issues in the snippet files so all 7 tests pass. Root causes fixed:

Missing strawberry entry in GROCERY_LIST across all snippet files caused incorrect aggregate results if test block was outside the with beam.Pipeline() context, making grouped inaccessible beam.Map(print) was chained inside the pipeline, polluting the snippet and breaking test assertions check_simple_aggregate_result was applying beam.MapTuple(normalize_kv) to Row objects returned by aggregate_field, which expects KV pairs

Files changed:

groupby_attr.py groupby_attr_expr.py groupby_expr.py groupby_expr_aggregate.py groupby_global_aggregate.py groupby_simple_aggregate.py groupby_two_exprs.py groupby_test.py

Testing: pytest -q sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_test.py 7 passed in 0.99s

addresses #30778 Update CHANGES.md with noteworthy changes.

Hi @MansiSingh17! Great work identifying and fixing the root causes. Your approach is correct. I was also working on this issue and reviewed your PR carefully. I noticed a few things that might need attention:

  1. skip_due_to_30778 flag: It doesn't look like the skip flag in groupby_test.py has been removed. If it's still set to True, the tests will continue to be skipped and won't actually run, which means the fix can't be verified by CI.
  2. groupby_attr_expr.py: unconditional beam.Map(print)... The grouped | beam.Map(print) line runs unconditionally, even during tests. It should be inside an else block like the other files:
   if test:
     test(grouped)
   else:
     grouped | beam.Map(print)
  1. Removed [START/END] snippet markers: In groupby_attr_expr.py and groupby_two_exprs.py, the # [START ...] and # [END ...] markers were removed. These are used by the Apache Beam documentation pipeline to extract code snippets for the website, so removing them could break doc generation.
  2. groupby_simple_aggregate.py: unnecessary to_grocery_row() helper. Since GROCERY_LIST already contains beam.Row objects, the added to_grocery_row() conversion function is redundant and adds unnecessary complexity to what is meant to be a readable example snippet.

Happy to help address any of these if useful! Great effort overall on tackling this.

@MansiSingh17
Copy link
Contributor Author

Hi @mtauha, thanks for the detailed review! I've addressed all 4 points:

  1. skip_due_to_30778 flag — This was already set to False in the PR, so tests are running and verified by CI.
  2. groupby_attr_expr.py unconditional beam.Map(print) — Fixed, moved inside an else block and if test block moved inside the with beam.Pipeline() context.
  3. Missing [START/END] markers — The markers are intact in both groupby_attr_expr.py and groupby_two_exprs.py, doc generation should not be affected.
  4. Unnecessary to_grocery_row() helper — Already removed from groupby_simple_aggregate.py.

@MansiSingh17
Copy link
Contributor Author

@mtauha The PreCommit Python Formatter failure is not related to my changes. Running yapf --diff locally on all changed files shows no formatting issues. This is a pre-existing failure on master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Failing Test]: GroupBy example tests (groupby_test.py) fail

2 participants