
GH-49534: [R] Implement dplyr recode_values(), replace_values(), and replace_when() #49536

Merged
thisisnic merged 25 commits into apache:main from thisisnic:GH-49534-dplyr-recode
Apr 7, 2026

Member

@thisisnic thisisnic commented Mar 17, 2026

Rationale for this change

Implement the new dplyr functions recode_values(), replace_values(), and replace_when().

What changes are included in this PR?

Implement them

Are these changes tested?

Yeah

Are there any user-facing changes?

More functions

AI Use

Code generated using Claude, with plenty of input from me. I've gone through it in detail and refactored lots, but it needs a last pass before it's ready for review.

@thisisnic thisisnic force-pushed the GH-49534-dplyr-recode branch 2 times, most recently from f8ecb8f to 880b775 Compare March 30, 2026 13:21
@thisisnic thisisnic marked this pull request as ready for review April 6, 2026 07:26
@thisisnic thisisnic requested a review from jonkeane as a code owner April 6, 2026 07:26
Member

@jonkeane jonkeane left a comment

The tests look good, and so long as CI doesn't bonk I'm good with shipping this. I do think we should keep the validation error when being given a vectorized .default though.

Comment on lines -299 to +300
- "`.default` must have size 1, not size 2",
- class = "validation_error"
+ "`case_when\\(\\)` with vectorized `.default` not supported in Arrow",
+ class = "arrow_not_supported"
Member

I'm not totally sure I think we should make this change. The original message is clearer to me about what needs to happen. I also don't mind that it's a validation_error since that is what it is.

Member Author

Good shout on the contents of the error message, will update to be more recommendationy!

I'm not sure about the type of error though - my reasoning here was that it's a feature which is supported in dplyr but not arrow, which is where we typically use arrow_not_supported, whereas validation_error is more for things that fail in both. I guess it's tricky when size isn't either length 1 or the same length as the input, as then it's wrong in both, but in the bindings for str_sub/substr we use arrow_not_supported.

Member

Ok, that's fair. I'm ok with arrow_not_supported + the clearer message about number of arguments
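
The distinction discussed in this thread can be sketched in base R. This is only an illustration under stated assumptions: `classed_error()`, `check_default_size()`, and `error_class()` are hypothetical helpers, not functions from the arrow package, and the merged code may classify the mismatched-size case differently (the thread notes it is an edge case). The sketch signals `arrow_not_supported` for input that dplyr accepts but the Arrow binding does not, and `validation_error` for input that is wrong in both.

```r
# Hypothetical helper: build a classed error condition in base R.
classed_error <- function(msg, class) {
  structure(
    class = c(class, "error", "condition"),
    list(message = msg, call = NULL)
  )
}

# Hypothetical size check for a `.default` argument against input length `n`.
check_default_size <- function(default, n) {
  size <- length(default)
  if (size == 1L) {
    # Scalar default: fine everywhere.
    return(invisible(default))
  }
  if (size == n) {
    # Valid in dplyr, but assumed unsupported by the Arrow binding.
    stop(classed_error(
      "vectorized `.default` not supported in Arrow",
      "arrow_not_supported"
    ))
  }
  # Neither size 1 nor the input size: wrong in both dplyr and Arrow.
  stop(classed_error(
    sprintf("`.default` must have size 1, not size %d", size),
    "validation_error"
  ))
}

# Hypothetical helper: report which error class (if any) an expression signals.
error_class <- function(expr) {
  tryCatch(
    {
      expr
      "none"
    },
    arrow_not_supported = function(e) "arrow_not_supported",
    validation_error = function(e) "validation_error"
  )
}

error_class(check_default_size("x", 3))
error_class(check_default_size(c("a", "b", "c"), 3))
error_class(check_default_size(c("a", "b"), 3))
```

Catching by class, as `error_class()` does, is what lets a test assert on `class = "arrow_not_supported"` independently of the exact message wording.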

Member

jonkeane commented Apr 7, 2026

@github-actions crossbow submit -g r

@github-actions github-actions bot added the "awaiting merge" label and removed the "awaiting committer review" label Apr 7, 2026

github-actions bot commented Apr 7, 2026

Revision: 825940b

Submitted crossbow builds: ursacomputing/crossbow @ actions-40f5a339fc

Task Status
r-binary-packages GitHub Actions
r-recheck-most GitHub Actions
test-r-alpine-linux-cran GitHub Actions
test-r-arrow-backwards-compatibility GitHub Actions
test-r-depsource-system GitHub Actions
test-r-dev-duckdb GitHub Actions
test-r-devdocs GitHub Actions
test-r-extra-packages GitHub Actions
test-r-fedora-clang GitHub Actions
test-r-gcc-11 GitHub Actions
test-r-gcc-12 GitHub Actions
test-r-install-local GitHub Actions
test-r-install-local-minsizerel GitHub Actions
test-r-linux-as-cran GitHub Actions
test-r-linux-rchk GitHub Actions
test-r-linux-sanitizers GitHub Actions
test-r-linux-valgrind GitHub Actions
test-r-m1-san GitHub Actions
test-r-macos-as-cran GitHub Actions
test-r-offline-maximal GitHub Actions
test-r-ubuntu-22.04 GitHub Actions
test-r-versions GitHub Actions

@github-actions github-actions bot added the "awaiting changes" and "awaiting change review" labels and removed the "awaiting merge", "awaiting changes", and "awaiting change review" labels Apr 7, 2026
@thisisnic thisisnic force-pushed the GH-49534-dplyr-recode branch from 707bf62 to bbb4113 Compare April 7, 2026 15:04
@thisisnic
Member Author

@github-actions crossbow submit -g r


github-actions bot commented Apr 7, 2026

Revision: bbb4113

Submitted crossbow builds: ursacomputing/crossbow @ actions-29b827672d

Task Status
r-binary-packages GitHub Actions
r-recheck-most GitHub Actions
test-r-alpine-linux-cran GitHub Actions
test-r-arrow-backwards-compatibility GitHub Actions
test-r-depsource-system GitHub Actions
test-r-dev-duckdb GitHub Actions
test-r-devdocs GitHub Actions
test-r-extra-packages GitHub Actions
test-r-fedora-clang GitHub Actions
test-r-gcc-11 GitHub Actions
test-r-gcc-12 GitHub Actions
test-r-install-local GitHub Actions
test-r-install-local-minsizerel GitHub Actions
test-r-linux-as-cran GitHub Actions
test-r-linux-rchk GitHub Actions
test-r-linux-sanitizers GitHub Actions
test-r-linux-valgrind GitHub Actions
test-r-m1-san GitHub Actions
test-r-macos-as-cran GitHub Actions
test-r-offline-maximal GitHub Actions
test-r-ubuntu-22.04 GitHub Actions
test-r-versions GitHub Actions

@github-actions github-actions bot added the "awaiting changes" label and removed the "awaiting change review" label Apr 7, 2026
@thisisnic
Member Author

CI failure appears to be unrelated (duckdb installation), so I'll merge.

@thisisnic thisisnic merged commit bb4e492 into apache:main Apr 7, 2026
16 checks passed
@thisisnic thisisnic removed the "awaiting changes" label Apr 7, 2026
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit bb4e492.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.
