[R] R - FinalizeS3 segfault #50009

@jwijffels

Description

Describe the bug, including details regarding any error messages, version, and platform.

Hello.

I'm using the R package arrow (24.0.0) on Ubuntu with R 4.6.0. I install arrow with install.packages("arrow") from https://r2u.stat.illinois.edu/ubuntu, which pulls the deb file r-cran-arrow_24.0.0-1.ca2204.1_amd64.deb.

I start a process with Rscript myprocess.R. At the beginning of the process I use read_parquet("s3://some/path.parquet") to read some parquet data from S3, at the end I call write_parquet("s3://some/otherpath.parquet"), and then I quit R with quit(save = "no"). This runs on a .2xlarge instance with 8 vCPU.

Sometimes the process takes 30 to 40 minutes, which works fine. When R quits, it apparently cleans up all its resources, which means arrow is cleaned up somewhere as well.

If the process takes longer than 42 minutes, quitting segfaults at FinalizeS3. This never happens when the process takes less than 42 minutes.
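
A minimal sketch of the workflow described above may help when trying to reproduce this. It is not the actual script: the function name run_job, the work_seconds argument and the Sys.sleep placeholder are hypothetical, and the S3 paths are the example paths from this report. It assumes read_parquet and write_parquet are given S3 URIs directly, as in the description.

```r
# Hypothetical reproduction sketch; run_job() and work_seconds are
# illustrative names, not part of arrow or of the original script.
run_job <- function(input_uri, output_uri, work_seconds = 45 * 60) {
  # Read the input data from S3 at the start of the process.
  dat <- arrow::read_parquet(input_uri)

  # Stand-in for the real processing. The report says the crash at
  # quit only shows up when the run lasts longer than about 42 minutes.
  Sys.sleep(work_seconds)

  # Write the result back to S3 at the end of the process.
  arrow::write_parquet(dat, output_uri)
  invisible(dat)
}

# Not run here: needs S3 credentials and real bucket paths.
# run_job("s3://some/path.parquet", "s3://some/otherpath.parquet")
# quit(save = "no")
```

Running the two commented lines from Rscript, once with a short work_seconds and once with a value above 42 minutes, would be one way to check whether the elapsed time alone is enough to trigger the segfault.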

This is the error message I see in the logs. The R session is killed at quit.

s3_write_rds(trans, s3_path(settings$folder_output$raw, "rawdata.rds"))
[1] "s3://some-whatever-s3-path/rawdata.rds"
quit(save = "no")
Error in FinalizeS3() : ignoring SIGPIPE signal
Calls: -> FinalizeS3
*** caught segfault ***
address 0x30, cause 'memory not mapped'
An irrecoverable exception occurred. R is aborting now ...

My guess is that it's stopped due to this:

arrow/r/src/filesystem.cpp

Lines 352 to 357 in 62cdda9

// [[arrow::export]]
void FinalizeS3() {
#if defined(ARROW_R_WITH_S3)
  StopIfNotOk(fs::FinalizeS3());
#endif
}

What are my options to keep arrow from killing R at quit?

Component(s)

R
