Describe the bug, including details regarding any error messages, version, and platform.
Hello.
I'm using the R package arrow (24.0.0) on Ubuntu with R 4.6.0. I install arrow with `install.packages("arrow")` from https://r2u.stat.illinois.edu/ubuntu, which installs the deb file r-cran-arrow_24.0.0-1.ca2204.1_amd64.deb.
I start the process with `Rscript myprocess.R`. At the beginning, it reads some Parquet data from S3 with `read_parquet("s3://some/path.parquet")`. At the end, it writes the results to `s3://some/otherpath.parquet` with `write_parquet()` and then quits R with `quit(save = "no")`. This runs on a .2xlarge instance with 8 vCPUs.
Sometimes the process takes 30 to 40 minutes, and then everything works fine. When R quits, it apparently cleans up all of its resources, and that includes cleaning up arrow. If the process runs longer than about 42 minutes, quitting segfaults in `FinalizeS3`. This never happens when the process finishes in less than 42 minutes.
This is the output I see in the logs. The R session is killed at `quit()`:
```
s3_write_rds(trans, s3_path(settings$folder_output$raw, "rawdata.rds"))
[1] "s3://some-whatever-s3-path/rawdata.rds"
quit(save = "no")
Error in FinalizeS3() : ignoring SIGPIPE signal
Calls: -> FinalizeS3
 *** caught segfault ***
address 0x30, cause 'memory not mapped'
An irrecoverable exception occurred. R is aborting now ...
```
My guess is that it fails in this function, from arrow/r/src/filesystem.cpp (lines 352 to 357 at commit 62cdda9):
```cpp
// [[arrow::export]]
void FinalizeS3() {
#if defined(ARROW_R_WITH_S3)
  StopIfNotOk(fs::FinalizeS3());
#endif
}
```
What are my options for keeping arrow from killing R?
Component(s)
R