Skip to content

Commit 0617445

Browse files
[Fix][Core] Close unused pipe file descriptor of child processes of Raylet (#52700)
- We rely on a pipe-based mechanism to monitor the parent process (raylet)'s health: https://github.com/ray-project/ray/blob/07a00f20bb830c61ccf8fb2fddcfae8fa4c418dd/python/ray/_private/process_watcher.py#L127-L149 - Ideally, when raylet dies, the OS will close all its file descriptors. With no process holding the write end of the pipe, so the `readline()` in https://github.com/ray-project/ray/blob/07a00f20bb830c61ccf8fb2fddcfae8fa4c418dd/python/ray/_private/process_watcher.py#L138 will return an empty string, indicating that the raylet has exited. - However, this only works if *no other process* holds the pipe's write end. - Raylet launches several child processes via `fork + exec`. For example, `DashboardAgent` and `RuntimeEnvAgent`: https://github.com/ray-project/ray/blob/c2a6de384217e5db6f055d0607ca3e531deed56c/src/ray/raylet/node_manager.cc#L392-L393 - These child processes are spawned using `ProcessFD::spawnvpe`: https://github.com/ray-project/ray/blob/a80f02f2c1c53d79eb94ecafaf59ae51f87d0734/src/ray/util/process.cc#L130 This function sets up a pipe and closes unused ends in both parent and child: https://github.com/ray-project/ray/blob/6ca0da17300e6087ae6bc9cb9faaa7e9a071b33e/src/ray/util/process.cc#L178-L226 - Raylet (parent process) creates `DashboardAgent` first, holding the write end (`parent_lifetime_pipe[1]`, say fd 40), and closes the read end. - The dashboard agent (child process 1) holds the read end (`parent_lifetime_pipe[0]`, say fd 39), and closes the write end. - When raylet later forks `RuntimeEnvAgent` (child process 2), all open fds (including fd 40) are inherited by the new child process. - As a result, even if raylet dies, `RuntimeEnvAgent` still holds fd 40 (write end), preventing `DashboardAgent` from detecting raylet's death. - Therefore, `readline()` in the dashboard agent hangs forever instead of returning an empty string. **Solution:** Ensure unused pipe write ends are closed on `exec` by setting `FD_CLOEXEC`. ### Changes in this PR - Revert the changes in #52388 because it is only a hotfix. - Move `SetFdCloseOnExec` to `process.h` and `process.cc` and make it a public util function. - Call `SetFdCloseOnExec` in the parent process to make sure the pipe fd is closed in subsequent `exec`. ### Testing Tested manually. Before this PR, in about 20% of the runs the agent process did not exit after the raylet died. After this PR, I ran it 20 times consecutively and the agent process successfully exited in all runs. --------- Signed-off-by: Chi-Sheng Liu <[email protected]> Co-authored-by: Edward Oakes <[email protected]>
1 parent 2bcea58 commit 0617445

File tree

4 files changed

+36
-39
lines changed

4 files changed

+36
-39
lines changed

python/ray/dashboard/modules/job/tests/test_sdk.py

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -197,9 +197,7 @@ def get_register_agents_number(gcs_client):
197197
],
198198
indirect=True,
199199
)
200-
def test_job_head_choose_job_agent_E2E(
201-
ray_start_cluster_head_with_env_vars, call_ray_stop_only
202-
):
200+
def test_job_head_choose_job_agent_E2E(ray_start_cluster_head_with_env_vars):
203201
cluster = ray_start_cluster_head_with_env_vars
204202
assert wait_until_server_available(cluster.webui_url) is True
205203
webui_url = cluster.webui_url

src/ray/common/client_connection.cc

Lines changed: 5 additions & 36 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@
3030

3131
#include "ray/common/event_stats.h"
3232
#include "ray/common/ray_config.h"
33+
#include "ray/util/process.h"
3334
#include "ray/util/util.h"
3435

3536
#if defined(_WIN32)
@@ -42,48 +43,16 @@
4243

4344
namespace ray {
4445

45-
namespace {
46-
47-
#if defined(_WIN32)
48-
// Don't care what exact type is in windows... Looks like to be an asio specific type.
49-
template <typename NativeHandleType>
50-
void SetFdCloseOnExec(const NativeHandleType &handle) {
51-
// In Windows we don't need to do anything, because in CreateProcess we pass
52-
// bInheritHandles = false which means we don't inherit handles or sockets.
53-
// https://github.com/ray-project/ray/blob/928183b3acab3c4ad73ef3001203a7aaf009bc87/src/ray/util/process.cc#L148
54-
// https://learn.microsoft.com/en-us/windows/win32/sysinfo/handle-inheritance
55-
return;
56-
}
57-
#else
58-
59-
// Sets the flag FD_CLOEXEC to a file descriptor.
60-
// This means when the process is forked, this fd would be closed in the child process
61-
// side. Raylet forks to create core workers and we don't want the Unix Socket FDs to be
62-
// inherited by the core workers. Leaking these FDs would have performance implications.
63-
//
64-
// Idempotent. Calling twice == calling once.
65-
// Not thread safe.
66-
// See https://github.com/ray-project/ray/issues/40813
67-
void SetFdCloseOnExec(int fd) {
68-
if (fd < 0) {
69-
return;
70-
}
71-
int flags = fcntl(fd, F_GETFD, 0);
72-
RAY_CHECK_NE(flags, -1) << "fcntl error: errno = " << errno << ", fd = " << fd;
73-
const int ret = fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
74-
RAY_CHECK_NE(ret, -1) << "fcntl error: errno = " << errno << ", fd = " << fd;
75-
RAY_LOG(DEBUG) << "set FD_CLOEXEC to fd " << fd;
76-
}
77-
#endif
78-
79-
} // namespace
80-
8146
void SetCloseOnExec(local_stream_socket &socket) {
47+
#if !defined(_WIN32)
8248
SetFdCloseOnExec(socket.native_handle());
49+
#endif
8350
}
8451

8552
void SetCloseOnExec(boost::asio::basic_socket_acceptor<local_stream_protocol> &acceptor) {
53+
#if !defined(_WIN32)
8654
SetFdCloseOnExec(acceptor.native_handle());
55+
#endif
8756
}
8857

8958
Status ConnectSocketRetry(local_stream_socket &socket,

src/ray/util/process.cc

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@
2222
#include <Winternl.h>
2323
#include <process.h>
2424
#else
25+
#include <fcntl.h>
2526
#include <poll.h>
2627
#include <signal.h>
2728
#include <stddef.h>
@@ -68,6 +69,19 @@ int execvpe(const char *program, char *const argv[], char *const envp[]) {
6869

6970
namespace ray {
7071

72+
#if !defined(_WIN32)
73+
void SetFdCloseOnExec(int fd) {
74+
if (fd < 0) {
75+
return;
76+
}
77+
int flags = fcntl(fd, F_GETFD, 0);
78+
RAY_CHECK_NE(flags, -1) << "fcntl error: errno = " << errno << ", fd = " << fd;
79+
const int ret = fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
80+
RAY_CHECK_NE(ret, -1) << "fcntl error: errno = " << errno << ", fd = " << fd;
81+
RAY_LOG(DEBUG) << "set FD_CLOEXEC to fd " << fd;
82+
}
83+
#endif
84+
7185
bool EnvironmentVariableLess::operator()(char a, char b) const {
7286
// TODO(mehrdadn): This is only used on Windows due to current lack of Unicode support.
7387
// It should be changed when Process adds Unicode support on Windows.
@@ -202,6 +216,8 @@ class ProcessFD {
202216
if (pid != 0 && pipefds[1] != -1) {
203217
close(pipefds[1]); // not the child, so close the write end of the pipe
204218
pipefds[1] = -1;
219+
// make sure the read end of the pipe is closed on exec
220+
SetFdCloseOnExec(pipefds[0]);
205221
}
206222

207223
// Create a pipe and redirect the read pipe to a child's stdin.
@@ -213,11 +229,14 @@ class ProcessFD {
213229
// Child. Close sthe write end of the pipe from child.
214230
close(parent_lifetime_pipe[1]);
215231
parent_lifetime_pipe[1] = -1;
232+
SetFdCloseOnExec(parent_lifetime_pipe[0]);
216233
}
217234
if (pid != 0 && parent_lifetime_pipe[0] != -1) {
218235
// Parent. Close the read end of the pipe.
219236
close(parent_lifetime_pipe[0]);
220237
parent_lifetime_pipe[0] = -1;
238+
// Make sure the write end of the pipe is closed on exec.
239+
SetFdCloseOnExec(parent_lifetime_pipe[1]);
221240
}
222241
} else {
223242
// parent_lifetime_pipe pipes are not used.

src/ray/util/process.h

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,17 @@ enum { PID_MAX_LIMIT = 1 << 22 };
4040

4141
namespace ray {
4242

43+
#if !defined(_WIN32)
44+
/// Sets the FD_CLOEXEC flag on a file descriptor.
45+
/// This means when the process is forked, this fd would be closed in the child process
46+
/// side.
47+
///
48+
/// Idempotent.
49+
/// Not thread safe.
50+
/// See https://github.com/ray-project/ray/issues/40813
51+
void SetFdCloseOnExec(int fd);
52+
#endif
53+
4354
class EnvironmentVariableLess {
4455
public:
4556
bool operator()(char a, char b) const;

0 commit comments

Comments
 (0)