Overhaul spawnveg to use clone(2) and fix many related spawning bugs#888
Overhaul spawnveg to use clone(2) and fix many related spawning bugs#888
Conversation
… bugs Alterations: - Add a version of spawnveg that clones processes with clone(2) outright. This method is generally faster than using posix_spawn, and is also more portable (since Linux with musl libc *and* NetBSD both provide this function). The code was initially based on the _real_vfork spawnveg code previously removed in ef9a5c7 (unlike vfork(2), there is no problem with running code prior to execve for clone with CLONE_VM|CLONE_VFORK). - Remove the now dead code implementing tcsetpgrp via posix_spawn_file_actions_addtcsetpgrp_np(). This code was only ever used with glibc 2.35+. (Not only that, but the only other platform that ever considered implementing such a call was NetBSD[^1], where it appears to have been abandoned. This is ironic considering ksh no longer uses posix_spawn on NetBSD as of this commit.) This code may be worth visiting if io_uring_spawn()[^2] is ever actualized (that project appears to be quite inactive though). - Bugfix: Don't set the terminal signals to their defaults in the parent prior to calling spawnveg. This is the primary cause of a lockup in the pty tests which can be reproduced as follows: $ exec ksh $ bin/shtests pty 2>/dev/null ^C ^C ^C (SIGINT doesn't work anymore and may segfault) The correct way to go about dealing with SIGT* is to set those to SIG_DFL in the child process (cf. _sh_fork()). - Added support for tcsetpgrp to the fork fallback in spawnveg. Some form of this appears to have already been attempted in AT&T olden times, but that old version was broken and needed bugfixes desperately. - If the child needs tcsetpgrp, block the terminal signals in the parent process via sigcritical(). The posix_spawn version doesn't need this because posix_spawn will block signals automatically and therefore doesn't need sigcritical. - Now that the fork fallback for spawnveg works correctly in interactive terminals, prefer that to the sh_fork() codepath on operating systems without posix_spawn tcsetpgrp support. Even though the underlying system call is still ultimately fork, the sh_ntfork() codepath is faster than the traditional sh_fork codepath. Benchmark: $ time /tmp/ksh-devbranch -ic "for i in {1..10000}; do $(whence -p true); done" real 0m03.302s user 0m00.988s sys 0m02.320s $ time /tmp/ksh-newspawnveg -ic "for i in {1..10000}; do $(whence -p true); done" real 0m03.160s user 0m01.187s sys 0m01.977s - To that end, split up the spawnveg versions into spawnveg_fast and spawnveg_slow. Choose the appropriate one when spawnveg is called; this removes the need for the xec.c ifdef hackery. - Removed the ntfork_tcpgrp ifdefs from xec.c; spawnveg can handle it by itself now. - Bugfix: With the spawnveg_fast and spawnveg_slow innovation, spawnveg now always has support for setsid. It'll fallback to fork if POSIX_SPAWN_SETSID and clone(2) aren't available. - Bugfix: For the posix_spawn version of spawnveg, the flags should be of the short type pursuant to the POSIX specification. - Optimization: Use pipe2 in the fork fallback for spawnveg when it's available to avoid two fcntl syscalls. - Updated the spawnveg documentation to reflect the new changes. - _spawnveg(): Restore the terminal process group immediately after any failed spawn attempt, rather than only in sh_ntfork(). - Added a regression test for a crash that occurred because of the erroneous tcsetpgrp placement in sh_ntfork(). - path_spawn(): Do not print an error message and longjmp away upon encountering E2BIG or some other error; let the calling functions take care of that. - path_exec(): Added handling for E2BIG. - Fixed a bug that caused sh_ntfork to leak file descriptors upon encountering an error (re: 8f848bc). Reproducer: #!/bin/ksh ls /proc/$$/fd redirect 2>/dev/null touch /tmp/file for i in {1..20}; do command /tmp/file <(echo) done ls /proc/$$/fd - Added regression tests for the leak that use 'command' and 'command -x'. - Fixed an old regression test that was giving false negatives. - Clarified the intent of the sigcrit nesting matches. The ksh2020 devs appeared to have been confused by this line, so some additional clarification explaining the intent should be helpful for posterity. - path.sh: Added tests for the exit status of commands run when forked with the & operator. [^1]: https://blog.netbsd.org/tnf/entry/gsoc_reports_make_system_31 [^2]: https://lwn.net/Articles/908268/
|
I've been testing this PR, and AFAICT it works well. I'm a bit out of my depth here, and I still don't have time in my life to study the intricacies of clone(2), etc., in great detail. One thing that caught my eye is that, in the clone(2) version of spawnveg_fast(): So, AFAICT, that means the child's C stack is part of the parent's C stack, a grandchild's C stack will be part of the grandparent's C stack, etc. Does that not mean that the child stack will necessarily be a lot smaller, and that we're going to run out of parent stack space pretty quickly if children keep spawning children? Also, would this not fail if a shell function tries to run an external command from a sufficiently high level of recursive shell function calls? Or am I missing something here? |
I must indeed be missing something, because launching 8000 recursive ksh processes via the clone(2) version of spawnveg_fast() is working just fine on my Linux arm64 VM. I don't get it; please explain if you can :) Hmmm... yet, the clone(2) man page says: Does this not prohibit allocating the child stack within the parent's stack? It continues: The current PR code (spawnveg.c:162) passes a pointer Is that not out of bounds by one byte? Also, does the above mean that this will fail on Linux on HP PA processors? |
This is not because it's impossible to use clone on those processors, but because I don't have one to test with, so better safe than sorry.
44a6c1f to
6bf4e1b
Compare
|
Regarding stack allocation, I believe
I believe the C standard explicitly allows one byte past for In any case I've pushed a new commit that avoids |
|
The code below might work on HP PA, but I can't test it myself since I don't have an HP PA CPU. diff --git a/src/lib/libast/comp/spawnveg.c b/src/lib/libast/comp/spawnveg.c
index 080762e7a..c26f5fc56 100644
--- a/src/lib/libast/comp/spawnveg.c
+++ b/src/lib/libast/comp/spawnveg.c
@@ -34,10 +34,6 @@
#include <ast_tty.h>
#include <ast_fcntl.h>
-#if _machine_stack_grows_up
-#undef _lib_clone /* Don't use clone on systems where the stack grows upwards (lacks testing) */
-#endif
-
/*
* Set the SID, PGID and TCPGRP in the child process
* after forking.
@@ -155,6 +151,11 @@ spawnveg_fast(const char* path, char* const argv[], char* const envv[], pid_t pg
pid_t pid;
char stack[STACK_SIZE];
struct cargs args;
+#if _machine_stack_grows_up
+ void *stack_top = stack;
+#else
+ void *stack_top = stack+STACK_SIZE;
+#endif
args.path = path;
args.argv = (char**)argv;
@@ -163,7 +164,7 @@ spawnveg_fast(const char* path, char* const argv[], char* const envv[], pid_t pg
args.pgid = pgid;
args.tcfd = tcfd;
sigcritical(SIG_REG_EXEC|SIG_REG_PROC|(tcfd>=0?SIG_REG_TERM:0));
- pid = clone(exec_process, stack+STACK_SIZE, CLONE_VM|CLONE_VFORK|SIGCHLD, &args);
+ pid = clone(exec_process, stack_top, CLONE_VM|CLONE_VFORK|SIGCHLD, &args);
if (pid == -1)
args.err = errno;
else if (args.err) |
This commit is a rewrite of #881. The previous pull request didn't address lingering bugs with
E2BIGhandling and file descriptor leaks, proving unsatisfactory. After support for direct usage ofclonewas added tospawnveg, the previous fixes for theposix_spawnimplementation wound up unused and were scrapped (although those changes do work just fine and can be readded if that's desirable).Alterations:
spawnvegthat creates a new process withclone(2)outright. This method is generally faster than usingposix_spawn(3), and is also more portable (because Linux with musl libc and NetBSD both provideclone, but notposix_spawn_file_actions_addtcsetpgrp_np). This code was initially based on the_real_vforkcode previously removed in ef9a5c7 (unlikevfork(2), there is no problem with doing stuff prior toexecveforclonewithCLONE_VM | CLONE_VFORK).tcsetpgrpviaposix_spawn_file_actions_addtcsetpgrp_np(). This code was only ever used with glibc 2.35+. (Not only that, but the only other platform that ever considered implementing a similar function was NetBSD1, where it appears to have been abandoned. This is ironic considering ksh no longer usesposix_spawnon NetBSD as of this commit.) This code may be worth revisiting ifio_uring_spawn()2 is ever actualized (that project appears to be rather inactive though).Demonstration of the (small) performance improvement garnered from using
clonedirectly rather than viaposix_spawn:spawnveg. This is the primary cause of a lockup in the pty tests which can be reproduced as follows:The correct way to go about dealing with
SIGT*is to set those toSIG_DFLin the child process3.tcsetpgrpto theforkfallback inspawnveg. Some form of this appears to have already been attempted in AT&T olden times, but that old version was broken and needed bugfixes desperately.tcsetpgrp, block the terminal signals in the parent process viasigcritical().forkfallback forspawnvegworks correctly in interactive terminals, prefer that to thesh_fork()codepath on operating systems withoutclonesupport. Even though the underlying system call is still ultimatelyfork, thesh_ntforkcodepath is faster than the traditionalsh_forkcodepath. Benchmark:spawnvegimplementations intospawnveg_fastandspawnveg_slow. Choose the appropriate one whenspawnvegis called; this removes the need for the xec.c ifdef hackery.ntfork_tcpgrpifdefs from xec.c;spawnvegcan handle it by itself now.spawnveg_fastandspawnveg_slowinnovation,spawnvegnow always has support forsetsid. It'll fallback tofork()ifPOSIX_SPAWN_SETSIDandclone()aren't available.posix_spawnversion ofspawnveg, the flags should be of the short type pursuant to the POSIX specification.pipe2in theforkfallback forspawnvegwhen it's available to avoid twofcntlsyscalls.spawnvegdocumentation to reflect the new changes._spawnveg(): Restore the terminal process group immediately after any relevant failed spawn attempt, rather than only insh_ntfork().tcsetpgrpplacement.Reproducer that could cause ksh to prematurely exit (for Linux systems with glibc 2.35+):
path_spawn(): Do not print an error message andlongjmpaway upon encounteringE2BIGor some other error; let the calling functions take care of that.path_exec(): Added handling forE2BIG.sh_ntforkto leak file descriptors upon encountering an error (re: 8f848bc). (This bug/regression was encountered after fixing the badtcsetpgrpusage.) Reproducer:commandandcommand -x.sigcritical()nesting matches. The ksh2020 devs appeared to have been confused by this line of code, so some additional clarification explaining what it does should be helpful for posterity.&operator.Footnotes
https://blog.netbsd.org/tnf/entry/gsoc_reports_make_system_31 ↩
https://lwn.net/Articles/908268/ ↩
cf.
_sh_fork()↩