Follow-up of #1133
Sending SIGUSR2 to FPM (the previously implemented solution) did not reliably stop the PHP script that timed out.
It merely interrupted the call that was currently blocking (e.g. a sleep, a DB call, etc.), flushed the logs, and carried on. My guess is that this could let the PHP script keep running in some cases, possibly hitting yet another timeout on a later line (e.g. another DB call).
This PR fixes the timeout test that wasn't really working (🤦) and restarts FPM completely in case of a timeout. That is confirmed to completely stop the execution of the timed-out script and to flush the logs to stderr.
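The "interrupted the blocked call and carried on" behaviour described above can be reproduced outside FPM. The following is only a minimal sketch under stated assumptions: it runs as a CLI script with the `pcntl` extension and uses SIGALRM as a stand-in signal, so it illustrates the general effect of a handled signal on a blocking call, not what FPM does internally on SIGUSR2.

```php
<?php
declare(strict_types=1);

// Sketch only: SIGALRM in a CLI script (requires ext-pcntl),
// not SIGUSR2 inside PHP-FPM.
pcntl_async_signals(true);
pcntl_signal(SIGALRM, static function (int $signal): void {
    // Intentionally empty: handling the signal is enough to interrupt sleep().
});

pcntl_alarm(1);         // ask the kernel to deliver SIGALRM in about one second

$remaining = sleep(5);  // returns early with the number of seconds left

// The script is not terminated: execution continues on the next line.
$continued = true;
echo "sleep() was interrupted with {$remaining}s left; the script kept running\n";
```

If the script were stopped by the signal, the final `echo` would never run. Because it does run, a later blocking call in the same script could time out again, which is the scenario the description worries about.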
- // That causes FPM to reload all workers, which allows us to cleanly stop the FPM worker that is stuck in a timeout/waiting state.
- // A (intentional) side-effect is that it causes all worker logs buffered by FPM to be written to stderr.
- // Without that, all logs written by the PHP script are never written to stderr (and thus never logged to CloudWatch).
- // This takes a bit of time (a few ms), but it's still faster than rebooting FPM entirely.
- posix_kill($this->fpm->getPid(), SIGUSR2);
+ echo "The PHP script timed out. Bref will now restart PHP-FPM to start from a clean slate and flush the PHP logs.\nTimeouts can happen for example when trying to connect to a remote API or database, if this happens continuously check for those.\nIf you are using a RDS database, read this: https://bref.sh/docs/environment/database.html#accessing-the-internet\n";
+
+ /**
+  * Restart FPM so that the blocked script is 100% terminated and that its logs are flushed to stderr.
+  *
+  * - "why restart FPM?": if we don't, the previous request continues to execute on the next request
+  * - "why not send a SIGUSR2 signal to FPM?": that was a promising approach because SIGUSR2
+  *   causes FPM to cleanly stop the FPM worker that is stuck in a timeout/waiting state.
+  *   It also causes all worker logs buffered by FPM to be written to stderr (great!).
+  *   This takes a bit of time (a few ms), but it's faster than rebooting FPM entirely.
+  *   However, the downside is that it doesn't "kill" the previous request execution:
+  *   it merely stops the execution of the line of code that is waiting (e.g. "sleep()",
+  *   "file_get_contents()", ...) and continues to the next line. That's super weird!
+  *   So SIGUSR2 isn't a great solution in the end.
+  */
+ $this->stop();
+ $this->start();

  // Throw an exception so that:
  // - this is reported as a Lambda execution error ("error rate" metrics are accurate)
@@ -136,7 +151,7 @@ public function handleRequest(HttpRequestEvent $event, Context $context): HttpRe
  throw new Timeout($timeoutDelayInMs);
  } catch (Throwable $e) {
  printf(
- "Error communicating with PHP-FPM to read the HTTP response. A root cause of this can be that the Lambda (or PHP) timed out, for example when trying to connect to a remote API or database, if this happens continuously check for those! Bref will restart PHP-FPM now. Original exception message: %s %s\n",
+ "Error communicating with PHP-FPM to read the HTTP response. Bref will restart PHP-FPM now. Original exception message: %s %s\n",
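The control flow introduced by the diff (log, stop, start, then throw) can be sketched in isolation. Every name below (`FakeFpm`, `ClientTimedOut`, `RequestTimeout`, `handleRequest` as a free function) is hypothetical and exists only for illustration. The sketch does not reproduce Bref's actual classes, and it assumes the timeout surfaces as an exception from the call that sends the request.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the process manager: it only records calls.
final class FakeFpm
{
    /** @var string[] */
    public array $calls = [];

    public function stop(): void { $this->calls[] = 'stop'; }
    public function start(): void { $this->calls[] = 'start'; }
}

// Hypothetical exception types for this sketch.
final class ClientTimedOut extends RuntimeException {}
final class RequestTimeout extends RuntimeException {}

function handleRequest(FakeFpm $fpm, callable $sendRequest, int $timeoutMs): string
{
    try {
        return $sendRequest();
    } catch (ClientTimedOut $e) {
        echo "The PHP script timed out. Restarting the process manager.\n";

        // Full restart: the stuck worker is gone before the next request.
        $fpm->stop();
        $fpm->start();

        // Throw so the caller still records this invocation as failed.
        throw new RequestTimeout("Timed out after {$timeoutMs} ms", 0, $e);
    }
}

// Usage: simulate a request that times out.
$fpm = new FakeFpm();
$caught = null;
try {
    handleRequest($fpm, static function (): string {
        throw new ClientTimedOut('simulated timeout');
    }, 1000);
} catch (RequestTimeout $e) {
    $caught = $e;
}
```

Restarting before throwing matters for ordering: by the time the error reaches the caller, the process manager is already back in a clean state for the next request.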