Add consumed_args callback optimization and apply to array_reduce() for carry to improve performance#21340
Add consumed_args callback optimization and apply to array_reduce() for carry to improve performance#21340drealecs wants to merge 2 commits intophp:masterfrom
Conversation
86f212d to
2f9aa58
Compare
2f9aa58 to
3299db5
Compare
|
Can you provide the benchmark you used to verify this improves the situation? |
Yes, this is a good point. <?php
declare(strict_types=1);
function test($size) {
$data = [];
for ($i = 0; $i < $size; $i++) {
$data[] = substr(hash('crc32b', (string) $i), 0, 4);
}
$start = hrtime(true);
$result = array_reduce(
$data,
static function ($carry, $item) {
@$carry[$item]++;
return $carry;
},
[],
);
$duration = hrtime(true) - $start;
printf("size=%d, unique=%d, duration=%.3f ms\n", $size, count($result), $duration/1_000_000);
}
test(10_000);
test(20_000);
test(40_000);On master, this is the output: $ ./sapi/cli/php test_array_reduce.php
size=10000, unique=9980, duration=1199.907 ms
size=20000, unique=19084, duration=4844.224 ms
size=40000, unique=34066, duration=19404.972 msand we can see that the complexity is O(n^2). On this branch, this is the output: $ ./sapi/cli/php test_array_reduce.php
size=10000, unique=9980, duration=9.108 ms
size=20000, unique=19084, duration=18.905 ms
size=40000, unique=34066, duration=35.392 mswith O(n) complexity. |
3299db5 to
cdbe42a
Compare
iluuu1994
left a comment
There was a problem hiding this comment.
Nice proposal! The approach makes sense to me. Applying this to all cases will require some care. This is safe for any argument that is owned. So this is fine for $initial (and the return value of $callback in subsequent iterations), but wouldn't be for the elements $array (ZEND_HASH_FOREACH_VAL(htbl, operand)) given operand has not been ADDREFd at this point.
Before merging this, it may make sense to scout the code base to see which functions this may be applied to. Some of the use cases might influence the design.
Zend/zend_API.h
Outdated
| * integrate APIs like call_user_func_array(). The usual restriction that | ||
| * there may not be position arguments after named arguments applies. */ | ||
| HashTable *named_params; | ||
| uint32_t consume_params; |
There was a problem hiding this comment.
Nit: consumed would indicate this property can diverge between args. args would also be more correct, though I realize you named this after the other fields.
There was a problem hiding this comment.
Yes, I got to the same conclusion, that args might be a better name here, but I like consistency more, so I went with params.
Also, instead of consume I was thinking some other words might fit better: move, steal, transfer.
For now I'll stick with consume_params, if nothing comes up as a better alternative.
There was a problem hiding this comment.
My comment wasn't fully clear. I'd suggest consumed_args, which indicates better that this is map of args to be consumed. consume_args sounds like a bool to consume all args.
There was a problem hiding this comment.
Yeah agreed, the current name is confusing.
There was a problem hiding this comment.
Also this should be moved before the named_params field as there is already a 4 bytes gap in the struct on 64bits
There was a problem hiding this comment.
Thank you for the review! Great point, I moved it after param_count.
And also renamed it to consumed_args.
I already did some analysis, and there are a few other places where we could implement it to get some improvements:
What do you think? And related to the design, what can we improve? |
Girgias
left a comment
There was a problem hiding this comment.
I'd rather have this split into two PRs. One that adds the consumed_args with various tests added to the zend_test extension (e.g. ref variables as that seems to be explicitly checked and I'd like to understand more) and then a follow-up PR to add support for array_reduce() and whatever other functions.
Zend/zend_API.h
Outdated
| * integrate APIs like call_user_func_array(). The usual restriction that | ||
| * there may not be position arguments after named arguments applies. */ | ||
| HashTable *named_params; | ||
| uint32_t consume_params; |
There was a problem hiding this comment.
Yeah agreed, the current name is confusing.
Zend/zend_API.h
Outdated
| * integrate APIs like call_user_func_array(). The usual restriction that | ||
| * there may not be position arguments after named arguments applies. */ | ||
| HashTable *named_params; | ||
| uint32_t consume_params; |
There was a problem hiding this comment.
Also this should be moved before the named_params field as there is already a 4 bytes gap in the struct on 64bits
008446f to
fb5617f
Compare
Thank you for the review, Gina. Great point about the tests. I’d prefer to keep this as a single PR if that works for you. I did create exactly two commits from the start, and I plan to keep them that way, so the separation is explicit and easier to review. They contain now:
If you still prefer two separate PRs, I can split them, but I hoped this structure would keep review manageable while preserving context. And if merging can be done not using squash, it will keep git history relevant, but I don't know if that's something commonly used, so let me know if I should change it or if it works for you this way. Thanks again! |
2847abb to
9171681
Compare
Thanks @arnaud-lb for the review and for flagging this. But I may be missing something in the broader call paths, so happy to adjust if you still think we should add hardening here. |
Yes, that's a nice idea as well. I think this might be possible, but I don't have too much experience. |
Not necessarily, as argument passing increases the refcount. So this is valid: ZVAL_STR(&args[0], ZSTR_INIT_LITERAL("test", 0));
ZVAL_COPY_VALUE(&args[1], &args[0]);
fci.retval = return_value;
fci.param_count = 2;
fci.params = args;
zend_call_function(&fci, &fci_cache);
zval_ptr_dtor(&args[0]);But not this: ZVAL_STR(&args[0], ZSTR_INIT_LITERAL("test", 0));
ZVAL_COPY_VALUE(&args[1], &args[0]);
fci.retval = return_value;
fci.param_count = 2;
fci.params = args;
fci.consumed_args = ZEND_FCI_CONSUMED_ARG(0) | ZEND_FCI_CONSUMED_ARG(1);
zend_call_function(&fci, &fci_cache);As a result, using |
Thanks for the example. Without I’m good with either approach:
In current use cases, I only found single-argument optimization cases. But also limiting this makes it less "complete". What do you prefer? I'm good either way. |
|
My preference would be to restrict to a single arg for now, especially if there are no use-cases for multiple ones in the code base. Approach 1 would be ok too, but I suspect that duplicate detection would incur a performance penalty. Either way I have no strong objection. |
|
I'd say let's restrict this to a single arg then until we have more use-cases. It's easy enough to extend otherwise. |
I had planned to apply a desugaring for the other |
Zend/zend_execute_API.c
Outdated
| if (EXPECTED(fci->consumed_args == 0)) { | ||
| ZVAL_COPY(param, arg); | ||
| } else { | ||
| if (ZEND_FCI_IS_CONSUMED_ARG(fci->consumed_args, i) && !Z_ISREF_P(arg) && arg == &fci->params[i]) { | ||
| ZVAL_COPY_VALUE(param, arg); | ||
| ZVAL_UNDEF(arg); | ||
| } else { | ||
| ZVAL_COPY(param, arg); | ||
| } | ||
| } |
There was a problem hiding this comment.
Is having the extra branch valuable, or can this just be:
| if (EXPECTED(fci->consumed_args == 0)) { | |
| ZVAL_COPY(param, arg); | |
| } else { | |
| if (ZEND_FCI_IS_CONSUMED_ARG(fci->consumed_args, i) && !Z_ISREF_P(arg) && arg == &fci->params[i]) { | |
| ZVAL_COPY_VALUE(param, arg); | |
| ZVAL_UNDEF(arg); | |
| } else { | |
| ZVAL_COPY(param, arg); | |
| } | |
| } | |
| if (EXPECTED(!(ZEND_FCI_IS_CONSUMED_ARG(fci->consumed_args, i) && Z_ISREF_P(arg) && arg == &fci->params[i]))) { | |
| ZVAL_COPY(param, arg); | |
| } else { | |
| ZVAL_COPY_VALUE(param, arg); | |
| ZVAL_UNDEF(arg); | |
| } |
Using the internal negation to avoid an UNEXPECTED here.
There was a problem hiding this comment.
I changed it, and I think it should be good without the extra branch.
But I kept using UNEXPECTED, as I think it's readable enough. Is there another reason to prefer EXPECTED? Let me know what you think.
There was a problem hiding this comment.
Is there another reason to prefer
EXPECTED? Let me know what you think.
UNEXPECTED is generally a stronger indicator than just “not EXPECTED”. See also: https://github.com/php/php-src/pull/21340/changes#r2959248306
b08af29 to
02de4c1
Compare
Thanks @arnaud-lb and @iluuu1994. |
83a5382 to
9bb79ff
Compare
Zend/zend_execute_API.c
Outdated
| func, fci->param_count, object_or_called_scope); | ||
| uint32_t consumed_args = fci->consumed_args; | ||
|
|
||
| if (UNEXPECTED(consumed_args & (consumed_args - 1))) { |
There was a problem hiding this comment.
IMO this should be an assertion. It's better to warn extensions that this is incorrect usage.
Also: Probably harmless, but the Nevermind, I was mistaken.UNEXPECTED is currently wrong, because 0 will fail this check.
There was a problem hiding this comment.
Done, I added an assertion in the same if block. Hope it looks ok.
BTW, yes, I rechecked that case for 0 a couple of times myself in two or three distinct days, not being so sure about it. 😄
There was a problem hiding this comment.
Please use:
ZEND_ASSERT((consumed_args & (consumed_args - 1)) == 0);This branch will get eliminated anyway.
There was a problem hiding this comment.
I see, re-reading your message, I understand now that you mean to not try and sanitize consumed_args to 0, but just to add the assertion. I pushed it again now.
| <?php | ||
|
|
||
| $result = zend_test_call_with_consumed_args( | ||
| static fn($a, $b) => [$a, $b], |
There was a problem hiding this comment.
It might be useful to dump the refcount with debug_zval_dump(), given that's what we're actually trying to achieve here. In this case, [1, 2, 3] is a literal. It might also be useful to make this a normal heap-allocated value.
There was a problem hiding this comment.
Yes, it makes sense; I did the change and refcount is 3 in this test. And if sending consumed_args as 0, it is 4.
There was a problem hiding this comment.
Right, I think you should use range(1, 3) or some other function that creates heap allocated values. The refcount should be 2 (one for debug_zval_dump()s frame, and the other for the closure), which would show the value is properly moved.
There was a problem hiding this comment.
Tried it now, and I do still observe refcount(3), not 2, because in this helper path there is still one owner from the args array used to build fci.params, in addition to the closure arg and debug_zval_dump()'s temporary ref.
There was a problem hiding this comment.
Simple test case:
--TEST--
Test
--FILE--
<?php
$result = array_reduce([1, 2, 3], function ($acc, $val) {
debug_zval_dump($acc);
$acc[] = $val;
return $acc;
}, []);
debug_zval_dump($result);
?>
--EXPECT--
array(0) interned {
}
array(1) packed refcount(2){
[0]=>
int(1)
}
array(2) packed refcount(2){
[0]=>
int(1)
[1]=>
int(2)
}
array(3) packed refcount(2){
[0]=>
int(1)
[1]=>
int(2)
[2]=>
int(3)
}There was a problem hiding this comment.
Great point, I added the test to the second commit, that adds consumed_args usage for array_reduce().
ba43bb5 to
23853ee
Compare
There was a problem hiding this comment.
I see this one does fail. [1, 2, 3] is immutable in shared memory with opcache. I'd probably just drop this test, as there's ext/standard/tests/array/array_reduce_accumulator_refcount.phpt now.
There was a problem hiding this comment.
Yes, I noticed that, and I changed it already to range(1, 3).
23853ee to
b54002d
Compare
Zend/zend_execute_API.c
Outdated
|
|
||
| if (EXPECTED(!must_wrap)) { | ||
| ZVAL_COPY(param, arg); | ||
| if (UNEXPECTED(consumed_args != 0) |
There was a problem hiding this comment.
AFAIK UNEXPECTED() is heavily penalized, and it seems more common for consumed_args == 0. I'd remove the CPU hint.
There was a problem hiding this comment.
I don't mind removing the hint, though:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect
For the purposes of branch prediction optimizations, the probability that a __builtin_expect expression is true is controlled by GCC’s builtin-expect-probability parameter, which defaults to 90%.
The inverse being 10%. That seems appropriate. Though it's actually probably too high for most other of our use-cases for UNEXPECTED.
Frequently code is only reordered and not evicted to cold-sections until the code grows beyond a certain size. I haven't checked what happens in this case.
There was a problem hiding this comment.
@Girgias, can you explain why UNEXPECTED() would be "heavily penalized"?
I changed it to if (EXPECTED(consumed_args == 0), as also @TimWolla mentioned it(#21340 (comment)), but my understanding is that it ends up with the same behavior. Maybe it's easier to read now?
Can you check if it looks good now?
Zend/zend_API.h
Outdated
| #define ZEND_FCI_CONSUMED_ARG(arg_index) \ | ||
| ((arg_index) < 32 ? (1u << (arg_index)) : 0u) | ||
| #define ZEND_FCI_IS_CONSUMED_ARG(consumed_args, arg_index) \ | ||
| ((arg_index) < 32 && ((consumed_args) & (1u << (arg_index)))) |
There was a problem hiding this comment.
Can we avoid introducing new macros and just make these static inline functions? Particularly since this is a macro that uses parameters more than once which causes issues if the passed value is not a literal.
There was a problem hiding this comment.
Yes, that makes sense to me; I'm not a big fan of that many macros. I changed it now.
But since moving exactly this one to macro was suggested by @arnaud-lb (#21340 (comment)), gently pinging him in case he has any comments.
b54002d to
7eedaf8
Compare
7eedaf8 to
24ca52b
Compare
Related to #8283
For some callback operations, it is safe to consume/reuse a value passed as a parameter instead of copying it first.
This PR adds a generic internal mechanism for that (through
zend_fcall_info.consumed_args) and applies it toarray_reduce()carry argument, and a few other callback-using functions.This reduces copy-on-write churn, having the parameter value the only reference, and significantly improves array_reduce() performance for mutable accumulator patterns, while keeping userland semantics unchanged.
The same mechanism can be used in other callback-using functions, and it is also applied here to
preg_replace_callback()/preg_replace_callback_array()andob_start().