Conversation
|
|
||
| # RUN: split-file %s %t | ||
| # RUN: %dxc_target -T cs_6_6 -Fo %t.o %t/source.hlsl | ||
| # RUN: %offloader %t/pipeline.yaml %t.o No newline at end of file |
There was a problem hiding this comment.
nit: the file endings look weird
There was a problem hiding this comment.
Should we also add tests for non-int types? Is an aggregate type interesting here?
There was a problem hiding this comment.
Thought the wave tests would be sufficient to test things work appropriately, but yes, the issue does say we should test all overload types as well.
tex3d
left a comment
There was a problem hiding this comment.
The test design seems odd, and verbose for the scenarios it tests. It seems like you could test more interesting scenarios even outside the explicit WaveSize based tests.
| { | ||
| unsigned int index = 0; | ||
| bool Result = WaveActiveAllEqual(In[TID.x]); | ||
| InterlockedAdd(Out[index++], (unsigned int)Result); |
There was a problem hiding this comment.
I feel like we shouldn't depend on InterlockedAdd in a unit test for WaveActiveAllEqual. And we do need to test that that the false case runs correctly.
Couldn't we do something like:
bool Result0 = WaveActiveAllEqual(In0[TID.x]);
if (TID.x == 0) {
Out[0] = Result ? 10 : 5;
}
bool Result1 = WaveActiveAllEqual(In1[TID.x]);
if (TID.x == 1) {
Out[1] = Result ? 10 : 5;
}
So then we should expect 10 if true and 5 if false. Then fill the output buffer with 0 at the start to ensure that the values are being overriden as expected.
There was a problem hiding this comment.
I hear you. My reasoning is:
- The .Wave tests test the builtin when it returns false. Its purpose is to, rather than test a variety of data types, test different return values and more of a variety of flow control conditions.
- Your alternative looks like it's only testing one lane per input resource. I would like to verify that all lanes return true in these tests. And with this structure, if I were to test all the lanes, I would need to repeat that structure 42 times, and it would be a long shader and maybe kind of ugly.
- Though depending on InterlockedAdd is not ideal, I don't see a way to avoid a data race in checking the output of all 4 threads, without blowing up the outputs to check by a factor of 4. My hunch is that a driver implementation of InterlockedAdd is not that complicated, and we should be able to have separate tests for it that verify it works for all targets. Assuming our InterlockedAdd tests will be complete in coverage, and considering the simple use case, I don't think it will get in the way of testing the builtin.
There was a problem hiding this comment.
- Ah sorry, missed the .wave when reviewing on just the most recent commits. Makes sense.
2 & 3. I guess I would be of the opinion that checking an individual lane in each invocation would be sufficient to avoid them, but I won't block on this.
| { | ||
| unsigned int index = 0; | ||
| bool Result = WaveActiveAllEqual(In[TID.x]); | ||
| InterlockedAdd(Out[index++], (unsigned int)Result); |
There was a problem hiding this comment.
- Ah sorry, missed the .wave when reviewing on just the most recent commits. Makes sense.
2 & 3. I guess I would be of the opinion that checking an individual lane in each invocation would be sufficient to avoid them, but I won't block on this.
tex3d
left a comment
There was a problem hiding this comment.
I'm not a fan on the use of InterlockedAdd. I'm even less of a fan considering: it's not implemented in Clang yet, you're blocked on Intel and Vulkan due to some bug there.
Alternatives include:
- Using another Wave intrinsic, like WaveActiveCountBits, to gather values across the lanes, then do a normal write to the buffer on the first lane.
- Writing out from each lane.
You could also reduce the output by summing the components of vector results. This could keep output for the write-each-lane option small. It also makes it easier to see the separate vector sizes in the output, since the expected value will be scaled by vector size.
| // Expect all trues, all elements will be the same. | ||
| // Output buffers start off as true, and each | ||
| // lane writes its result value anded with the existing | ||
| // result in the output buffer. | ||
| // Since we expect all results to be true, output buffers | ||
| // should remain all true at the end. |
There was a problem hiding this comment.
This comment is no longer correct.
This PR adds tests for WaveActiveAllEqual.
It does not test matrices.
It tests ints and int2s for wavesizes 32, 64, and 128.
Fixes #915