Description
The encouraged way to write SVE loops is to use `svptest_first(svptrue(), pred = svwhilelt())` [1] as the termination condition instead of manually comparing `i < count`, because the predicate-generating `svwhilelt` already sets the condition flags (NZCV) that the loop branch needs.
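This section shows why the first-lane test can stand in for `i < count`, using a portable scalar model rather than SVE code. The model covers the predicate that `svwhilelt_b32_u64(i, count)` produces for a hypothetical vector length of `vl` 32-bit lanes, and it covers `svptest_first` under an all-true governing predicate. The names `model_whilelt`, `model_ptest_first`, `model_loop_condition` and `MAX_LANES` are illustrative assumptions and are not part of any API. Lane `k` is active exactly when `i + k < count`, so lane 0 is active exactly when `i < count`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LANES 64 /* hypothetical bound: a 2048-bit vector holds 64 x 32-bit lanes */

/* Scalar model of svwhilelt_b32_u64(i, count): lane k is active iff i + k < count.
   Written as k < count - i so that i + k cannot overflow. */
static void model_whilelt(uint64_t i, uint64_t count, size_t vl, bool pred[]) {
    assert(vl > 0 && vl <= MAX_LANES);
    for (size_t k = 0; k < vl; ++k)
        pred[k] = i < count && (uint64_t)k < count - i;
}

/* Scalar model of svptest_first(pg, op): is op true in the first lane active in pg? */
static bool model_ptest_first(const bool pg[], const bool op[], size_t vl) {
    for (size_t k = 0; k < vl; ++k)
        if (pg[k])
            return op[k];
    return false; /* pg has no active lane */
}

/* The loop condition svptest_first(ptrue, whilelt(i, count)) in this model. */
static bool model_loop_condition(uint64_t i, uint64_t count, size_t vl) {
    bool all[MAX_LANES], pred[MAX_LANES];
    for (size_t k = 0; k < vl; ++k)
        all[k] = true; /* all-true governing predicate, like svptrue_b32() */
    model_whilelt(i, count, vl, pred);
    return model_ptest_first(all, pred, vl);
}
```

For every `i` and `count`, `model_loop_condition(i, count, vl)` agrees with `i < count`. This equivalence is what lets the flags set by `whilelo` replace a separate compare.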
Unfortunately, this pattern relies on two assumptions, both of which are partly broken:
- A `PTEST` exists for every predicate type. There is none for `svcount_t`, the predicate-as-counter type, so there is currently no way to feed the result of `svwhilelt_c32()` into `svptest_first()` and extend this pattern to streaming SVE/SME code.
- The compiler knows to fold `ptest(ptrue, flaggen)` into "take the flags from `flaggen()`". This does not hold for gcc-15 [2], nor for clang when the element type of the `ptrue` does not match that of the `whilelt` [3].
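A predicate-as-counter value does not hold one bit per lane. The sketch below is a deliberately simplified scalar model that assumes a `whilelt`-generated `svcount_t` can be described by the number of leading active elements. It ignores how the hardware actually encodes the value. Under that assumption, the first-element test that `svptest_first` cannot currently express reduces to "active count > 0". The type `model_count_t` and both functions are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of svcount_t: `active` leading elements are
   active out of `total` elements (total = vector multiplier * lanes per vector). */
typedef struct {
    uint64_t active;
    uint64_t total;
} model_count_t;

/* Scalar model of svwhilelt_c32_u64(i, count, vl) with `lanes` 32-bit lanes per
   vector: the first min(count - i, vl * lanes) elements are active. */
static model_count_t model_whilelt_c32(uint64_t i, uint64_t count,
                                       uint64_t vl, uint64_t lanes) {
    assert(vl == 2 || vl == 4); /* the intrinsic only accepts 2 or 4 */
    model_count_t c = { 0, vl * lanes };
    if (i < count) {
        uint64_t remaining = count - i;
        c.active = remaining < c.total ? remaining : c.total;
    }
    return c;
}

/* What a unary "is the first element active?" test would compute in this model. */
static bool model_count_first(model_count_t c) {
    return c.active > 0;
}
```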
I therefore propose a dedicated set of unary predicate -> `bool` intrinsics for this use: `svptest_first`, `svptest_last` and `svptest_any` taking a single predicate (or perhaps named `svbif_first` etc.).
This avoids introducing a family of hypothetical `bool svptest_first(svcount_t, svcount_t)` overloads, and makes the loop-exit condition more readable and less error-prone.
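The intended semantics of the proposed unary forms can be sketched on the one-bool-per-lane model. `model_ptest_first1`, `model_ptest_last1` and `model_ptest_any1` are hypothetical stand-ins for the proposed intrinsics and are not an existing API. Each one should equal the existing binary `svptest_*` call with an all-true governing predicate, and the test checks this for the "last" case against a model of the binary form:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical unary tests on a predicate modelled as one bool per lane. */
static bool model_ptest_first1(const bool op[], size_t vl) {
    assert(vl > 0);
    return op[0];      /* first lane */
}

static bool model_ptest_last1(const bool op[], size_t vl) {
    assert(vl > 0);
    return op[vl - 1]; /* last lane */
}

static bool model_ptest_any1(const bool op[], size_t vl) {
    for (size_t k = 0; k < vl; ++k)
        if (op[k])
            return true;
    return false;
}

/* Model of the existing binary svptest_last(pg, op): op at the last lane active in pg. */
static bool model_ptest_last2(const bool pg[], const bool op[], size_t vl) {
    for (size_t k = vl; k-- > 0;)
        if (pg[k])
            return op[k];
    return false; /* pg has no active lane */
}
```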
[1] for-loop
```c
#include <arm_sve.h>
#include <stddef.h>
#include <stdint.h>

void run(uint32_t *input, size_t count) {
    svbool_t active;
    // Want to write:
    for (size_t i = 0; i < count; i += svcntw()) {
        // Generate predicate
        active = svwhilelt_b32_u64(i, count);
        // Load
        svuint32_t vec = svld1(active, &input[i]);
        // Do with the vector
    }
    // Write this instead:
    for (size_t i = 0;
         // Generate active elements and decide whether to move on in one go
         svptest_first(svptrue_b32(), active = svwhilelt_b32_u64(i, count));
         i += svcntw()) {
        // Load
        svuint32_t vec = svld1(active, &input[i]);
        // Do with the vector
    }
}
```
The second loop should ideally assemble to:
```asm
    whilelo p0.s, xzr, x1
    b.pl    RET                      ; count == 0, we are already done
    mov     x8, #0                   ; i = 0
LOOPSTART:
    ld1w    { z1.s }, p0/z, [x0, x8, lsl #2]
    ; ... do with the vector
    incw    x8                       ; i += VL
    whilelo p0.s, x8, x1             ; compare i < count, set the predicate and the NZCV flags...
    b.mi    LOOPSTART                ; ...for use by this branch
RET:
    ret
```
[2] gcc:
```asm
    whilelo p14.s, xzr, x1
    ptrue   p15.s, all               ; literally svptrue_b32()
    mov     p7.b, p14.b              ; p7 is svbool_t active
    ptest   p15, p14.b               ; literally svptest_first( , )
    b.nfrst RET                      ; count == 0, early exit
```
[3] clang, but with an element-type mismatch between the `ptrue` and the `whilelo`:
```c
svptest_first(svptrue_b8(), active = svwhilelt_b32_u64(i, count));
//                    ^^                       ^^^ element sizes differ
```
```asm
    ptrue   p0.b                     ; literally svptrue_b8()
    whilelo p1.s, xzr, x1
    ptest   p0, p1.b                 ; then literally svptest_first( , )
    b.pl    RET                      ; count == 0, early exit
```
[4] Force-stuffing `svcount_t` into `svbool_t`. This crashes clang. gcc compiles it, but the semantics are wrong anyway.
```c
svcount_t active;
for (size_t i = 0;
     svptest_first(svreinterpret_b(svptrue_c32()),
                   svreinterpret_b(active = svwhilelt_c32_u64(i, count, 4)));
     i += 4 * svcntsw()) { // a vlx4 counter covers four vectors per iteration
    // What's the point of having a dedicated svcntsw() when svcntw() already
    // returns the streaming VL in streaming mode?
    // ...
}
```
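As a sanity check on the stride of such a counter loop, this scalar sketch assumes that each `svwhilelt_c32_u64(i, count, 4)` covers four vectors of `lanes` 32-bit elements. Under that assumption, stepping `i` by `4 * lanes` visits every element exactly once, in ceil(count / (4 * lanes)) iterations. `lanes` stands in for `svcntsw()`, and `model_vlx4_loop` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the vlx4 counter loop: each iteration handles
   min(count - i, 4 * lanes) elements, then advances i by 4 * lanes.
   Returns the iteration count; *processed receives the number of elements handled. */
static uint64_t model_vlx4_loop(uint64_t count, uint64_t lanes, uint64_t *processed) {
    assert(lanes > 0);
    const uint64_t step = 4 * lanes; /* elements covered by one vlx4 counter */
    uint64_t iterations = 0, done = 0;
    for (uint64_t i = 0; i < count; i += step) { /* "first element active" */
        uint64_t remaining = count - i;
        done += remaining < step ? remaining : step;
        ++iterations;
    }
    *processed = done;
    return iterations;
}
```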