[proposal] Expose conditional branch upon NZC flags, with *unitary* ptest_ family functions, especially a svcount_t -> bool version #405

@Jerry1144

It is encouraged to write loops with svptest_first(svptrue(), pred = svwhilelt()) as the termination condition [1], rather than manually comparing i < count, because the predicate-generating svwhilelt already sets the condition flags that the branch needs.

Unfortunately, this pattern relies on two assumptions, both of which are partly broken:

  1. A PTEST exists for every predicate type. There is none for svcount_t, the predicate-as-counter type.
    Thus there is currently no way to pass the result of svwhilelt_c32() to svptest_first() and extend this pattern to the streaming SVE/SME side. Forcing it through svreinterpret_b does not work either [4].

  2. The compiler knows to fold ptest(ptrue, flaggen()) into simply reusing the flags that flaggen() already set. gcc-15 does not do this [2], and neither does clang when the element type of ptrue does not match that of whilelt [3].
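
To make the second assumption concrete, here is a minimal scalar model in portable C of how WHILELO and PTEST derive N, Z and C from a predicate. It is only a sketch: it assumes a 128-bit vector (16 predicate bits), follows my reading of the architectural PredTest behaviour (N = first active element true, Z = no active element true, C = last active element not true), and every name in it (pred_t, flags_t, model_*) is made up for illustration and is not part of ACLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PL = 16 };  /* assumed: 128-bit vector, one predicate bit per byte */

typedef struct { bool bit[PL]; } pred_t;   /* stands in for svbool_t */
typedef struct { bool n, z, c; } flags_t;  /* N, Z, C; V is always cleared */

/* Model of PredTest: elements are esize bytes wide, so only every
   esize-th predicate bit is examined. */
static flags_t model_pred_test(pred_t mask, pred_t res, int esize) {
    assert(esize == 1 || esize == 2 || esize == 4 || esize == 8);
    bool seen = false, first = false, last = false, any = false;
    for (int i = 0; i < PL; i += esize) {
        if (!mask.bit[i]) continue;          /* inactive element */
        if (!seen) { first = res.bit[i]; seen = true; }
        last = res.bit[i];                   /* ends up as the last active one */
        any = any || res.bit[i];
    }
    return (flags_t){ .n = first, .z = !any, .c = !last };
}

/* Model of PTRUE Pd.<T>: one set bit per element of esize bytes. */
static pred_t model_ptrue(int esize) {
    pred_t p = {{false}};
    for (int i = 0; i < PL; i += esize) p.bit[i] = true;
    return p;
}

/* Model of WHILELO Pd.<T>, i, count: returns the predicate, writes the flags. */
static pred_t model_whilelo(uint64_t i, uint64_t count, int esize, flags_t *flags) {
    pred_t p = {{false}};
    for (int e = 0; e * esize < PL; e++)
        p.bit[e * esize] = (i + (uint64_t)e < count);
    *flags = model_pred_test(model_ptrue(esize), p, esize);
    return p;
}

/* Model of PTEST Pg, Pn.B: always byte granular. */
static flags_t model_ptest(pred_t pg, pred_t pn) {
    return model_pred_test(pg, pn, 1);
}

static bool flags_equal(flags_t a, flags_t b) {
    return a.n == b.n && a.z == b.z && a.c == b.c;
}
```

In this model, model_ptest(model_ptrue(4), p) always reproduces the flags that model_whilelo(i, count, 4, &f) wrote, which is why the explicit PTEST is redundant when the element sizes match. With model_ptrue(1) as the governing predicate, N and Z still agree but C does not, since the last active byte lane is never set by a .s WHILELO. That may be why the mismatched case is not folded, even though svptest_first only needs N.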

Thus I propose a dedicated set of unary (single-argument) predicate -> bool intrinsic functions, svptest_first, svptest_last and svptest_any (or perhaps svbif_first, etc.), for use here.

This avoids introducing a hypothetical family of bool svptest_first(svcount_t, svcount_t) overloads, and makes the loop-termination line more readable and less error-prone.
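
As a rough illustration of the loop shape this would allow, here is a scalar stand-in in portable C. It is a sketch under loud assumptions: four 32-bit lanes, a plain bitmask in place of svbool_t/svcount_t, and hypothetical helper names (model_whilelt, model_first, model_any, model_last, model_sum) that exist nowhere in ACLE. The point is only that the termination test takes a single argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 4 };           /* assumed: four 32-bit lanes (128-bit vector) */
typedef uint8_t lane_mask_t;  /* bit k set = lane k active (hypothetical type) */

/* Stand-in for svwhilelt: lane k is active while i + k < count. */
static lane_mask_t model_whilelt(size_t i, size_t count) {
    lane_mask_t m = 0;
    for (size_t k = 0; k < LANES; k++)
        if (i + k < count) m |= (lane_mask_t)(1u << k);
    return m;
}

/* Hypothetical unary tests in the spirit of the proposal. */
static bool model_first(lane_mask_t m) { return (m & 1u) != 0; }
static bool model_any(lane_mask_t m)   { return m != 0; }
static bool model_last(lane_mask_t m)  { return ((m >> (LANES - 1)) & 1u) != 0; }

/* Sums input[0..count) using the single-argument termination test. */
static uint64_t model_sum(const uint32_t *input, size_t count) {
    uint64_t total = 0;
    lane_mask_t active;
    for (size_t i = 0;
         model_first(active = model_whilelt(i, count));  /* predicate and exit test in one go */
         i += LANES) {
        for (size_t k = 0; k < LANES; k++)
            if (active & (1u << k)) total += input[i + k];
    }
    return total;
}
```

Calling model_sum on a six-element array runs two iterations, the second with only lanes 0 and 1 active, and a count of zero exits before the first iteration. No governing predicate appears at the call site, so there is no element type to mismatch.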


[1] for-loop

#include <arm_sve.h>
#include <stddef.h>
#include <stdint.h>

void run(uint32_t *input, size_t count) {
    svbool_t active;

    // What one would naturally write:
    for (size_t i = 0; i < count; i += svcntw()) {
        // Generate predicate
        active = svwhilelt_b32_u64(i, count);
        // Load
        svuint32_t vec = svld1(active, &input[i]);
        // Do with the vector
    }

    // Write this instead:
    for (size_t i = 0;
         // Generate active elements and decide whether to move on in one go
         svptest_first(svptrue_b32(), active = svwhilelt_b32_u64(i, count));
         i += svcntw()) {
        // Load
        svuint32_t vec = svld1(active, &input[i]);
        // Do with the vector
    }
}

which (should) nicely assemble to:

	whilelo	p0.s, xzr, x1
	b.pl	RET                             ; count == 0, we are already done
	mov	x8, #0                          ; i=0
LOOPSTART: 
	ld1w	{ z1.s }, p0/z, [x0, x8, lsl #2]
; // do with the vector
	incw	x8                              ; i += vl
	whilelo	p0.s, x8, x1                    ; compare i < count, sets predicate, sets NZCV flags...
	b.mi	LOOPSTART                       ; for use with this branch
RET:
	ret

[2] gcc

	whilelo	p14.s, xzr, x1
	ptrue	p15.s, all               ; literally svptrue_b32()
	mov	p7.b, p14.b              ; p7 is svbool_t active
	ptest	p15, p14.b               ; literally svptest( , )
	b.nfrst	RET              ; count == 0, early exit

[3] clang, but type mismatch between ptrue and whilelo.

svptest_first(svptrue_b8(),active = svwhilelt_b32_u64(i, count));
                      ^^                      ^^^
	ptrue	p0.b                     ; literally svptrue_b8()
	whilelo	p1.s, xzr, x1
	ptest	p0, p1.b                 ; then literally svptest( , )
	b.pl	RET              ; count == 0, early exit

[4] Forcing svcount_t into svbool_t. This crashes clang; gcc compiles it fine, but the semantics are wrong anyway.

svcount_t active;
for (uint64_t i = 0;
     svptest_first(svreinterpret_b(svptrue_c32()), svreinterpret_b(active = svwhilelt_c32_u64(i, count, 4)));
     i += svcntsw()) {
     //   ^^^^^^^  What's the point of having a dedicated svcntsw() when svcntw() returns streaming VL in streaming mode, anyway?
  // ...
}
