[LSR] Reverse order in NarrowSearchSpaceByCollapsingUnrolledCode #172314
The order in which NarrowSearchSpaceByCollapsingUnrolledCode iterates through the Uses array determines which LSRUses get deleted: uses visited earlier are deleted and collapsed into later ones. The Uses array is generated from IVUsers, which places later uses earlier in the array. Currently we iterate forward through the array, so the later uses are deleted and we end up keeping earlier ones. However, DeleteUse removes an element by swapping it with the last element, which changes the order, so the use that survives can be one from the middle of the loop. This is bad if we end up with a post-increment solution, because the value before the post-increment is still used later in the loop and so needs to be kept live in a register.

Fix this by iterating backwards through the Uses array. The last use is then the one that is kept, and the order no longer changes as uses get deleted.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-arm

Author: John Brawn (john-brawn-arm)

Patch is 60.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/172314.diff

14 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
index e12caa2136962..2b0e98c2fcfd8 100644
--- a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
@@ -4957,7 +4957,7 @@ void LSRInstance::NarrowSearchSpaceByCollapsingUnrolledCode() {
// This is especially useful for unrolled loops.
- for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {
+ for (ssize_t LUIdx = Uses.size()-1; LUIdx >= 0; --LUIdx) {
LSRUse &LU = Uses[LUIdx];
for (const Formula &F : LU.Formulae) {
if (F.BaseOffset.isZero() || (F.Scale != 0 && F.Scale != 1))
@@ -5002,8 +5002,6 @@ void LSRInstance::NarrowSearchSpaceByCollapsingUnrolledCode() {
// Delete the old use.
DeleteUse(LU, LUIdx);
- --LUIdx;
- --NumUses;
break;
}
}
diff --git a/llvm/test/CodeGen/ARM/loop-indexing.ll b/llvm/test/CodeGen/ARM/loop-indexing.ll
index bb859b202bbc0..62fafc53e5e86 100644
--- a/llvm/test/CodeGen/ARM/loop-indexing.ll
+++ b/llvm/test/CodeGen/ARM/loop-indexing.ll
@@ -68,12 +68,11 @@ exit:
}
; CHECK-LABEL: convolve_16bit
-; TODO: Both arrays should use indexing
; CHECK-DEFAULT: ldr{{.*}}, #8]!
-; CHECK-DEFAULT-NOT: ldr{{.*}}]!
+; CHECK-DEFAULT: ldr{{.*}}, #8]!
; CHECK-COMPLEX: ldr{{.*}}, #8]!
-; CHECK-COMPLEX-NOT: ldr{{.*}}]!
+; CHECK-COMPLEX: ldr{{.*}}, #8]!
; DISABLED-NOT: ldr{{.*}}]!
; DISABLED-NOT: str{{.*}}]!
diff --git a/llvm/test/CodeGen/PowerPC/dform-pair-load-store.ll b/llvm/test/CodeGen/PowerPC/dform-pair-load-store.ll
index f5ae9a20a4ee0..030acb382bb5a 100644
--- a/llvm/test/CodeGen/PowerPC/dform-pair-load-store.ll
+++ b/llvm/test/CodeGen/PowerPC/dform-pair-load-store.ll
@@ -16,8 +16,8 @@ define void @foo(i32 zeroext %n, ptr %ptr, ptr %ptr2) {
; CHECK-NEXT: cmplwi r3, 0
; CHECK-NEXT: beqlr cr0
; CHECK-NEXT: # %bb.1: # %for.body.lr.ph
-; CHECK-NEXT: addi r4, r4, 64
; CHECK-NEXT: addi r5, r5, 64
+; CHECK-NEXT: addi r4, r4, 64
; CHECK-NEXT: mtctr r3
; CHECK-NEXT: .p2align 4
; CHECK-NEXT: .LBB0_2: # %for.body
@@ -41,8 +41,8 @@ define void @foo(i32 zeroext %n, ptr %ptr, ptr %ptr2) {
; CHECK-BE-NEXT: cmplwi r3, 0
; CHECK-BE-NEXT: beqlr cr0
; CHECK-BE-NEXT: # %bb.1: # %for.body.lr.ph
-; CHECK-BE-NEXT: addi r4, r4, 64
; CHECK-BE-NEXT: addi r5, r5, 64
+; CHECK-BE-NEXT: addi r4, r4, 64
; CHECK-BE-NEXT: mtctr r3
; CHECK-BE-NEXT: .p2align 4
; CHECK-BE-NEXT: .LBB0_2: # %for.body
diff --git a/llvm/test/CodeGen/PowerPC/lsr-profitable-chain.ll b/llvm/test/CodeGen/PowerPC/lsr-profitable-chain.ll
index 79f2ef3e3746a..7508ac12e9b46 100644
--- a/llvm/test/CodeGen/PowerPC/lsr-profitable-chain.ll
+++ b/llvm/test/CodeGen/PowerPC/lsr-profitable-chain.ll
@@ -8,31 +8,31 @@ define void @foo(ptr readonly %0, ptr %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6
; CHECK-NEXT: cmpd 5, 7
; CHECK-NEXT: bgelr 0
; CHECK-NEXT: # %bb.1: # %.preheader
+; CHECK-NEXT: addi 12, 5, 3
; CHECK-NEXT: std 27, -40(1) # 8-byte Folded Spill
; CHECK-NEXT: addi 27, 5, 2
+; CHECK-NEXT: std 29, -24(1) # 8-byte Folded Spill
+; CHECK-NEXT: addi 29, 5, 1
+; CHECK-NEXT: addi 11, 3, 16
; CHECK-NEXT: std 28, -32(1) # 8-byte Folded Spill
-; CHECK-NEXT: addi 28, 5, 3
-; CHECK-NEXT: std 30, -16(1) # 8-byte Folded Spill
-; CHECK-NEXT: addi 30, 5, 1
-; CHECK-NEXT: mulld 12, 8, 5
; CHECK-NEXT: mulld 0, 9, 8
-; CHECK-NEXT: std 29, -24(1) # 8-byte Folded Spill
-; CHECK-NEXT: addi 29, 3, 16
-; CHECK-NEXT: sldi 11, 10, 3
+; CHECK-NEXT: mulld 28, 8, 5
+; CHECK-NEXT: std 30, -16(1) # 8-byte Folded Spill
+; CHECK-NEXT: sldi 30, 10, 3
; CHECK-NEXT: std 22, -80(1) # 8-byte Folded Spill
; CHECK-NEXT: std 23, -72(1) # 8-byte Folded Spill
; CHECK-NEXT: std 24, -64(1) # 8-byte Folded Spill
; CHECK-NEXT: std 25, -56(1) # 8-byte Folded Spill
; CHECK-NEXT: std 26, -48(1) # 8-byte Folded Spill
-; CHECK-NEXT: mulld 30, 8, 30
-; CHECK-NEXT: mulld 28, 8, 28
+; CHECK-NEXT: mulld 12, 8, 12
+; CHECK-NEXT: mulld 29, 8, 29
; CHECK-NEXT: mulld 8, 8, 27
; CHECK-NEXT: b .LBB0_3
; CHECK-NEXT: .p2align 4
; CHECK-NEXT: .LBB0_2:
; CHECK-NEXT: add 5, 5, 9
; CHECK-NEXT: add 12, 12, 0
-; CHECK-NEXT: add 30, 30, 0
+; CHECK-NEXT: add 29, 29, 0
; CHECK-NEXT: add 28, 28, 0
; CHECK-NEXT: add 8, 8, 0
; CHECK-NEXT: cmpd 5, 7
@@ -43,24 +43,24 @@ define void @foo(ptr readonly %0, ptr %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6
; CHECK-NEXT: cmpd 6, 27
; CHECK-NEXT: bge 0, .LBB0_2
; CHECK-NEXT: # %bb.4:
-; CHECK-NEXT: add 25, 6, 12
+; CHECK-NEXT: add 24, 6, 28
+; CHECK-NEXT: add 26, 6, 12
+; CHECK-NEXT: add 25, 6, 29
+; CHECK-NEXT: sldi 23, 24, 3
; CHECK-NEXT: add 24, 6, 8
-; CHECK-NEXT: sldi 26, 6, 3
-; CHECK-NEXT: sldi 23, 25, 3
-; CHECK-NEXT: add 25, 6, 30
-; CHECK-NEXT: sldi 24, 24, 3
-; CHECK-NEXT: add 26, 4, 26
+; CHECK-NEXT: sldi 26, 26, 3
; CHECK-NEXT: sldi 22, 25, 3
-; CHECK-NEXT: add 25, 6, 28
-; CHECK-NEXT: add 24, 29, 24
+; CHECK-NEXT: sldi 25, 6, 3
+; CHECK-NEXT: sldi 24, 24, 3
+; CHECK-NEXT: add 26, 11, 26
+; CHECK-NEXT: add 25, 4, 25
; CHECK-NEXT: add 23, 3, 23
-; CHECK-NEXT: sldi 25, 25, 3
; CHECK-NEXT: add 22, 3, 22
-; CHECK-NEXT: add 25, 29, 25
+; CHECK-NEXT: add 24, 11, 24
; CHECK-NEXT: .p2align 5
; CHECK-NEXT: .LBB0_5: # Parent Loop BB0_3 Depth=1
; CHECK-NEXT: # => This Inner Loop Header: Depth=2
-; CHECK-NEXT: lfd 0, 0(26)
+; CHECK-NEXT: lfd 0, 0(25)
; CHECK-NEXT: lfd 1, 0(23)
; CHECK-NEXT: add 6, 6, 10
; CHECK-NEXT: cmpd 6, 27
@@ -70,7 +70,7 @@ define void @foo(ptr readonly %0, ptr %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6
; CHECK-NEXT: lfd 1, 16(23)
; CHECK-NEXT: xsadddp 0, 0, 1
; CHECK-NEXT: lfd 1, 24(23)
-; CHECK-NEXT: add 23, 23, 11
+; CHECK-NEXT: add 23, 23, 30
; CHECK-NEXT: xsadddp 0, 0, 1
; CHECK-NEXT: lfd 1, 0(22)
; CHECK-NEXT: xsadddp 0, 0, 1
@@ -79,7 +79,7 @@ define void @foo(ptr readonly %0, ptr %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6
; CHECK-NEXT: lfd 1, 16(22)
; CHECK-NEXT: xsadddp 0, 0, 1
; CHECK-NEXT: lfd 1, 24(22)
-; CHECK-NEXT: add 22, 22, 11
+; CHECK-NEXT: add 22, 22, 30
; CHECK-NEXT: xsadddp 0, 0, 1
; CHECK-NEXT: lfd 1, -16(24)
; CHECK-NEXT: xsadddp 0, 0, 1
@@ -88,19 +88,19 @@ define void @foo(ptr readonly %0, ptr %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6
; CHECK-NEXT: lfd 1, 0(24)
; CHECK-NEXT: xsadddp 0, 0, 1
; CHECK-NEXT: lfd 1, 8(24)
-; CHECK-NEXT: add 24, 24, 11
+; CHECK-NEXT: add 24, 24, 30
; CHECK-NEXT: xsadddp 0, 0, 1
-; CHECK-NEXT: lfd 1, -16(25)
+; CHECK-NEXT: lfd 1, -16(26)
; CHECK-NEXT: xsadddp 0, 0, 1
-; CHECK-NEXT: lfd 1, -8(25)
+; CHECK-NEXT: lfd 1, -8(26)
; CHECK-NEXT: xsadddp 0, 0, 1
-; CHECK-NEXT: lfd 1, 0(25)
+; CHECK-NEXT: lfd 1, 0(26)
; CHECK-NEXT: xsadddp 0, 0, 1
-; CHECK-NEXT: lfd 1, 8(25)
-; CHECK-NEXT: add 25, 25, 11
+; CHECK-NEXT: lfd 1, 8(26)
+; CHECK-NEXT: add 26, 26, 30
; CHECK-NEXT: xsadddp 0, 0, 1
-; CHECK-NEXT: stfd 0, 0(26)
-; CHECK-NEXT: add 26, 26, 11
+; CHECK-NEXT: stfd 0, 0(25)
+; CHECK-NEXT: add 25, 25, 30
; CHECK-NEXT: blt 0, .LBB0_5
; CHECK-NEXT: b .LBB0_2
; CHECK-NEXT: .LBB0_6:
diff --git a/llvm/test/CodeGen/PowerPC/more-dq-form-prepare.ll b/llvm/test/CodeGen/PowerPC/more-dq-form-prepare.ll
index af0942e99182d..5c7eb283aa6f0 100644
--- a/llvm/test/CodeGen/PowerPC/more-dq-form-prepare.ll
+++ b/llvm/test/CodeGen/PowerPC/more-dq-form-prepare.ll
@@ -18,8 +18,8 @@ define void @foo(ptr %.m, ptr %.n, ptr %.a, ptr %.x, ptr %.l, ptr %.vy01, ptr %.
; CHECK-NEXT: cmpwi 3, 1
; CHECK-NEXT: bltlr 0
; CHECK-NEXT: # %bb.2: # %_loop_1_do_.preheader
-; CHECK-NEXT: stdu 1, -592(1)
-; CHECK-NEXT: .cfi_def_cfa_offset 592
+; CHECK-NEXT: stdu 1, -608(1)
+; CHECK-NEXT: .cfi_def_cfa_offset 608
; CHECK-NEXT: .cfi_offset r14, -192
; CHECK-NEXT: .cfi_offset r15, -184
; CHECK-NEXT: .cfi_offset r16, -176
@@ -56,300 +56,293 @@ define void @foo(ptr %.m, ptr %.n, ptr %.a, ptr %.x, ptr %.l, ptr %.vy01, ptr %.
; CHECK-NEXT: .cfi_offset v29, -240
; CHECK-NEXT: .cfi_offset v30, -224
; CHECK-NEXT: .cfi_offset v31, -208
-; CHECK-NEXT: std 14, 400(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 15, 408(1) # 8-byte Folded Spill
-; CHECK-NEXT: ld 2, 728(1)
-; CHECK-NEXT: ld 14, 688(1)
-; CHECK-NEXT: ld 11, 704(1)
-; CHECK-NEXT: std 20, 448(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 21, 456(1) # 8-byte Folded Spill
-; CHECK-NEXT: mr 21, 5
-; CHECK-NEXT: lwa 5, 0(7)
-; CHECK-NEXT: ld 7, 720(1)
-; CHECK-NEXT: std 22, 464(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 23, 472(1) # 8-byte Folded Spill
-; CHECK-NEXT: mr 22, 6
-; CHECK-NEXT: ld 6, 848(1)
+; CHECK-NEXT: std 28, 528(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 29, 536(1) # 8-byte Folded Spill
+; CHECK-NEXT: mr 28, 5
+; CHECK-NEXT: ld 5, 864(1)
; CHECK-NEXT: addi 3, 3, 1
-; CHECK-NEXT: ld 15, 736(1)
-; CHECK-NEXT: std 18, 432(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 19, 440(1) # 8-byte Folded Spill
-; CHECK-NEXT: ld 19, 768(1)
-; CHECK-NEXT: ld 18, 760(1)
-; CHECK-NEXT: std 30, 528(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 31, 536(1) # 8-byte Folded Spill
-; CHECK-NEXT: ld 12, 696(1)
-; CHECK-NEXT: lxv 0, 0(9)
-; CHECK-NEXT: std 9, 64(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 10, 72(1) # 8-byte Folded Spill
-; CHECK-NEXT: lxv 1, 0(8)
+; CHECK-NEXT: ld 2, 848(1)
+; CHECK-NEXT: ld 12, 784(1)
+; CHECK-NEXT: std 22, 480(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 23, 488(1) # 8-byte Folded Spill
+; CHECK-NEXT: mr 22, 6
+; CHECK-NEXT: li 6, 9
+; CHECK-NEXT: ld 23, 800(1)
+; CHECK-NEXT: ld 29, 712(1)
+; CHECK-NEXT: std 24, 496(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 25, 504(1) # 8-byte Folded Spill
+; CHECK-NEXT: ld 25, 816(1)
; CHECK-NEXT: cmpldi 3, 9
-; CHECK-NEXT: ld 30, 824(1)
-; CHECK-NEXT: std 28, 512(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 29, 520(1) # 8-byte Folded Spill
-; CHECK-NEXT: ld 29, 840(1)
-; CHECK-NEXT: ld 28, 832(1)
-; CHECK-NEXT: std 16, 416(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 17, 424(1) # 8-byte Folded Spill
-; CHECK-NEXT: ld 23, 784(1)
-; CHECK-NEXT: ld 20, 776(1)
-; CHECK-NEXT: std 24, 480(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 25, 488(1) # 8-byte Folded Spill
-; CHECK-NEXT: ld 25, 800(1)
-; CHECK-NEXT: ld 24, 792(1)
-; CHECK-NEXT: std 26, 496(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 27, 504(1) # 8-byte Folded Spill
-; CHECK-NEXT: ld 27, 816(1)
-; CHECK-NEXT: ld 26, 808(1)
-; CHECK-NEXT: stfd 26, 544(1) # 8-byte Folded Spill
-; CHECK-NEXT: stfd 27, 552(1) # 8-byte Folded Spill
-; CHECK-NEXT: ld 17, 752(1)
-; CHECK-NEXT: extswsli 9, 5, 3
-; CHECK-NEXT: lxv 4, 0(14)
-; CHECK-NEXT: std 14, 32(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 12, 40(1) # 8-byte Folded Spill
-; CHECK-NEXT: mulli 0, 5, 40
-; CHECK-NEXT: sldi 14, 5, 5
-; CHECK-NEXT: mulli 31, 5, 24
-; CHECK-NEXT: lxv 38, 0(2)
-; CHECK-NEXT: lxv 2, 0(11)
-; CHECK-NEXT: std 2, 80(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 15, 88(1) # 8-byte Folded Spill
-; CHECK-NEXT: mulli 2, 5, 48
-; CHECK-NEXT: sldi 5, 5, 4
-; CHECK-NEXT: ld 16, 744(1)
-; CHECK-NEXT: lxv 5, 0(10)
-; CHECK-NEXT: std 6, 200(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 29, 192(1) # 8-byte Folded Spill
-; CHECK-NEXT: ld 6, 712(1)
-; CHECK-NEXT: mr 10, 7
-; CHECK-NEXT: add 7, 14, 21
-; CHECK-NEXT: lxv 13, 0(19)
-; CHECK-NEXT: std 8, 48(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 6, 56(1) # 8-byte Folded Spill
-; CHECK-NEXT: mr 8, 11
-; CHECK-NEXT: li 11, 9
-; CHECK-NEXT: iselgt 3, 3, 11
+; CHECK-NEXT: ld 24, 808(1)
+; CHECK-NEXT: std 26, 512(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 27, 520(1) # 8-byte Folded Spill
+; CHECK-NEXT: ld 26, 824(1)
+; CHECK-NEXT: ld 27, 832(1)
+; CHECK-NEXT: std 14, 416(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 15, 424(1) # 8-byte Folded Spill
+; CHECK-NEXT: ld 15, 728(1)
+; CHECK-NEXT: ld 14, 720(1)
+; CHECK-NEXT: std 16, 432(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 17, 440(1) # 8-byte Folded Spill
+; CHECK-NEXT: ld 17, 744(1)
+; CHECK-NEXT: ld 16, 736(1)
+; CHECK-NEXT: std 18, 448(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 19, 456(1) # 8-byte Folded Spill
+; CHECK-NEXT: iselgt 3, 3, 6
+; CHECK-NEXT: ld 19, 760(1)
+; CHECK-NEXT: ld 18, 752(1)
+; CHECK-NEXT: std 20, 464(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 21, 472(1) # 8-byte Folded Spill
+; CHECK-NEXT: ld 21, 776(1)
+; CHECK-NEXT: ld 20, 768(1)
+; CHECK-NEXT: std 30, 544(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 31, 552(1) # 8-byte Folded Spill
+; CHECK-NEXT: ld 30, 840(1)
+; CHECK-NEXT: ld 31, 792(1)
+; CHECK-NEXT: std 8, 40(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 9, 48(1) # 8-byte Folded Spill
+; CHECK-NEXT: ld 11, 704(1)
+; CHECK-NEXT: lxv 39, 0(8)
+; CHECK-NEXT: stfd 26, 560(1) # 8-byte Folded Spill
+; CHECK-NEXT: stfd 27, 568(1) # 8-byte Folded Spill
; CHECK-NEXT: addi 3, 3, -2
-; CHECK-NEXT: rldicl 11, 3, 61, 3
-; CHECK-NEXT: lxv 3, 0(12)
-; CHECK-NEXT: lxv 40, 0(6)
-; CHECK-NEXT: std 18, 112(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 19, 120(1) # 8-byte Folded Spill
-; CHECK-NEXT: add 19, 21, 5
-; CHECK-NEXT: ld 5, 200(1) # 8-byte Folded Reload
-; CHECK-NEXT: lxv 39, 0(10)
-; CHECK-NEXT: addi 3, 7, 32
-; CHECK-NEXT: add 12, 31, 21
-; CHECK-NEXT: std 20, 128(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 23, 136(1) # 8-byte Folded Spill
-; CHECK-NEXT: lxv 33, 0(15)
-; CHECK-NEXT: lxv 32, 0(16)
-; CHECK-NEXT: std 26, 160(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 27, 168(1) # 8-byte Folded Spill
-; CHECK-NEXT: lxv 37, 0(17)
-; CHECK-NEXT: lxv 36, 0(18)
-; CHECK-NEXT: std 30, 176(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 28, 184(1) # 8-byte Folded Spill
-; CHECK-NEXT: lxv 12, 0(20)
-; CHECK-NEXT: lxv 11, 0(23)
-; CHECK-NEXT: add 20, 21, 9
-; CHECK-NEXT: stfd 28, 560(1) # 8-byte Folded Spill
-; CHECK-NEXT: stfd 29, 568(1) # 8-byte Folded Spill
-; CHECK-NEXT: lxv 10, 0(24)
-; CHECK-NEXT: lxv 9, 0(25)
-; CHECK-NEXT: stfd 30, 576(1) # 8-byte Folded Spill
-; CHECK-NEXT: stfd 31, 584(1) # 8-byte Folded Spill
-; CHECK-NEXT: lxv 8, 0(26)
-; CHECK-NEXT: lxv 7, 0(27)
-; CHECK-NEXT: addi 12, 12, 32
-; CHECK-NEXT: li 27, 0
-; CHECK-NEXT: mr 26, 21
-; CHECK-NEXT: stxv 52, 208(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 53, 224(1) # 16-byte Folded Spill
-; CHECK-NEXT: lxv 6, 0(30)
-; CHECK-NEXT: lxv 41, 0(28)
-; CHECK-NEXT: addi 7, 11, 1
-; CHECK-NEXT: add 11, 0, 21
-; CHECK-NEXT: li 28, 1
-; CHECK-NEXT: stxv 54, 240(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 55, 256(1) # 16-byte Folded Spill
-; CHECK-NEXT: lxv 43, 0(29)
-; CHECK-NEXT: lxv 42, 0(5)
-; CHECK-NEXT: stxv 56, 272(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 57, 288(1) # 16-byte Folded Spill
-; CHECK-NEXT: addi 11, 11, 32
-; CHECK-NEXT: stxv 58, 304(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 59, 320(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 60, 336(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 61, 352(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 62, 368(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 63, 384(1) # 16-byte Folded Spill
-; CHECK-NEXT: std 16, 96(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 17, 104(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 24, 144(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 25, 152(1) # 8-byte Folded Spill
+; CHECK-NEXT: lxv 4, 0(23)
+; CHECK-NEXT: lxv 1, 0(26)
+; CHECK-NEXT: std 5, 216(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 23, 152(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 24, 160(1) # 8-byte Folded Spill
+; CHECK-NEXT: ld 5, 856(1)
+; CHECK-NEXT: lxv 3, 0(24)
+; CHECK-NEXT: lxv 2, 0(25)
+; CHECK-NEXT: std 25, 168(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 26, 176(1) # 8-byte Folded Spill
+; CHECK-NEXT: lxv 38, 0(9)
+; CHECK-NEXT: lxv 33, 0(10)
+; CHECK-NEXT: std 12, 136(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 31, 144(1) # 8-byte Folded Spill
+; CHECK-NEXT: rldicl 3, 3, 61, 3
+; CHECK-NEXT: lxv 32, 0(11)
+; CHECK-NEXT: lxv 37, 0(29)
+; CHECK-NEXT: mr 8, 11
+; CHECK-NEXT: std 27, 184(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 30, 192(1) # 8-byte Folded Spill
+; CHECK-NEXT: lxv 36, 0(14)
+; CHECK-NEXT: lxv 13, 0(15)
+; CHECK-NEXT: stfd 28, 576(1) # 8-byte Folded Spill
+; CHECK-NEXT: stfd 29, 584(1) # 8-byte Folded Spill
+; CHECK-NEXT: lxv 12, 0(16)
+; CHECK-NEXT: lxv 11, 0(17)
+; CHECK-NEXT: stfd 30, 592(1) # 8-byte Folded Spill
+; CHECK-NEXT: stfd 31, 600(1) # 8-byte Folded Spill
+; CHECK-NEXT: lxv 10, 0(18)
+; CHECK-NEXT: lxv 9, 0(19)
+; CHECK-NEXT: stxv 52, 224(1) # 16-byte Folded Spill
+; CHECK-NEXT: stxv 53, 240(1) # 16-byte Folded Spill
+; CHECK-NEXT: lxv 8, 0(20)
+; CHECK-NEXT: lxv 7, 0(21)
+; CHECK-NEXT: stxv 54, 256(1) # 16-byte Folded Spill
+; CHECK-NEXT: stxv 55, 272(1) # 16-byte Folded Spill
+; CHECK-NEXT: lxv 6, 0(12)
+; CHECK-NEXT: lxv 5, 0(31)
+; CHECK-NEXT: stxv 56, 288(1) # 16-byte Folded Spill
+; CHECK-NEXT: stxv 57, 304(1) # 16-byte Folded Spill
+; CHECK-NEXT: lxv 0, 0(27)
+; CHECK-NEXT: lxv 40, 0(30)
+; CHECK-NEXT: li 30, 1
+; CHECK-NEXT: stxv 58, 320(1) # 16-byte Folded Spill
+; CHECK-NEXT: stxv 59, 336(1) # 16-byte Folded Spill
+; CHECK-NEXT: lxv 41, 0(2)
+; CHECK-NEXT: std 5, 208(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 2, 200(1) # 8-byte Folded Spill
+; CHECK-NEXT: lwa 5, 0(7)
+; CHECK-NEXT: addi 7, 3, 1
+; CHECK-NEXT: mulli 3, 5, 40
+; CHECK-NEXT: extswsli 6, 5, 3
+; CHECK-NEXT: mulli 31, 5, 48
+; CHECK-NEXT: add 0, 28, 6
+; CHECK-NEXT: ld 6, 208(1) # 8-byte Folded Reload
+; CHECK-NEXT: add 23, 28, 3
+; CHECK-NEXT: sldi 3, 5, 4
+; CHECK-NEXT: stxv 60, 352(1) # 16-byte Folded Spill
+; CHECK-NEXT: stxv 61, 368(1) # 16-byte Folded Spill
+; CHECK-NEXT: stxv 62, 384(1) # 16-byte Folded Spill
+; CHECK-NEXT: add 26, 28, 3
+; CHECK-NEXT: sldi 3, 5, 5
+; CHECK-NEXT: stxv 63, 400(1) # 16-byte Folded Spill
+; CHECK-NEXT: std 10, 56(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 29, 64(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 14, 72(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 15, 80(1) # 8-byte Folded Spill
+; CHECK-NEXT: lxv 42, 0(6)
+; CHECK-NEXT: std 16, 88(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 17, 96(1) # 8-byte Folded Spill
+; CHECK-NEXT: add 24, 28, 3
+; CHECK-NEXT: mulli 3, 5, 24
+; CHECK-NEXT: std 18, 104(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 19, 112(1) # 8-byte Folded Spill
+; CHECK-NEXT: add 25, 28, 3
+; CHECK-NEXT: ld 3, 216(1) # 8-byte Folded Reload
+; CHECK-NEXT: std 20, 120(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 21, 128(1) # 8-byte Folded Spill
+; CHECK-NEXT: lxv 43, 0(3)
; CHECK-NEXT: .p2align 4
; CHECK-NEXT: .LBB0_3: # %_loop_2_do_.lr.ph
; CHECK-NEXT: # =>This Loop Header: Depth=1
; CHECK-NEXT: # Child Loop BB0_4 Depth 2
-; CHECK-NEXT: maddld 5, 2, 27, 0
; CHECK-NEXT: mr 6, 22
-; CHECK-NEXT: mr 30, 20
-; CHECK-NEXT: mr 29, 19
+; CHECK-NEXT: mr 5, 28
+; CHECK-NEXT: mr 27, 0
+; CHECK-NEXT: mr 11, 26
+; CHECK-NEXT: mr 2, 25
+; CHECK-NEXT: mr 12, 24
+; CHECK-NEXT: mr 3, 23
; CHECK-NEXT: mtctr 7
-; CHECK-NEXT: add 25, 21, 5
-; CHECK-NEXT: maddld 5, 2, 27, 14
-; CHECK-NEXT: add 24, 21, 5
-; CHECK-NEXT: maddld 5, 2, 27, 31
-; CHECK-NEXT: add 23, 21, 5
-; CHECK-NEXT: mr 5, 26
; CHECK-NEXT: .p2align 5
; CHECK-NEXT: .LBB0_4: # %_loop_2_do_
; CHECK-NEXT: # Parent Loop BB0_3 Depth=1
; CHECK-NEXT: # => This Inner Loop Header: Depth=2
; CHECK-NEXT: lxvp 34, 0(6)
; CHECK-NEXT: lxvp 44, 0(5)
-; CHECK-NEXT: xvmaddadp 1, 45, 35
-; CHECK-NEXT: lxvp 46, 0(30)
-; CHECK-NEXT: xvmaddadp 0, 47, 35
-; CHECK-NEXT: lxvp 48, 0(29)
-; CHECK-NEXT: lxvp 50, 0(23)
-; CHECK-NEXT: ...
[truncated]
@llvm/pr-subscribers-backend-risc-v Author: John Brawn (john-brawn-arm)
+; CHECK-NEXT: std 14, 416(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 15, 424(1) # 8-byte Folded Spill
+; CHECK-NEXT: ld 15, 728(1)
+; CHECK-NEXT: ld 14, 720(1)
+; CHECK-NEXT: std 16, 432(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 17, 440(1) # 8-byte Folded Spill
+; CHECK-NEXT: ld 17, 744(1)
+; CHECK-NEXT: ld 16, 736(1)
+; CHECK-NEXT: std 18, 448(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 19, 456(1) # 8-byte Folded Spill
+; CHECK-NEXT: iselgt 3, 3, 6
+; CHECK-NEXT: ld 19, 760(1)
+; CHECK-NEXT: ld 18, 752(1)
+; CHECK-NEXT: std 20, 464(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 21, 472(1) # 8-byte Folded Spill
+; CHECK-NEXT: ld 21, 776(1)
+; CHECK-NEXT: ld 20, 768(1)
+; CHECK-NEXT: std 30, 544(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 31, 552(1) # 8-byte Folded Spill
+; CHECK-NEXT: ld 30, 840(1)
+; CHECK-NEXT: ld 31, 792(1)
+; CHECK-NEXT: std 8, 40(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 9, 48(1) # 8-byte Folded Spill
+; CHECK-NEXT: ld 11, 704(1)
+; CHECK-NEXT: lxv 39, 0(8)
+; CHECK-NEXT: stfd 26, 560(1) # 8-byte Folded Spill
+; CHECK-NEXT: stfd 27, 568(1) # 8-byte Folded Spill
; CHECK-NEXT: addi 3, 3, -2
-; CHECK-NEXT: rldicl 11, 3, 61, 3
-; CHECK-NEXT: lxv 3, 0(12)
-; CHECK-NEXT: lxv 40, 0(6)
-; CHECK-NEXT: std 18, 112(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 19, 120(1) # 8-byte Folded Spill
-; CHECK-NEXT: add 19, 21, 5
-; CHECK-NEXT: ld 5, 200(1) # 8-byte Folded Reload
-; CHECK-NEXT: lxv 39, 0(10)
-; CHECK-NEXT: addi 3, 7, 32
-; CHECK-NEXT: add 12, 31, 21
-; CHECK-NEXT: std 20, 128(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 23, 136(1) # 8-byte Folded Spill
-; CHECK-NEXT: lxv 33, 0(15)
-; CHECK-NEXT: lxv 32, 0(16)
-; CHECK-NEXT: std 26, 160(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 27, 168(1) # 8-byte Folded Spill
-; CHECK-NEXT: lxv 37, 0(17)
-; CHECK-NEXT: lxv 36, 0(18)
-; CHECK-NEXT: std 30, 176(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 28, 184(1) # 8-byte Folded Spill
-; CHECK-NEXT: lxv 12, 0(20)
-; CHECK-NEXT: lxv 11, 0(23)
-; CHECK-NEXT: add 20, 21, 9
-; CHECK-NEXT: stfd 28, 560(1) # 8-byte Folded Spill
-; CHECK-NEXT: stfd 29, 568(1) # 8-byte Folded Spill
-; CHECK-NEXT: lxv 10, 0(24)
-; CHECK-NEXT: lxv 9, 0(25)
-; CHECK-NEXT: stfd 30, 576(1) # 8-byte Folded Spill
-; CHECK-NEXT: stfd 31, 584(1) # 8-byte Folded Spill
-; CHECK-NEXT: lxv 8, 0(26)
-; CHECK-NEXT: lxv 7, 0(27)
-; CHECK-NEXT: addi 12, 12, 32
-; CHECK-NEXT: li 27, 0
-; CHECK-NEXT: mr 26, 21
-; CHECK-NEXT: stxv 52, 208(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 53, 224(1) # 16-byte Folded Spill
-; CHECK-NEXT: lxv 6, 0(30)
-; CHECK-NEXT: lxv 41, 0(28)
-; CHECK-NEXT: addi 7, 11, 1
-; CHECK-NEXT: add 11, 0, 21
-; CHECK-NEXT: li 28, 1
-; CHECK-NEXT: stxv 54, 240(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 55, 256(1) # 16-byte Folded Spill
-; CHECK-NEXT: lxv 43, 0(29)
-; CHECK-NEXT: lxv 42, 0(5)
-; CHECK-NEXT: stxv 56, 272(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 57, 288(1) # 16-byte Folded Spill
-; CHECK-NEXT: addi 11, 11, 32
-; CHECK-NEXT: stxv 58, 304(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 59, 320(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 60, 336(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 61, 352(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 62, 368(1) # 16-byte Folded Spill
-; CHECK-NEXT: stxv 63, 384(1) # 16-byte Folded Spill
-; CHECK-NEXT: std 16, 96(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 17, 104(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 24, 144(1) # 8-byte Folded Spill
-; CHECK-NEXT: std 25, 152(1) # 8-byte Folded Spill
+; CHECK-NEXT: lxv 4, 0(23)
+; CHECK-NEXT: lxv 1, 0(26)
+; CHECK-NEXT: std 5, 216(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 23, 152(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 24, 160(1) # 8-byte Folded Spill
+; CHECK-NEXT: ld 5, 856(1)
+; CHECK-NEXT: lxv 3, 0(24)
+; CHECK-NEXT: lxv 2, 0(25)
+; CHECK-NEXT: std 25, 168(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 26, 176(1) # 8-byte Folded Spill
+; CHECK-NEXT: lxv 38, 0(9)
+; CHECK-NEXT: lxv 33, 0(10)
+; CHECK-NEXT: std 12, 136(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 31, 144(1) # 8-byte Folded Spill
+; CHECK-NEXT: rldicl 3, 3, 61, 3
+; CHECK-NEXT: lxv 32, 0(11)
+; CHECK-NEXT: lxv 37, 0(29)
+; CHECK-NEXT: mr 8, 11
+; CHECK-NEXT: std 27, 184(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 30, 192(1) # 8-byte Folded Spill
+; CHECK-NEXT: lxv 36, 0(14)
+; CHECK-NEXT: lxv 13, 0(15)
+; CHECK-NEXT: stfd 28, 576(1) # 8-byte Folded Spill
+; CHECK-NEXT: stfd 29, 584(1) # 8-byte Folded Spill
+; CHECK-NEXT: lxv 12, 0(16)
+; CHECK-NEXT: lxv 11, 0(17)
+; CHECK-NEXT: stfd 30, 592(1) # 8-byte Folded Spill
+; CHECK-NEXT: stfd 31, 600(1) # 8-byte Folded Spill
+; CHECK-NEXT: lxv 10, 0(18)
+; CHECK-NEXT: lxv 9, 0(19)
+; CHECK-NEXT: stxv 52, 224(1) # 16-byte Folded Spill
+; CHECK-NEXT: stxv 53, 240(1) # 16-byte Folded Spill
+; CHECK-NEXT: lxv 8, 0(20)
+; CHECK-NEXT: lxv 7, 0(21)
+; CHECK-NEXT: stxv 54, 256(1) # 16-byte Folded Spill
+; CHECK-NEXT: stxv 55, 272(1) # 16-byte Folded Spill
+; CHECK-NEXT: lxv 6, 0(12)
+; CHECK-NEXT: lxv 5, 0(31)
+; CHECK-NEXT: stxv 56, 288(1) # 16-byte Folded Spill
+; CHECK-NEXT: stxv 57, 304(1) # 16-byte Folded Spill
+; CHECK-NEXT: lxv 0, 0(27)
+; CHECK-NEXT: lxv 40, 0(30)
+; CHECK-NEXT: li 30, 1
+; CHECK-NEXT: stxv 58, 320(1) # 16-byte Folded Spill
+; CHECK-NEXT: stxv 59, 336(1) # 16-byte Folded Spill
+; CHECK-NEXT: lxv 41, 0(2)
+; CHECK-NEXT: std 5, 208(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 2, 200(1) # 8-byte Folded Spill
+; CHECK-NEXT: lwa 5, 0(7)
+; CHECK-NEXT: addi 7, 3, 1
+; CHECK-NEXT: mulli 3, 5, 40
+; CHECK-NEXT: extswsli 6, 5, 3
+; CHECK-NEXT: mulli 31, 5, 48
+; CHECK-NEXT: add 0, 28, 6
+; CHECK-NEXT: ld 6, 208(1) # 8-byte Folded Reload
+; CHECK-NEXT: add 23, 28, 3
+; CHECK-NEXT: sldi 3, 5, 4
+; CHECK-NEXT: stxv 60, 352(1) # 16-byte Folded Spill
+; CHECK-NEXT: stxv 61, 368(1) # 16-byte Folded Spill
+; CHECK-NEXT: stxv 62, 384(1) # 16-byte Folded Spill
+; CHECK-NEXT: add 26, 28, 3
+; CHECK-NEXT: sldi 3, 5, 5
+; CHECK-NEXT: stxv 63, 400(1) # 16-byte Folded Spill
+; CHECK-NEXT: std 10, 56(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 29, 64(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 14, 72(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 15, 80(1) # 8-byte Folded Spill
+; CHECK-NEXT: lxv 42, 0(6)
+; CHECK-NEXT: std 16, 88(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 17, 96(1) # 8-byte Folded Spill
+; CHECK-NEXT: add 24, 28, 3
+; CHECK-NEXT: mulli 3, 5, 24
+; CHECK-NEXT: std 18, 104(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 19, 112(1) # 8-byte Folded Spill
+; CHECK-NEXT: add 25, 28, 3
+; CHECK-NEXT: ld 3, 216(1) # 8-byte Folded Reload
+; CHECK-NEXT: std 20, 120(1) # 8-byte Folded Spill
+; CHECK-NEXT: std 21, 128(1) # 8-byte Folded Spill
+; CHECK-NEXT: lxv 43, 0(3)
; CHECK-NEXT: .p2align 4
; CHECK-NEXT: .LBB0_3: # %_loop_2_do_.lr.ph
; CHECK-NEXT: # =>This Loop Header: Depth=1
; CHECK-NEXT: # Child Loop BB0_4 Depth 2
-; CHECK-NEXT: maddld 5, 2, 27, 0
; CHECK-NEXT: mr 6, 22
-; CHECK-NEXT: mr 30, 20
-; CHECK-NEXT: mr 29, 19
+; CHECK-NEXT: mr 5, 28
+; CHECK-NEXT: mr 27, 0
+; CHECK-NEXT: mr 11, 26
+; CHECK-NEXT: mr 2, 25
+; CHECK-NEXT: mr 12, 24
+; CHECK-NEXT: mr 3, 23
; CHECK-NEXT: mtctr 7
-; CHECK-NEXT: add 25, 21, 5
-; CHECK-NEXT: maddld 5, 2, 27, 14
-; CHECK-NEXT: add 24, 21, 5
-; CHECK-NEXT: maddld 5, 2, 27, 31
-; CHECK-NEXT: add 23, 21, 5
-; CHECK-NEXT: mr 5, 26
; CHECK-NEXT: .p2align 5
; CHECK-NEXT: .LBB0_4: # %_loop_2_do_
; CHECK-NEXT: # Parent Loop BB0_3 Depth=1
; CHECK-NEXT: # => This Inner Loop Header: Depth=2
; CHECK-NEXT: lxvp 34, 0(6)
; CHECK-NEXT: lxvp 44, 0(5)
-; CHECK-NEXT: xvmaddadp 1, 45, 35
-; CHECK-NEXT: lxvp 46, 0(30)
-; CHECK-NEXT: xvmaddadp 0, 47, 35
-; CHECK-NEXT: lxvp 48, 0(29)
-; CHECK-NEXT: lxvp 50, 0(23)
-; CHECK-NEXT: ...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed.
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.
The order in which NarrowSearchSpaceByCollapsingUnrolledCode iterates through the Uses array determines which LSRUses get deleted: uses visited earlier are deleted and collapsed into uses visited later.
The Uses array is generated from IVUsers, which places later uses earlier in the array. Currently we iterate forward through the array, so the later uses are deleted and we end up keeping the earlier ones. However, we also delete elements by swapping them with the last element, which changes the order, so the use that remains can be one from the middle of the loop. This is bad if we end up with a post-increment solution, as the value before the post-increment will still be used later and so needs to be kept live in a register.
Fix this by iterating backwards through the Uses array. The last use is then the one that is kept, and the order of the uses still to be visited no longer changes as uses get deleted.