-
Notifications
You must be signed in to change notification settings - Fork 15.6k
[GVN] Support rnflow pattern matching and transform #162259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
You can test this locally with the following command:git-clang-format --diff origin/main HEAD --extensions h,cpp -- llvm/include/llvm/Transforms/Scalar/GVN.h llvm/lib/Transforms/Scalar/GVN.cpp --diff_from_common_commit
View the diff from clang-format here.diff --git a/llvm/include/llvm/Transforms/Scalar/GVN.h b/llvm/include/llvm/Transforms/Scalar/GVN.h
index db74c2066..ed384f930 100644
--- a/llvm/include/llvm/Transforms/Scalar/GVN.h
+++ b/llvm/include/llvm/Transforms/Scalar/GVN.h
@@ -19,10 +19,10 @@
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallVector.h"
+#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/PassManager.h"
-#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/ValueHandle.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/Compiler.h"
|
|
Hi @fhahn and @nikic, The patch is functionally correct as the rnflow application passes. However, compared to trunk this patch regresses the app by ~10-12% as the new select instruction the pass is generating is more constly than hoisting the load. I played around |
In the test you've added the original IR seems to already have a select instruction - do you also see a select in rnflow prior to GVN? |
|
Sorry, please ignore me! I see there is an extra select being generated. Got it. :) |
|
Ping to reviewers! |
And this is also true with GCC (without vectorization), as IIRC it was doing this transform? I feel like I saw a comment somewhere that doing the GVN transform would improve performance slightly, but I have trouble finding it, maybe I imagined it... |
Ah, I think it was this one, on the test case PR: #141556 (comment) I guess that one was an estimated number and not a measurement. |
Yeah, that was an estimate, but unfortunately, in reality, it does not match (at least) on the Grace platform. |
|
@llvm/pr-subscribers-llvm-transforms Author: Madhur Amilkanthwar (madhur13490) ChangesThis is an alternative patch to #144987, adding support in GVN. Full diff: https://github.com/llvm/llvm-project/pull/162259.diff 3 Files Affected:
diff --git a/llvm/include/llvm/Transforms/Scalar/GVN.h b/llvm/include/llvm/Transforms/Scalar/GVN.h
index bc0f108ac8260..b886140348e79 100644
--- a/llvm/include/llvm/Transforms/Scalar/GVN.h
+++ b/llvm/include/llvm/Transforms/Scalar/GVN.h
@@ -22,6 +22,7 @@
#include "llvm/IR/Dominators.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/PassManager.h"
+#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/ValueHandle.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/Compiler.h"
@@ -45,6 +46,7 @@ class FunctionPass;
class GetElementPtrInst;
class ImplicitControlFlowTracking;
class LoadInst;
+class SelectInst;
class LoopInfo;
class MemDepResult;
class MemoryAccess;
@@ -405,6 +407,8 @@ class GVNPass : public PassInfoMixin<GVNPass> {
void addDeadBlock(BasicBlock *BB);
void assignValNumForDeadCode();
void assignBlockRPONumber(Function &F);
+
+ bool optimizeMinMaxFindingSelectPattern(SelectInst *Select);
};
/// Create a legacy GVN pass.
diff --git a/llvm/lib/Transforms/Scalar/GVN.cpp b/llvm/lib/Transforms/Scalar/GVN.cpp
index 72e1131a54a86..f817d045862cd 100644
--- a/llvm/lib/Transforms/Scalar/GVN.cpp
+++ b/llvm/lib/Transforms/Scalar/GVN.cpp
@@ -2743,6 +2743,10 @@ bool GVNPass::processInstruction(Instruction *I) {
}
return Changed;
}
+ if (SelectInst *Select = dyn_cast<SelectInst>(I)) {
+ if (optimizeMinMaxFindingSelectPattern(Select))
+ return true;
+ }
// Instructions with void type don't return a value, so there's
// no point in trying to find redundancies in them.
@@ -3330,6 +3334,123 @@ void GVNPass::assignValNumForDeadCode() {
}
}
+bool GVNPass::optimizeMinMaxFindingSelectPattern(SelectInst *Select) {
+ LLVM_DEBUG(
+ dbgs()
+ << "GVN: Analyzing select instruction for minimum finding pattern\n");
+ LLVM_DEBUG(dbgs() << "GVN: Select: " << *Select << "\n");
+ Value *Condition = Select->getCondition();
+ CmpInst *Comparison = dyn_cast<CmpInst>(Condition);
+ if (!Comparison) {
+ LLVM_DEBUG(dbgs() << "GVN: Condition is not a comparison\n");
+ return false;
+ }
+
+ // Check if this is ULT comparison.
+ CmpInst::Predicate Pred = Comparison->getPredicate();
+ if (Pred != CmpInst::ICMP_SLT && Pred != CmpInst::ICMP_ULT &&
+ Pred != CmpInst::FCMP_OLT && Pred != CmpInst::FCMP_ULT) {
+ LLVM_DEBUG(dbgs() << "GVN: Not a less-than comparison, predicate: " << Pred
+ << "\n");
+ return false;
+ }
+
+ // Check that both operands are loads.
+ Value *LHS = Comparison->getOperand(0);
+ Value *RHS = Comparison->getOperand(1);
+ if (!isa<LoadInst>(LHS) || !isa<LoadInst>(RHS)) {
+ LLVM_DEBUG(dbgs() << "GVN: Not both operands are loads\n");
+ return false;
+ }
+
+ LLVM_DEBUG(dbgs() << "GVN: Found minimum finding pattern in Block: "
+ << Select->getParent()->getName() << "\n");
+
+ // Transform the pattern.
+ // Hoist the chain of operations for the second load to preheader.
+ // Get predecessor of the block containing the select instruction.
+ BasicBlock *BB = Select->getParent();
+
+ // Get preheader of the loop.
+ Loop *L = LI->getLoopFor(BB);
+ if (!L) {
+ LLVM_DEBUG(dbgs() << "GVN: Could not find loop\n");
+ return false;
+ }
+ BasicBlock *Preheader = L->getLoopPreheader();
+ if (!Preheader) {
+ LLVM_DEBUG(dbgs() << "GVN: Could not find loop preheader\n");
+ return false;
+ }
+
+ // Hoist the chain of operations for the second load to preheader.
+ // %90 = sext i32 %.05.i to i64
+ // %91 = getelementptr float, ptr %0, i64 %90 ; %0 + (sext i32 %85 to i64)*4
+ // %92 = getelementptr i8, ptr %91, i64 -4 ; %0 + (sext i32 %85 to i64)*4 - 4
+ // %93 = load float, ptr %92, align 4
+
+ Value *BasePtr = nullptr, *IndexVal = nullptr, *OffsetVal = nullptr;
+ IRBuilder<> Builder(Preheader->getTerminator());
+ if (match(RHS,
+ m_Load(m_GEP(m_GEP(m_Value(BasePtr), m_SExt(m_Value(IndexVal))),
+ m_Value(OffsetVal))))) {
+ LLVM_DEBUG(dbgs() << "GVN: Found pattern: " << *RHS << "\n");
+ LLVM_DEBUG(dbgs() << "GVN: Found pattern: " << "\n");
+
+ PHINode *Phi = dyn_cast<PHINode>(IndexVal);
+ if (!Phi) {
+ LLVM_DEBUG(dbgs() << "GVN: IndexVal is not a PHI node\n");
+ return false;
+ }
+ Value *InitialMinIndex = Phi->getIncomingValueForBlock(Preheader);
+
+ // Insert PHI node at the top of this block.
+ PHINode *KnownMinPhi =
+ PHINode::Create(Builder.getFloatTy(), 2, "known_min", BB->begin());
+
+ // Build the GEP chain in the preheader.
+ // 1. hoist_0 = sext i32 to i64
+ Value *HoistedSExt =
+ Builder.CreateSExt(InitialMinIndex, Builder.getInt64Ty(), "hoist_sext");
+
+ // 2. hoist_gep1 = getelementptr float, ptr BasePtr, i64 HoistedSExt
+ Value *HoistedGEP1 = Builder.CreateGEP(Builder.getFloatTy(), BasePtr,
+ HoistedSExt, "hoist_gep1");
+
+ // 3. hoist_gep2 = getelementptr i8, ptr HoistedGEP1, i64 OffsetVal
+ Value *HoistedGEP2 = Builder.CreateGEP(Builder.getInt8Ty(), HoistedGEP1,
+ OffsetVal, "hoist_gep2");
+
+ // 4. hoisted_load = load float, ptr HoistedGEP2
+ LoadInst *NewLoad =
+ Builder.CreateLoad(Builder.getFloatTy(), HoistedGEP2, "hoisted_load");
+
+ // Replace all uses of load with new load.
+ RHS->replaceAllUsesWith(NewLoad);
+ dyn_cast<LoadInst>(RHS)->eraseFromParent();
+
+ // Replace second operand of comparison with KnownMinPhi.
+ Comparison->setOperand(1, KnownMinPhi);
+
+ // Create new select instruction for selecting the minimum value.
+ IRBuilder<> SelectBuilder(BB->getTerminator());
+ SelectInst *CurrentMinSelect =
+ dyn_cast<SelectInst>(SelectBuilder.CreateSelect(
+ Comparison, LHS, KnownMinPhi, "current_min"));
+
+ // Populate PHI node.
+ KnownMinPhi->addIncoming(NewLoad, Preheader);
+ KnownMinPhi->addIncoming(CurrentMinSelect, BB);
+ LLVM_DEBUG(dbgs() << "Transformed the code\n");
+ return true;
+ } else {
+ LLVM_DEBUG(dbgs() << "GVN: Could not find pattern: " << *RHS << "\n");
+ return false;
+ }
+ return false;
+}
+
+
class llvm::gvn::GVNLegacyPass : public FunctionPass {
public:
static char ID; // Pass identification, replacement for typeid.
diff --git a/llvm/test/Transforms/GVN/PRE/rnflow-gvn-pre.ll b/llvm/test/Transforms/GVN/PRE/rnflow-gvn-pre.ll
new file mode 100644
index 0000000000000..6f17d4ab30240
--- /dev/null
+++ b/llvm/test/Transforms/GVN/PRE/rnflow-gvn-pre.ll
@@ -0,0 +1,59 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; Minimal test case containing only the .lr.ph.i basic block
+; RUN: opt -passes=gvn -S < %s | FileCheck %s
+
+define void @test_lr_ph_i(ptr %0) {
+; CHECK-LABEL: define void @test_lr_ph_i(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*]]:
+; CHECK-NEXT: [[HOIST_GEP1:%.*]] = getelementptr float, ptr [[TMP0]], i64 1
+; CHECK-NEXT: [[HOIST_GEP2:%.*]] = getelementptr i8, ptr [[HOIST_GEP1]], i64 -4
+; CHECK-NEXT: [[HOISTED_LOAD:%.*]] = load float, ptr [[HOIST_GEP2]], align 4
+; CHECK-NEXT: br label %[[DOTLR_PH_I:.*]]
+; CHECK: [[_LR_PH_I:.*:]]
+; CHECK-NEXT: [[KNOWN_MIN:%.*]] = phi float [ [[HOISTED_LOAD]], %[[ENTRY]] ], [ [[CURRENT_MIN:%.*]], %[[DOTLR_PH_I]] ]
+; CHECK-NEXT: [[INDVARS_IV_I:%.*]] = phi i64 [ 1, %[[ENTRY]] ], [ [[INDVARS_IV_NEXT_I:%.*]], %[[DOTLR_PH_I]] ]
+; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[TMP10:%.*]], %[[DOTLR_PH_I]] ]
+; CHECK-NEXT: [[DOT05_I:%.*]] = phi i32 [ 1, %[[ENTRY]] ], [ [[DOT1_I:%.*]], %[[DOTLR_PH_I]] ]
+; CHECK-NEXT: [[INDVARS_IV_NEXT_I]] = add nsw i64 [[INDVARS_IV_I]], -1
+; CHECK-NEXT: [[TMP2:%.*]] = getelementptr float, ptr [[TMP0]], i64 [[INDVARS_IV_I]]
+; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i8, ptr [[TMP2]], i64 -8
+; CHECK-NEXT: [[TMP4:%.*]] = load float, ptr [[TMP3]], align 4
+; CHECK-NEXT: [[TMP5:%.*]] = sext i32 [[DOT05_I]] to i64
+; CHECK-NEXT: [[TMP6:%.*]] = getelementptr float, ptr [[TMP0]], i64 [[TMP5]]
+; CHECK-NEXT: [[TMP7:%.*]] = getelementptr i8, ptr [[TMP6]], i64 -4
+; CHECK-NEXT: [[TMP8:%.*]] = fcmp contract olt float [[TMP4]], [[KNOWN_MIN]]
+; CHECK-NEXT: [[TMP9:%.*]] = trunc nsw i64 [[INDVARS_IV_NEXT_I]] to i32
+; CHECK-NEXT: [[DOT1_I]] = select i1 [[TMP8]], i32 [[TMP9]], i32 [[DOT05_I]]
+; CHECK-NEXT: [[TMP10]] = add nsw i64 [[TMP1]], -1
+; CHECK-NEXT: [[TMP11:%.*]] = icmp samesign ugt i64 [[TMP1]], 1
+; CHECK-NEXT: [[CURRENT_MIN]] = select i1 [[TMP8]], float [[TMP4]], float [[KNOWN_MIN]]
+; CHECK-NEXT: br i1 [[TMP11]], label %[[DOTLR_PH_I]], label %[[EXIT:.*]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %.lr.ph.i
+
+.lr.ph.i: ; preds = %.lr.ph.i, %entry
+ %indvars.iv.i = phi i64 [ 1, %entry ], [ %indvars.iv.next.i, %.lr.ph.i ]
+ %86 = phi i64 [ 0, %entry ], [ %96, %.lr.ph.i ]
+ %.05.i = phi i32 [ 1, %entry ], [ %.1.i, %.lr.ph.i ]
+ %indvars.iv.next.i = add nsw i64 %indvars.iv.i, -1
+ %87 = getelementptr float, ptr %0, i64 %indvars.iv.i
+ %88 = getelementptr i8, ptr %87, i64 -8 ; first load : %0 + 4 * 1 - 8
+ %89 = load float, ptr %88, align 4
+ %90 = sext i32 %.05.i to i64
+ %91 = getelementptr float, ptr %0, i64 %90 ; %0 + 4 * 1
+ %92 = getelementptr i8, ptr %91, i64 -4 ; second load : %0 + 4 * 1 - 4
+ %93 = load float, ptr %92, align 4
+ %94 = fcmp contract olt float %89, %93
+ %95 = trunc nsw i64 %indvars.iv.next.i to i32
+ %.1.i = select i1 %94, i32 %95, i32 %.05.i
+ %96 = add nsw i64 %86, -1
+ %97 = icmp samesign ugt i64 %86, 1
+ br i1 %97, label %.lr.ph.i, label %exit
+
+exit:
+ ret void
+}
|
|
Hi @nikic and @david-arm Either this due to recent patches in LLVM or refreshed setup at my end. It is complicated to pinpoint to the patch which might have helped this patch gain the perf. Nevertheless, as it stands today, I see 15% end to end perf gain in Polyhedron's rnflow app on AArch64 Grace platform. My colleague, @ssijaric-nv also checked it on X86, AMD EPYC Milan1 and he sees slightly less gain but indeed there is roughtly 6% gain. I have cleaned up this patch and is ready for review. Can you please have a look? |
|
FYI nikic is currently out of the office and will come back at the start of December. |
|
Hi @nikic, If you're back, can you please have a look? |
| #include "llvm/IR/Dominators.h" | ||
| #include "llvm/IR/InstrTypes.h" | ||
| #include "llvm/IR/PassManager.h" | ||
| #include "llvm/Analysis/LoopInfo.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can forward declare Loop instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
llvm/lib/Transforms/Scalar/GVN.cpp
Outdated
| Value *BasePtr = nullptr, *IndexVal = nullptr, *OffsetVal = nullptr; | ||
| LLVM_DEBUG( | ||
| dbgs() | ||
| << "GVN: Analyzing select instruction for minimum finding pattern.\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd omit this kind of debug print that is going to be emitted for every select instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, removed this one. Please let me know if you find issue with debug statements too. Generally, I find them useful for debugging.
| // Check if this is less-than comparison. | ||
| CmpInst::Predicate Pred = Comparison->getPredicate(); | ||
| if (Pred != CmpInst::ICMP_SLT && Pred != CmpInst::ICMP_ULT && | ||
| Pred != CmpInst::FCMP_OLT && Pred != CmpInst::FCMP_ULT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that you will fail to handle the pattern where x > y ? b : a is used instead of x < y ? a : b. It's better to use swapped predicate in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, if I got this. Do you mean I should swap, get Swapped predicate and then go ahead?
llvm/lib/Transforms/Scalar/GVN.cpp
Outdated
| // %min.idx.ext = sext i32 %min.idx to i64 | ||
| // %ptr.float.min = getelementptr float, ptr %0, i64 %min.idx.ext | ||
| // %ptr.second.load = getelementptr i8, ptr %ptr.float.min, i64 -4 | ||
| // %val.current.min = load float, ptr %ptr.second.load, align 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if there was a comment with the full before + after pattern before the implementation code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| // Hoist the load and build the necessary operations. | ||
| // 1. hoist_0 = sext i32 to i64 | ||
| Value *HoistedSExt = | ||
| Builder.CreateSExt(InitialMinIndex, Builder.getInt64Ty(), "hoist_sext"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sext i64 seems to make assumptions about the types that are involved without ever checking them. I expect assertion failures if you use different types. Ideally this should be type independent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, added checks.
|
|
||
| // 2. hoist_gep1 = getelementptr float, ptr BasePtr, i64 HoistedSExt | ||
| Value *HoistedGEP1 = | ||
| Builder.CreateGEP(LoadType, BasePtr, HoistedSExt, "hoist_gep1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to make an assumption that the load type and the GEP type are the same, which is not necessarily true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm..Capturing GEP in m_match is bit tricky here. How do I add such check?
| OffsetVal, "hoist_gep2"); | ||
|
|
||
| // 4. hoisted_load = load float, ptr HoistedGEP2 | ||
| LoadInst *NewLoad = Builder.CreateLoad(LoadType, HoistedGEP2, "hoisted_load"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to hoist loads, shouldn't we be querying MDA about clobbers somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we probably also need to update MSSA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Added the loop for checking clobbering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a lot more test coverage for the various pre-conditions of the transform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, added a lot more "negative" tests.
1. Added 10+ negative tests. 2. Added positive tests for non-float types. 3. Strengthning checks in recognize function 4. clang-format changes 5. Addressed other review comments
|
Gentile ping @nikic! |
This is an alternative patch to #144987, adding support in GVN.
In this patch, I added support in GVN to recognize PRE load and hoist it out of the loop. This patch leads to ~15% improvement for the rnflow benchmark in Polyhedron testsuite when tested on NVIDIA's Grace platform.