Add interleaving to sgemm and dgemm. Disentangle trmm/symm from gemm. #5573
+4,477
−47
This change adds interleaving to sgemm and dgemm copies and kernels for ARMV8SVE.
This required disentangling the symm and trmm kernels from gemm to some degree. It should now be much easier to apply further optimisations to gemm.
Interleaving gives a speedup of ~1.4% for dgemm and ~1.1% for sgemm on c7g (V1), with negligible changes (~0.3% to ~0.5%) on c8g (V2).
Geometric means taken over square matrix operations with size 2->2014, stepsize = 1 (values below 1 favour the interleaved version):
Geometric mean for interleave/c7g_dgemm.txt: 0.9859023206257058
Geometric mean for interleave/c7g_sgemm.txt: 0.9887890902680289
Geometric mean for interleave/c8g_dgemm.txt: 0.9970050554316875
Geometric mean for interleave/c8g_sgemm.txt: 0.9948135816755502
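For reference, a minimal sketch of how a geometric mean of this kind can be computed. It assumes each result file reduces to one ratio per matrix size (interleaved time divided by baseline time); that layout, the `geometric_mean` helper, and the sample values are illustrative assumptions, not the script used for these runs.

```python
import math


def geometric_mean(ratios):
    """Geometric mean computed as exp(mean(log(r))).

    Summing logs avoids the overflow/underflow that multiplying
    thousands of ratios together could cause.
    """
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-size ratios (new time / old time), for illustration only.
sample = [0.5, 2.0, 1.0]
gm = geometric_mean(sample)
print(f"geometric mean: {gm:.4f}")
```

A geometric mean is the usual choice for averaging ratios because a 2x slowdown at one size and a 2x speedup at another cancel out, which an arithmetic mean would not show.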
For larger matrix sizes the sgemm speedup on c7g increases to ~2.4%, while dgemm on c7g stays at ~1.3% and c8g is essentially unchanged.
Taken over square matrix operations with size 2,000->10,000, stepsize = 1,000:
Geometric mean for 64thread_interleave/c7g_dgemm.txt: 0.9865252964543917
Geometric mean for 64thread_interleave/c7g_sgemm.txt: 0.9762227312411808
Geometric mean for 64thread_interleave/c8g_dgemm.txt: 0.9997186302044462
Geometric mean for 64thread_interleave/c8g_sgemm.txt: 0.9996022927667269
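The percentages quoted for the larger sizes can be recovered from these geometric means with a short calculation. This sketch assumes each value is a time ratio (interleaved divided by baseline), so that `1 - ratio` is the fractional reduction in run time; the `percent_reduction` helper is a hypothetical name introduced here.

```python
# Geometric means as reported above for sizes 2,000->10,000.
reported = {
    "c7g_dgemm": 0.9865252964543917,
    "c7g_sgemm": 0.9762227312411808,
    "c8g_dgemm": 0.9997186302044462,
    "c8g_sgemm": 0.9996022927667269,
}


def percent_reduction(ratio):
    """Percentage reduction in run time, assuming ratio = new_time / old_time."""
    return (1.0 - ratio) * 100.0


reductions = {name: percent_reduction(r) for name, r in reported.items()}

for name, pct in reductions.items():
    print(f"{name}: {pct:.2f}% faster")
```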