sve2 HISTSEG support for fast ondemand parsing on ARM #108
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #108      +/-   ##
==========================================
- Coverage   74.79%   73.40%   -1.40%
==========================================
  Files          21       22       +1
  Lines        2436     2741     +305
  Branches      667      749      +82
==========================================
+ Hits         1822     2012     +190
- Misses        297      430     +133
+ Partials      317      299      -18
#110 The CI bugs have been fixed, please rebase onto master.
@xiegx94
#include "../common/arm_common/skip.inc.h"

// Requires clang vx or GCC>=14
#if (defined(__clang__) && (__clang_major__ >= 14)) || \
__has_include would be better
Changed :)
#if (defined(__clang__) && (__clang_major__ >= 14)) || \
    (defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 14))

#define USE_SVE_HIST 1
Check that USE_SVE_HIST is not already defined.
Done!
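For readers following the two review threads above, here is a minimal sketch of what a guard combining both suggestions might look like. It is not the code that was actually pushed to this PR. It assumes the feature can be keyed on the ACLE header `<arm_sve.h>` and the ACLE macro `__ARM_FEATURE_SVE2`; the helper `sve_hist_enabled()` is a hypothetical name added only so the result can be inspected.

```cpp
// Sketch only: one possible guard, not the PR's final code.
// Respect a value the user may already have set on the command line.
#if !defined(USE_SVE_HIST)
// __has_include is nested in its own #if so that preprocessors lacking it
// never have to parse the __has_include(...) expression.
#  if defined(__has_include)
#    if __has_include(<arm_sve.h>) && defined(__ARM_FEATURE_SVE2)
#      define USE_SVE_HIST 1
#    endif
#  endif
#endif

// Fall back to "disabled" so later code can use plain #if USE_SVE_HIST.
#if !defined(USE_SVE_HIST)
#  define USE_SVE_HIST 0
#endif

// Hypothetical helper: reports whether the HISTSEG path was compiled in.
constexpr bool sve_hist_enabled() { return USE_SVE_HIST != 0; }
```

With this shape, building with `-DUSE_SVE_HIST=0` forces the fallback path even on a toolchain that supports SVE2.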
@xiegx94
Hi @xiegx94 :) any chance you can look at it soon?
This PR uses the SVE2 HISTSEG instruction to improve the performance of ondemand parsing on ARM CPUs.
https://developer.arm.com/documentation/ddi0602/latest/SVE-Instructions/HISTSEG--Count-matching-elements-in-vector-segments-
Creating the bitmasks for each structural character is very expensive on ARM because it lacks an instruction similar to pmovmskb. HISTSEG gives us the number of characters of interest in a single instruction, with low latency and high throughput on Neoverse V2 CPUs.
This allows us to skip mask creation for characters that are not present, and even to implement fast paths, for example when a block contains only quotes. It is worth noting that x86 has no equivalent instruction, and the AVX-512 VP2INTERSECT instruction doesn't support int8 data types.
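To make the idea concrete, here is a portable scalar model of it. This is an illustrative sketch, not the SVE2 code in this PR: `histseg_model` emulates in plain C++ what HISTSEG computes for one 128-bit segment of byte elements (for each "needle" byte, how many bytes of the other segment match it). The names `Block`, `Masks`, `eq_mask`, `classify` and `load16`, and the choice of needle characters, are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

using Block = std::array<uint8_t, 16>;  // one 128-bit segment of bytes

// Scalar model of HISTSEG: out[i] = number of bytes in `haystack`
// that are equal to needles[i].
Block histseg_model(const Block& needles, const Block& haystack) {
  Block out{};
  for (std::size_t i = 0; i < 16; ++i)
    for (std::size_t j = 0; j < 16; ++j)
      out[i] += static_cast<uint8_t>(needles[i] == haystack[j]);
  return out;
}

// The expensive step we want to avoid: a pmovmskb-style bitmask for one byte.
uint16_t eq_mask(const Block& block, uint8_t c) {
  uint16_t m = 0;
  for (std::size_t i = 0; i < 16; ++i)
    m |= static_cast<uint16_t>((block[i] == c) << i);
  return m;
}

struct Masks {
  uint16_t quote = 0;
  uint16_t backslash = 0;
  bool only_quotes = false;  // fast-path hint: quotes present, nothing else
};

Masks classify(const Block& block) {
  // Lanes 0..5 hold the characters of interest; the unused lanes are zero
  // and their counts are simply ignored.
  static const Block needles = {'"', '\\', '{', '}', '[', ']'};
  const Block counts = histseg_model(needles, block);
  Masks m;
  if (counts[0]) m.quote = eq_mask(block, '"');        // skipped when absent
  if (counts[1]) m.backslash = eq_mask(block, '\\');   // skipped when absent
  m.only_quotes = counts[0] != 0 &&
                  (counts[1] | counts[2] | counts[3] | counts[4] | counts[5]) == 0;
  return m;
}

// Copies up to 16 bytes of `s` into a block, padding with spaces.
Block load16(const std::string& s) {
  assert(s.size() <= 16);
  Block b;
  b.fill(' ');
  for (std::size_t i = 0; i < s.size(); ++i) b[i] = static_cast<uint8_t>(s[i]);
  return b;
}
```

For a block such as `"hello world"`, only the quote count is non-zero, so a single mask is built and `only_quotes` signals that a fast path could be taken. On real hardware the count vector would come from one HISTSEG instruction rather than the nested loop.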
Performance:
Master branch
This PR
This is faster than even the most recent x86 CPUs, which show results of around 24Gi/s for twitter.
Thanks to @supermartian !
This PR is contributed by NVIDIA.