sort: Improve fast lexicographic path for sort default mode #9272
Conversation
…U sort
Adjust the `compare_by` function to swap the order of byte comparison (`b.line` vs `a.line`) specifically in `SortMode::Default`, ensuring raw byte sorting behavior aligns with GNU sort. This fixes inconsistencies in default sorting order for raw bytes, improving compatibility with expected POSIX behavior.

- Swapped lexicographic comparison from `a.line.cmp(b.line)` to `b.line.cmp(a.line)` to correct ordering behavior in the `compare_by` function.
- Simplified `SortMode::Default` from `custom_str_cmp` with options to plain `b_str.cmp(a_str)`, assuming default sort now uses standard string comparison without special handling for non-printing, dictionary, or case sensitivity.

This ensures consistent and correct sorting behavior in the sort utility.

When both lines are valid UTF-8, compare them lexicographically for proper text sorting; otherwise fall back to reversed byte comparison to match GNU sort behavior for raw bytes. Improves sorting accuracy for text data while preserving byte-level sorting for binary input.

- Move fast lexicographic comparison to early check and remove duplication
- Replace simple reverse comparison in `SortMode::Default` with the `custom_str_cmp` function
- Ensures proper handling of the `ignore_non_printing`, `dictionary_order`, and `ignore_case` options
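The ordering of checks described in these commits can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `Options`, `slow_cmp`, and `compare_lines` are hypothetical names, and only `ignore_case` stands in for the options that force the slower comparison.

```rust
use std::cmp::Ordering;

/// Hypothetical subset of the sort settings (not the PR's actual types).
#[derive(Clone, Copy, Default)]
struct Options {
    ignore_case: bool,
    reverse: bool,
}

/// Slow path: compares byte by byte after applying the option-dependent transform.
/// Only ASCII case folding is modelled here; other filters would plug in the same way.
fn slow_cmp(a: &[u8], b: &[u8], opts: Options) -> Ordering {
    let fold = |c: &u8| if opts.ignore_case { c.to_ascii_uppercase() } else { *c };
    a.iter().map(fold).cmp(b.iter().map(fold))
}

/// Takes the plain byte comparison only when no option changes the ordering,
/// and applies `reverse` after either path so both respect it.
fn compare_lines(a: &[u8], b: &[u8], opts: Options) -> Ordering {
    let ord = if opts.ignore_case {
        slow_cmp(a, b, opts)
    } else {
        a.cmp(b) // fast path: lexicographic comparison of raw bytes
    };
    if opts.reverse { ord.reverse() } else { ord }
}
```

Applying the reversal once, after the path is chosen, is what keeps the fast path from silently ignoring an option.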
CodSpeed Performance Report: Merging #9272 will degrade performance by 33.72%.
- Added `token_buffer` (`Vec<Field>`) to `ChunkContents` and `RecycledChunk`
- Added `utf8_cache` (`Vec<Option<&'a str>>`) to `LineData` and `RecycledChunk`
- Updated `recycle`, `read`, and `parse_lines` handling for the new fields
- Enables efficient tokenization and UTF-8 caching in sort operations
Use clamp() instead of chained max().min() for better readability. This replaces explicit bounds checking in merge_chunk_capacity() with the more idiomatic clamp method, achieving the same effect while reducing code verbosity.
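As a small sketch of the equivalence this commit relies on: the bounds below are hypothetical, since the real limits used by `merge_chunk_capacity()` do not appear in this thread.

```rust
// Hypothetical bounds, chosen only for illustration.
const MIN_CAPACITY: usize = 8 * 1024;
const MAX_CAPACITY: usize = 1024 * 1024;

/// Chained form: raise to the lower bound, then cap at the upper bound.
fn capacity_chained(requested: usize) -> usize {
    requested.max(MIN_CAPACITY).min(MAX_CAPACITY)
}

/// Same result via `Ord::clamp`, as long as the lower bound does not exceed the upper one.
fn capacity_clamped(requested: usize) -> usize {
    requested.clamp(MIN_CAPACITY, MAX_CAPACITY)
}
```

One difference worth keeping in mind: `clamp` panics if the minimum is greater than the maximum, whereas the chained form silently returns the upper bound.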
sylvestre
left a comment
it regresses all benchmarks
Implement dynamic pipeline depth to allow more overlapping between reading, sorting, and writing operations. The `pipeline_depth` setting now controls the number of in-flight chunks, with a minimum of 2, improving performance for large datasets by reducing I/O bottlenecks.
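One way to picture a bounded number of in-flight chunks is a pair of bounded channels between the stages. The sketch below is an assumption-laden illustration, not the PR's implementation: `sort_chunks_pipelined` is a hypothetical function, chunks are plain `Vec<u32>`, and the "writer" simply collects results.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Sorts each chunk on a worker thread while the caller keeps feeding input.
/// `pipeline_depth` bounds how many chunks may be queued between stages;
/// values below 2 are raised to 2, mirroring the minimum named in the commit.
fn sort_chunks_pipelined(chunks: Vec<Vec<u32>>, pipeline_depth: usize) -> Vec<Vec<u32>> {
    let depth = pipeline_depth.max(2);
    let (to_sorter, from_reader) = sync_channel::<Vec<u32>>(depth);
    let (to_writer, from_sorter) = sync_channel::<Vec<u32>>(depth);

    // Sorting stage: receives raw chunks, forwards sorted ones.
    let sorter = thread::spawn(move || {
        for mut chunk in from_reader {
            chunk.sort_unstable();
            if to_writer.send(chunk).is_err() {
                break; // writer is gone, nothing left to do
            }
        }
    });

    // Writing stage: a stand-in that collects chunks in arrival order.
    let writer = thread::spawn(move || from_sorter.into_iter().collect::<Vec<_>>());

    // Reading stage: `send` blocks once `depth` chunks are already queued.
    for chunk in chunks {
        to_sorter.send(chunk).expect("sorter thread exited early");
    }
    drop(to_sorter); // signal end of input

    sorter.join().expect("sorter thread panicked");
    writer.join().expect("writer thread panicked")
}
```

A deeper queue lets the reader run further ahead of the sorter, at the cost of holding more chunks in memory at once.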
Add a new `filtered_lines` field to `LineData` for caching processed line versions. Introduce `build_filtered_line` function to filter out non-printing or non-dictionary characters and handle case-ignore, enabling new sorting options. Update chunk recycling and parsing to manage the new field. Add buffer size normalization and pipeline depth tuning for improved memory efficiency in external sorting.
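The idea of filtering each line once and comparing the cached result can be sketched as below. Everything here is a simplified assumption: `FilterOptions` and `sort_with_cached_keys` are hypothetical, the body of `build_filtered_line` is a guess at the behavior the commit describes, and the character classes are ASCII-only (non-ASCII bytes are kept by the non-printing filter and dropped by the dictionary filter).

```rust
/// Hypothetical flags mirroring the options named in the commit message.
#[derive(Clone, Copy, Default)]
struct FilterOptions {
    ignore_non_printing: bool,
    dictionary_order: bool,
    ignore_case: bool,
}

/// Builds the byte sequence that actually takes part in comparisons.
fn build_filtered_line(line: &[u8], opts: FilterOptions) -> Vec<u8> {
    line.iter()
        .copied()
        // Drop ASCII control characters when non-printing bytes are ignored.
        .filter(|b| !opts.ignore_non_printing || !b.is_ascii_control())
        // Dictionary order keeps only ASCII letters, digits, and blanks.
        .filter(|b| {
            !opts.dictionary_order || b.is_ascii_alphanumeric() || *b == b' ' || *b == b'\t'
        })
        // Case-insensitive comparison folds ASCII letters to upper case.
        .map(|b| if opts.ignore_case { b.to_ascii_uppercase() } else { b })
        .collect()
}

/// Sorts lines by their filtered form, computing each filtered form only once.
fn sort_with_cached_keys(lines: &mut [Vec<u8>], opts: FilterOptions) {
    lines.sort_by_cached_key(|line| build_filtered_line(line, opts));
}
```

Here the standard library's `sort_by_cached_key` plays the role of the cache: the filter runs once per line rather than once per comparison, which is the trade of extra memory for fewer repeated passes that the commit describes.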
Introduce `ReaderWriterConfig` struct to bundle `buffer_size` and `pipeline_depth` parameters, reducing function signature complexity and improving code readability in external sorting logic. Replace separate parameters with the config reference in `ext_sort`, `reader_writer`, and `read_write_loop` functions. Minor efficiency tweak in `tuned_pipeline_depth` using `.clamp()` for bounds checking.
I'm making changes to this pull request for performance tuning, but regressions always occur.
Summary
- Adjust `compare_by` so the ASCII fast path is shared and still respects `--reverse`
- Route `SortMode::Default` through `custom_str_cmp` to ensure `--ignore-non-printing`, `--dictionary-order`, and `--ignore-case` remain effective even when the fast path triggers

fix #9264