sort:Improve fast lexicographic path for sort default mode #9272

mattsu2020 · 2025-11-14T11:34:45Z

Summary

move the fast lexicographic check to the top of compare_by so the ASCII fast path is shared and still respects --reverse
route SortMode::Default through custom_str_cmp to ensure --ignore-non-printing, --dictionary-order, and --ignore-case remain effective even when the fast path triggers
drop the duplicated reverse-comparison branch to keep the default mode logic in one place

…U sort Adjust the compare_by function to swap the order of byte comparison (b.line vs a.line) specifically in SortMode::Default, ensuring raw byte sorting behavior aligns with GNU sort. This fixes inconsistencies in default sorting order for raw bytes, improving compatibility with expected POSIX behavior.

- Swapped lexicographic comparison from a.line.cmp(b.line) to b.line.cmp(a.line) to correct ordering behavior in compare_by function. - Simplified SortMode::Default from custom_str_cmp with options to plain b_str.cmp(a_str), assuming default sort now uses standard string comparison without special handling for non-printing, dictionary, or case sensitivity. This ensures consistent and correct sorting behavior in the sort utility.

When both lines are valid UTF-8, compare them lexicographically for proper text sorting; otherwise fall back to reversed byte comparison to match GNU sort behavior for raw bytes. Improves sorting accuracy for text data while preserving byte-level sorting for binary input.

- Move fast lexicographic comparison to early check and remove duplication - Replace simple reverse comparison in SortMode::Default with custom_str_cmp function - Ensures proper handling of ignore_non_printing, dictionary_order, and ignore_case options

codspeed-hq · 2025-11-14T11:50:01Z

CodSpeed Performance Report

Merging #9272 will degrade performances by 33.72%

_{Comparing mattsu2020:sort_fix (da74869) with main (6e99389)¹}

Summary

⚡ 1 improvement
❌ 19 regressions
✅ 104 untouched
⏩ 5 skipped²

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
❌	`du_summarize_balanced_tree[(5, 4, 10)]`	8.3 ms	8.5 ms	-2.02%
⚡	`factor_table`	195.3 ms	184.9 ms	+5.66%
❌	`sort_accented_data[500000]`	362.5 ms	508.9 ms	-28.78%
❌	`sort_ascii_only[500000]`	353.6 ms	466.8 ms	-24.25%
❌	`sort_case_insensitive[500000]`	278.5 ms	292.9 ms	-4.9%
❌	`sort_case_sensitive[500000]`	174.2 ms	262.8 ms	-33.72%
❌	`sort_key_field[500000]`	700.1 ms	725.9 ms	-3.55%
❌	`sort_long_line[160000]`	1.6 ms	1.8 ms	-10.18%
❌	`sort_mixed_data[500000]`	327.3 ms	465.3 ms	-29.67%
❌	`sort_reverse_locale[500000]`	362 ms	511.3 ms	-29.19%
❌	`sort_unique_locale[500000]`	494 ms	658.7 ms	-25%
❌	`sort_ascii_c_locale`	21.5 ms	30.2 ms	-28.78%
❌	`sort_ascii_utf8_locale`	43.1 ms	60.3 ms	-28.66%
❌	`sort_german_c_locale`	38.4 ms	50 ms	-23.26%
❌	`sort_german_locale`	39.1 ms	50.2 ms	-22.09%
❌	`sort_mixed_c_locale`	38.3 ms	49.5 ms	-22.76%
❌	`sort_mixed_utf8_locale`	38.8 ms	48.1 ms	-19.35%
❌	`sort_numeric`	23.2 ms	24.8 ms	-6.52%
❌	`sort_reverse_mixed`	39.1 ms	49.8 ms	-21.45%
❌	`sort_unique_mixed`	39.3 ms	49.4 ms	-20.51%

No successful run was found on main (1c50306) during the generation of this report, so 6e99389 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩
5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

github-actions · 2025-11-14T11:55:45Z

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

- Added token_buffer Vec<Field> to ChunkContents and RecycledChunk - Added utf8_cache Vec<Option<&'a str>> to LineData and RecycledChunk - Updated recycle, read, and parse_lines handling for new fields - Enables efficient tokenization and UTF-8 caching in sort operations

Use clamp() instead of chained max().min() for better readability. This replaces explicit bounds checking in merge_chunk_capacity() with the more idiomatic clamp method, achieving the same effect while reducing code verbosity.

github-actions · 2025-11-14T13:46:10Z

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

sylvestre

it regresses all benchmarks

Implement dynamic pipeline depth to allow more overlapping between reading, sorting, and writing operations. The `pipeline_depth` setting now controls the number of in-flight chunks, with a minimum of 2, improving performance for large datasets by reducing I/O bottlenecks.

github-actions · 2025-11-16T11:06:44Z

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

…U sort Adjust the compare_by function to swap the order of byte comparison (b.line vs a.line) specifically in SortMode::Default, ensuring raw byte sorting behavior aligns with GNU sort. This fixes inconsistencies in default sorting order for raw bytes, improving compatibility with expected POSIX behavior.

- Swapped lexicographic comparison from a.line.cmp(b.line) to b.line.cmp(a.line) to correct ordering behavior in compare_by function. - Simplified SortMode::Default from custom_str_cmp with options to plain b_str.cmp(a_str), assuming default sort now uses standard string comparison without special handling for non-printing, dictionary, or case sensitivity. This ensures consistent and correct sorting behavior in the sort utility.

When both lines are valid UTF-8, compare them lexicographically for proper text sorting; otherwise fall back to reversed byte comparison to match GNU sort behavior for raw bytes. Improves sorting accuracy for text data while preserving byte-level sorting for binary input.

- Move fast lexicographic comparison to early check and remove duplication - Replace simple reverse comparison in SortMode::Default with custom_str_cmp function - Ensures proper handling of ignore_non_printing, dictionary_order, and ignore_case options

- Added token_buffer Vec<Field> to ChunkContents and RecycledChunk - Added utf8_cache Vec<Option<&'a str>> to LineData and RecycledChunk - Updated recycle, read, and parse_lines handling for new fields - Enables efficient tokenization and UTF-8 caching in sort operations

Use clamp() instead of chained max().min() for better readability. This replaces explicit bounds checking in merge_chunk_capacity() with the more idiomatic clamp method, achieving the same effect while reducing code verbosity.

Implement dynamic pipeline depth to allow more overlapping between reading, sorting, and writing operations. The `pipeline_depth` setting now controls the number of in-flight chunks, with a minimum of 2, improving performance for large datasets by reducing I/O bottlenecks.

github-actions · 2025-11-16T11:42:59Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

…to sort_fix

Add a new `filtered_lines` field to `LineData` for caching processed line versions. Introduce `build_filtered_line` function to filter out non-printing or non-dictionary characters and handle case-ignore, enabling new sorting options. Update chunk recycling and parsing to manage the new field. Add buffer size normalization and pipeline depth tuning for improved memory efficiency in external sorting.

Introduce `ReaderWriterConfig` struct to bundle `buffer_size` and `pipeline_depth` parameters, reducing function signature complexity and improving code readability in external sorting logic. Replace separate parameters with the config reference in `ext_sort`, `reader_writer`, and `read_write_loop` functions. Minor efficiency tweak in `tuned_pipeline_depth` using `.clamp()` for bounds checking.

github-actions · 2025-11-16T14:39:11Z

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

mattsu2020 · 2025-11-17T03:20:25Z

I'm making changes to this pull request for performance tuning, but regressions always occur.
#9306

mattsu2020 and others added 6 commits November 14, 2025 12:58

Merge branch 'uutils:main' into sort_fix

139ceb3

fix format

0ee0e71

mattsu2020 marked this pull request as draft November 14, 2025 13:11

mattsu2020 added 2 commits November 14, 2025 22:18

refactor(sort): simplify merge_chunk_capacity calculation

0020eaf

Use clamp() instead of chained max().min() for better readability. This replaces explicit bounds checking in merge_chunk_capacity() with the more idiomatic clamp method, achieving the same effect while reducing code verbosity.

mattsu2020 marked this pull request as ready for review November 14, 2025 14:13

sylvestre requested changes Nov 15, 2025

View reviewed changes

mattsu2020 added 8 commits November 16, 2025 12:23

fix format

4e280d3

refactor(sort): simplify merge_chunk_capacity calculation

8e150f1

Use clamp() instead of chained max().min() for better readability. This replaces explicit bounds checking in merge_chunk_capacity() with the more idiomatic clamp method, achieving the same effect while reducing code verbosity.

sylvestre force-pushed the sort_fix branch from 9911a1f to c415939 Compare November 16, 2025 11:23

mattsu2020 added 3 commits November 16, 2025 22:44

Merge branch 'sort_fix' of https://github.com/mattsu2020/coreutils in…

31a949f

…to sort_fix

mattsu2020 marked this pull request as draft November 17, 2025 03:20

mattsu2020 closed this Dec 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

sort:Improve fast lexicographic path for sort default mode #9272

sort:Improve fast lexicographic path for sort default mode #9272

mattsu2020 commented Nov 14, 2025

Uh oh!

codspeed-hq bot commented Nov 14, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Nov 14, 2025

Uh oh!

github-actions bot commented Nov 14, 2025

Uh oh!

sylvestre left a comment

Uh oh!

github-actions bot commented Nov 16, 2025

Uh oh!

github-actions bot commented Nov 16, 2025

Uh oh!

github-actions bot commented Nov 16, 2025

Uh oh!

mattsu2020 commented Nov 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

sort:Improve fast lexicographic path for sort default mode #9272

sort:Improve fast lexicographic path for sort default mode #9272

Conversation

mattsu2020 commented Nov 14, 2025

Summary

Uh oh!

codspeed-hq bot commented Nov 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Performance Report

Merging #9272 will degrade performances by 33.72%

Summary

Benchmarks breakdown

Footnotes

Uh oh!

github-actions bot commented Nov 14, 2025

Uh oh!

github-actions bot commented Nov 14, 2025

Uh oh!

sylvestre left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Nov 16, 2025

Uh oh!

github-actions bot commented Nov 16, 2025

Uh oh!

github-actions bot commented Nov 16, 2025

Uh oh!

mattsu2020 commented Nov 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codspeed-hq bot commented Nov 14, 2025 •

edited

Loading