Skip to content

Conversation

@mattsu2020
Copy link
Contributor

Summary

  • move the fast lexicographic check to the top of compare_by so the ASCII fast path is shared and still respects --reverse
  • route SortMode::Default through custom_str_cmp to ensure --ignore-non-printing, --dictionary-order, and --ignore-case remain effective even when the fast path triggers
  • drop the duplicated reverse-comparison branch to keep the default mode logic in one place

fix
#9264

mattsu2020 and others added 6 commits November 14, 2025 12:58
…U sort

Adjust the compare_by function to swap the order of byte comparison (b.line vs a.line) specifically in SortMode::Default, ensuring raw byte sorting behavior aligns with GNU sort. This fixes inconsistencies in default sorting order for raw bytes, improving compatibility with expected POSIX behavior.
- Swapped lexicographic comparison from a.line.cmp(b.line) to b.line.cmp(a.line) to correct ordering behavior in compare_by function.
- Simplified SortMode::Default from custom_str_cmp with options to plain b_str.cmp(a_str), assuming default sort now uses standard string comparison without special handling for non-printing, dictionary, or case sensitivity.

This ensures consistent and correct sorting behavior in the sort utility.
When both lines are valid UTF-8, compare them lexicographically for proper text sorting; otherwise fall back to reversed byte comparison to match GNU sort behavior for raw bytes. Improves sorting accuracy for text data while preserving byte-level sorting for binary input.
- Move fast lexicographic comparison to early check and remove duplication
- Replace simple reverse comparison in SortMode::Default with custom_str_cmp function
- Ensures proper handling of ignore_non_printing, dictionary_order, and ignore_case options
@codspeed-hq
Copy link

codspeed-hq bot commented Nov 14, 2025

CodSpeed Performance Report

Merging #9272 will degrade performances by 33.72%

Comparing mattsu2020:sort_fix (da74869) with main (6e99389)1

Summary

⚡ 1 improvement
❌ 19 regressions
✅ 104 untouched
⏩ 5 skipped2

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
du_summarize_balanced_tree[(5, 4, 10)] 8.3 ms 8.5 ms -2.02%
factor_table 195.3 ms 184.9 ms +5.66%
sort_accented_data[500000] 362.5 ms 508.9 ms -28.78%
sort_ascii_only[500000] 353.6 ms 466.8 ms -24.25%
sort_case_insensitive[500000] 278.5 ms 292.9 ms -4.9%
sort_case_sensitive[500000] 174.2 ms 262.8 ms -33.72%
sort_key_field[500000] 700.1 ms 725.9 ms -3.55%
sort_long_line[160000] 1.6 ms 1.8 ms -10.18%
sort_mixed_data[500000] 327.3 ms 465.3 ms -29.67%
sort_reverse_locale[500000] 362 ms 511.3 ms -29.19%
sort_unique_locale[500000] 494 ms 658.7 ms -25%
sort_ascii_c_locale 21.5 ms 30.2 ms -28.78%
sort_ascii_utf8_locale 43.1 ms 60.3 ms -28.66%
sort_german_c_locale 38.4 ms 50 ms -23.26%
sort_german_locale 39.1 ms 50.2 ms -22.09%
sort_mixed_c_locale 38.3 ms 49.5 ms -22.76%
sort_mixed_utf8_locale 38.8 ms 48.1 ms -19.35%
sort_numeric 23.2 ms 24.8 ms -6.52%
sort_reverse_mixed 39.1 ms 49.8 ms -21.45%
sort_unique_mixed 39.3 ms 49.4 ms -20.51%

Footnotes

  1. No successful run was found on main (1c50306) during the generation of this report, so 6e99389 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

  2. 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@github-actions
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

@mattsu2020 mattsu2020 marked this pull request as draft November 14, 2025 13:11
- Added token_buffer Vec<Field> to ChunkContents and RecycledChunk
- Added utf8_cache Vec<Option<&'a str>> to LineData and RecycledChunk
- Updated recycle, read, and parse_lines handling for new fields
- Enables efficient tokenization and UTF-8 caching in sort operations
Use clamp() instead of chained max().min() for better readability.

This replaces explicit bounds checking in merge_chunk_capacity() with the more idiomatic clamp method, achieving the same effect while reducing code verbosity.
@github-actions
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

@mattsu2020 mattsu2020 marked this pull request as ready for review November 14, 2025 14:13
Copy link
Contributor

@sylvestre sylvestre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it regresses all benchmarks

Implement dynamic pipeline depth to allow more overlapping between reading,
sorting, and writing operations. The `pipeline_depth` setting now controls the
number of in-flight chunks, with a minimum of 2, improving performance for large
datasets by reducing I/O bottlenecks.
@github-actions
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

…U sort

Adjust the compare_by function to swap the order of byte comparison (b.line vs a.line) specifically in SortMode::Default, ensuring raw byte sorting behavior aligns with GNU sort. This fixes inconsistencies in default sorting order for raw bytes, improving compatibility with expected POSIX behavior.
- Swapped lexicographic comparison from a.line.cmp(b.line) to b.line.cmp(a.line) to correct ordering behavior in compare_by function.
- Simplified SortMode::Default from custom_str_cmp with options to plain b_str.cmp(a_str), assuming default sort now uses standard string comparison without special handling for non-printing, dictionary, or case sensitivity.

This ensures consistent and correct sorting behavior in the sort utility.
When both lines are valid UTF-8, compare them lexicographically for proper text sorting; otherwise fall back to reversed byte comparison to match GNU sort behavior for raw bytes. Improves sorting accuracy for text data while preserving byte-level sorting for binary input.
- Move fast lexicographic comparison to early check and remove duplication
- Replace simple reverse comparison in SortMode::Default with custom_str_cmp function
- Ensures proper handling of ignore_non_printing, dictionary_order, and ignore_case options
- Added token_buffer Vec<Field> to ChunkContents and RecycledChunk
- Added utf8_cache Vec<Option<&'a str>> to LineData and RecycledChunk
- Updated recycle, read, and parse_lines handling for new fields
- Enables efficient tokenization and UTF-8 caching in sort operations
Use clamp() instead of chained max().min() for better readability.

This replaces explicit bounds checking in merge_chunk_capacity() with the more idiomatic clamp method, achieving the same effect while reducing code verbosity.
Implement dynamic pipeline depth to allow more overlapping between reading,
sorting, and writing operations. The `pipeline_depth` setting now controls the
number of in-flight chunks, with a minimum of 2, improving performance for large
datasets by reducing I/O bottlenecks.
@github-actions
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

Add a new `filtered_lines` field to `LineData` for caching processed line versions.
Introduce `build_filtered_line` function to filter out non-printing or non-dictionary
characters and handle case-ignore, enabling new sorting options. Update chunk
recycling and parsing to manage the new field. Add buffer size normalization and
pipeline depth tuning for improved memory efficiency in external sorting.
Introduce `ReaderWriterConfig` struct to bundle `buffer_size` and `pipeline_depth` parameters, reducing function signature complexity and improving code readability in external sorting logic. Replace separate parameters with the config reference in `ext_sort`, `reader_writer`, and `read_write_loop` functions. Minor efficiency tweak in `tuned_pipeline_depth` using `.clamp()` for bounds checking.
@github-actions
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

@mattsu2020
Copy link
Contributor Author

I'm making changes to this pull request for performance tuning, but regressions always occur.
#9306

@mattsu2020 mattsu2020 marked this pull request as draft November 17, 2025 03:20
@mattsu2020 mattsu2020 closed this Dec 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants