GPU pairwise mergesort using the bank-conflict-free merging (CF-Merge) described in [1]. Experiments on an NVIDIA RTX 2080 Ti show that CF-Merge eliminates the slowdowns caused by bank conflicts.
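Bank conflicts arise because shared memory is divided into banks, and each bank can serve only one 32-bit word per clock cycle. When threads of a warp touch different words in the same bank, those accesses are serialized. The following host-side C++ is a simplified model for illustration only. It is not code from this repository or from CF-Merge. It assumes 32 banks of 4-byte words with successive words mapped to successive banks, which is the layout the CUDA Programming Guide documents for recent architectures, including the Turing-based RTX 2080 Ti. The helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Simplified shared-memory model (assumption): 32 banks, each 4 bytes wide,
// with successive 32-bit words mapped to successive banks.
constexpr std::size_t kNumBanks = 32;
constexpr std::size_t kWordBytes = 4;

// Bank that a shared-memory byte address falls into.
std::size_t bank_of(std::size_t byte_addr) {
    return (byte_addr / kWordBytes) % kNumBanks;
}

// Degree of bank conflict for one warp-wide access: the largest number of
// *distinct* 32-bit words that land in the same bank. Lanes reading the
// same word are served by a broadcast, so they do not add to the count.
// 1 means conflict-free; k means the access is serialized roughly k ways.
std::size_t conflict_degree(const std::vector<std::size_t>& byte_addrs) {
    assert(byte_addrs.size() <= 32 && "at most one address per warp lane");
    std::array<std::set<std::size_t>, kNumBanks> words_per_bank;
    for (std::size_t addr : byte_addrs) {
        words_per_bank[bank_of(addr)].insert(addr / kWordBytes);
    }
    std::size_t degree = 0;
    for (const auto& words : words_per_bank) {
        if (words.size() > degree) degree = words.size();
    }
    return degree;
}

// Byte addresses touched when lane i of a 32-thread warp reads the int at
// index i * stride of a shared-memory array that starts at offset 0.
std::vector<std::size_t> strided_int_access(std::size_t stride) {
    std::vector<std::size_t> addrs;
    for (std::size_t lane = 0; lane < 32; ++lane) {
        addrs.push_back(lane * stride * sizeof(std::int32_t));
    }
    return addrs;
}
```

Under this model, a warp reading consecutive ints (stride 1) or ints with any odd stride, such as 15, is conflict-free, because an odd stride visits all 32 banks exactly once. A stride of 2 gives a 2-way conflict, and a stride of 32 sends every lane to bank 0, giving a 32-way conflict.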
Experimental setup:
- NVIDIA RTX 2080 Ti
- CUDA 11
- Thrust v1.9.9
- Worst-case inputs can be downloaded here
Files:
- test/sort_int_random.cu - Test harness for random inputs. Command line arguments:
  - Total number of warps (positive power of 2 required)
  - RNG seed value
- test/sort_int_worst.cu - Test harness for the constructed worst-case inputs. Command line arguments:
  - Total number of warps (positive power of 2 required)
  - Path of the directory containing the binary files for the constructed worst-case inputs
- sort.h - Modified Thrust code using CF-Merge
- Makefile - Makefile for compiling the test harness programs
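The harness sources are not reproduced in this README. As a hedged sketch only, the following host-side C++ shows the kind of checks the command line arguments imply: validating that the warp count is a positive power of 2, generating reproducible inputs from an RNG seed, and comparing a sorted result against a reference sort. The function names, the choice of std::mt19937, and the full-range value distribution are assumptions, not the repository's actual code. How the harnesses derive the input size from the number of warps is not described here, so the size n is left as a parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Hypothetical helper: true iff n is a positive power of 2.
// n & (n - 1) clears the lowest set bit, so the result is 0 only when
// n has exactly one bit set.
bool is_positive_power_of_two(long long n) {
    return n > 0 && (n & (n - 1)) == 0;
}

// Hypothetical helper: n pseudo-random ints drawn uniformly from the full
// int32 range. With a given standard library, the same seed always yields
// the same sequence, which makes runs reproducible.
std::vector<std::int32_t> random_ints(std::size_t n, std::uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::int32_t> dist(
        std::numeric_limits<std::int32_t>::min(),
        std::numeric_limits<std::int32_t>::max());
    std::vector<std::int32_t> values(n);
    for (auto& x : values) x = dist(rng);
    return values;
}

// Hypothetical correctness check: `output` must equal `input` sorted by a
// trusted reference (std::sort). Comparing against the reference checks both
// that the output is ordered and that it is a permutation of the input.
bool matches_reference(std::vector<std::int32_t> input,
                       const std::vector<std::int32_t>& output) {
    std::sort(input.begin(), input.end());
    return input == output;
}
```

In a real harness, the output passed to matches_reference would be the result copied back from the GPU after the CF-Merge sort; here any candidate vector can be checked.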
Usage:
- Overwrite the default sort.h file in Thrust, located in thrust-1.9.9/thrust/system/cuda/detail/:
    cp sort.h thrust-1.9.9/thrust/system/cuda/detail/sort.h
- Compile the test harness programs:
    make
- Run the test harness programs:
    ./sort_int_random_15.out <total number of warps (positive power of 2 required)> <RNG seed value>
    ./sort_int_worst_15.out <total number of warps (positive power of 2 required)> <directory filepath>
    ./sort_int_random_17.out <total number of warps (positive power of 2 required)> <RNG seed value>
    ./sort_int_worst_17.out <total number of warps (positive power of 2 required)> <directory filepath>

References:
[1] Kyle Berney and Nodari Sitchinava. "Eliminating bank conflicts in GPU mergesort". In Proceedings of the 37th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2025. To appear. https://doi.org/10.1145/3694906.3743337