Skip to content

algoparc/GPU-CFMerge

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CF-Merge: Bank conflict free GPU mergesort

GPU pairwise mergesort using a bank conflict free merging as described in [1]. Experiments on a NVIDIA RTX 2080 Ti show that CF-Merge eliminates the slowdowns due to bank conflicts.

Throughput results (elements per microsecond) for Thrust and CF-Merge on a NVIDIA RTX 2080 Ti using the constructed worst-case inputs. Thrust results are in red and CF-Merge results are in blue. The short dashed lines represent software parameters 15 elements per thread (E) and 512 threads per block (u); and the long dashed lines represent software parameters 17 elements per thread (E) and 256 threads per block (u). The x-axis is displayed on a logarithmic scale.

Throughput results (elements per microsecond) for Thrust and CF-Merge on a NVIDIA RTX 2080 Ti using parameters 15 elements per thread (E) and 512 threads per block (u). Thrust results are in red and CF-Merge results are in blue. The dashed lines represent the constructed worst-case inputs and the dotted lines represent uniform random inputs. The x-axis is displayed on a logarithmic scale.

Throughput results (elements per microsecond) for Thrust and CF-Merge on a NVIDIA RTX 2080 Ti using parameters 17 elements per thread (E) and 256 threads per block (u). Thrust results are in red and CF-Merge results are in blue. The dashed lines represent the constructed worst-case inputs and the dotted lines represent uniform random inputs. The x-axis is displayed on a logarithmic scale.

Experimental setup:

  • NVIDIA RTX 2080 Ti
  • CUDA 11
  • Thrust v1.9.9
  • Worst-case inputs can be downloaded here

Files

  1. test/sort_int_random.cu - Test harness for random inputs
    Command line arguments:
    • Total number of warps (positive power of 2 required)
    • RNG seed value
  2. test/sort_int_worst.cu - Test harness for constructed inputs
    Command line arguments:
    • Total number of warps (positive power of 2 required)
    • Path of the directory containing binary files for the worst-case constructed inputs
  3. sort.h - Modified Thrust code using CF-Merge
  4. Makefile - Makefile for compiling test harness programs

Running CF-Merge

  1. Overwrite the default sort.h file in Thrust located in thrust-1.9.9/thrust/system/cuda/detail/
cp sort.h thrust-1.9.9/thrust/system/cuda/detail/sort.h
  1. Compile test harness programs
make
  1. Run test harness programs
./sort_int_random_15.out <total number of warps (positive power of 2 required)> <RNG seed value>
./sort_int_worst_15.out <total number of warps (positive power of 2 required)> <directory filepath>
./sort_int_random_17.out <total number of warps (positive power of 2 required)> <RNG seed value>
./sort_int_worst_17.out <total number of warps (positive power of 2 required)> <directory filepath>

References

[1] Kyle Berney and Nodari Sitchinava. "Eliminating bank conflicts in GPU mergesort". In Proceedings of the 37th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2025. To appear. https://doi.org/10.1145/3694906.3743337

About

Bank conflict free merging on GPUs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •