Parallel in-place permutations between sorted order and the following implicit search tree layouts:
- Level-order Binary Search Tree (BST)
- Level-order B-tree (B-tree)
- van Emde Boas (vEB) and Modified van Emde Boas (mvEB)
This library provides implementations for both the CPU (C++ with OpenMP) and the GPU (CUDA C++).
The algorithms implemented are described in the paper:
K. Berney, H. Casanova, A. Higuchi, B. Karsin, N. Sitchinava. "Beyond binary search: parallel in-place construction of implicit search tree layouts". In Proceedings of the 32nd International Parallel and Distributed Processing Symposium (IPDPS), pages 1070-1079, 2018.
Additional details are available in:
K. Berney. "Beyond binary search: parallel in-place construction of implicit search tree layouts" (Master's Thesis). University of Hawaii at Manoa, 2018.
To compile all sample programs, type make (while in either the cpu or gpu directory).
In order to use any of the search tree layout permutations, include the corresponding header file (see the Organization section for a description of file names) and call its appropriate permutation function.
CPU:
- BST:
timePermuteBST<TYPE>(A, n, p) - B-tree:
timePermuteBtree<TYPE>(A, n, b, p) - vEB:
timePermutevEB<TYPE>(A, n, p)
GPU:
- BST:
timePermuteBST<TYPE>(dev_A, n) - B-tree:
timePermuteBtree<TYPE>(dev_A, n, b) - mvEB:
timePermutevEB<TYPE>(dev_A, n)
To perform a set of queries on any of the resulting implicit search tree layouts, additionally include the corresponding query header file (see the Organization section for a description of file names) and call its appropriate query function.
CPU:
- BST:
timeQueryBST<TYPE>(A, n, q, p)for both perfect and non-perfect tree layouts - B-tree:
timeQueryBtree<TYPE>(A, n, b, q, p)for both perfect and non-perfect tree layouts - vEB:
timeQueryvEB<TYPE>(A, table, n, d, q, p)for perfect tree layouts andtimeQueryvEB_nonperfect<TYPE>(A, tables, idx, num_tables, n, d, q, p)for non-perfect tree layouts
GPU:
- BST:
timeQueryBST<TYPE>(A, dev_A, n, q)for both perfect and non-perfect tree layouts - B-tree:
timeQueryBtree<TYPE>(A, dev_A, n, b, q)for both perfect and non-perfect tree layouts - mvEB:
timeQueryvEB<TYPE>(A, dev_A, dev_table, n, d, q)for perfect tree layouts andtimeQueryvEB_nonperfect<TYPE>(A, dev_A, dev_tables, dev_idx, dev_table_idx, n, d, num_tables, q);for non-perfect tree layouts
The parameters of the above functions are defined as follows:
TYPEis the data type of the elementsAis a pointer to the start of the array of elements residing in CPU RAM spacedev_Ais a pointer to the start of the array of elements residing in GPU RAM spacenis the number of elements in the arraypis the number of CPU threads to usedis the number of levels in the corresponding search tree layouttableis a pointer to the start of the data structure used to query the vEB tree layout (residing in CPU RAM space)dev_tableis a pointer to the start of the data structure used to query the mvEB tree layout (residing in GPU RAM space)num_tablesis the number ofvEB_tabledata structures needed to query a non-perfect vEB or mvEB tree layouttablesis a pointer to the start of an array of sizenum_tables(residing in CPU RAM space), where each array element is a pointer to avEB_tabledata structuredev_tablesis a pointer to the start of the data structure wherenum_tablesvEB_tabledata structures are sequentially stored (residing in GPU RAM space)idxis a pointer to the start of an array of sizenum_tables(residing in CPU RAM space), where each array element is an integer corresponding to the index where the last full bottom tree of the i-th vEB recursive division endsdev_idxis a pointer to the start of an array of sizenum_tables(residing in GPU RAM space), where each array element is an integer corresponding to the index where the last full bottom tree of the i-th mvEB recursive division endsdev_table_idxis a pointer to the start of an array of sizenum_tables(residing in GPU RAM space), where each array element is an integer corresponding to the index where the i-thvEB_tablestarts
To initialize the table used to query the vEB and mvEB tree layouts, execute the following code:
vEB_table *table = (vEB_table *)calloc(d, sizeof(vEB_table));
buildTable(table, n, d, 0);
For more information about this data structure, see the following paper: G.S. Brodal, R. Fagerberg, and R. Jacob. "Cache oblivious search trees via binary trees of small height". In Proceedings of the 13th ACM-SIAM Symposium on Discrete Algorithms, pages 39-48, 2002.
To change the number of GPU threads launched and its organization into thread-blocks, change the parameters defined in gpu/src/params.cuh.
BLOCKSis the number of thread-blocks to launchTHREADSis the number of threads per thread-block
The params.h and params.cuh both contain a #define DEBUG and #define VERIFY statements, which should normally be commented out.
However, if the user wants to turn on debug or query verification mode, uncomment out this statement and recompile.
This will turn on various debug statements and query verification checks throughout execution.
The cpu and gpu directories each contain the necessary files needed to compile and run the library for the corresponding processing unit.
Both directories are organized similarly with a Makefile, a src folder, and a test folder.
For brevity, we describe the naming convention of files using the CPU file extensions (.cpp/.h) instead of the GPU file extensions (.cu/.cuh). However, the descriptions apply to both the CPU and GPU files.
The src directory contains all the necessary code (.cpp/.h files for the CPU and .cu/.cuh files for the GPU) to perform all the various implicit search tree permutations.
src/params.hcontains all general compiler defined constants.src/common.hcontains all general helper algorithms.src/cycles.himplements all cycle-leader helper algorithms.src/involutions.handsrc/involutions.cppimplements all involution helper algorithms.src/<search tree layout abbreviation>-cycles.himplements the permutation for the corresponding search tree layout using the cycle-leader approach.src/<search tree layout abbreviation>-involutions.himplements the permutation for the corresponding search tree layout using the involutions approach.src/query-<search tree layout abbreviation>.himplements the querying for the corresponding search tree layout.- Note that for the vEB and vEB tree layouts, a
src/query-<search tree layout abbreviation>.cppfile is also needed.
- Note that for the vEB and vEB tree layouts, a
The test directory contains all sample code for permuting and querying each of the implicit search tree layouts.
test/build-<search tree layout abbreviation>-cycles.cppperforms the permutation for the corresponding search tree layout using the cycle-leader approach.test/build-<search tree layout abbreviation>-involutions.cppperforms the permutation for the corresponding search tree layout using the involutions approach.test/query-<search tree layout abbreviation>-fixedN.cppqueries the corresponding search tree layout, while varying Q and keeping N fixed.test/query-<search tree layout abbreviation>-fixedQ.cppqueries the corresponding search tree layout, while varying N and keeping Q fixed.
- Asynchronous execution of GPU kernels can possibly be investigated and implemented.