-
Notifications
You must be signed in to change notification settings - Fork 39
Open
Description
The spatial samplers are all currently done as single threaded function call on the CPU, there might be scope for accelerating the majority for many-core architectures given they basically all loop over a set of points.
Immediate issues:
- Memory transfer - the cell list is built for each data frame, so either that or the individual "data_points" subset found from the cell list would need to be transferred - this will be costly and hard to hide, might be scope for using CUDA MPI type approach.
- The calls to filter() usually work on small subsets (< 50 points) - these would need to be bundled up and run in parallel to make the most of a GPU.
- Ideally a solution should be hardware agnostic, so should be focused either on low-level like OpenCL or higher-level like SYCL type approach.