Trying to speed up the data augmentation and transformation portions of SAM2 training by offloading them to the GPU.
The plan is to use the kornia library (PyTorch-compatible) to do this. As a stretch goal, we could eventually move to something like NVIDIA DALI to bypass the CPU and load data from disk to the GPU more directly, for further speedups.