If we add 3 tensors and 2 of them are C-contiguous but the last one is not, it seems wasteful to compute the index twice, once for each C-contiguous array.
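To make the idea concrete, here is a minimal NumPy sketch, not the actual implementation. It assumes all three inputs have the same shape and that `a` and `b` are the C-contiguous ones; the function name `add3_shared_index` is hypothetical. One linear index serves both contiguous inputs and the output, and only the non-contiguous input `c` goes through a separate strided lookup.

```python
import numpy as np

def add3_shared_index(a, b, c):
    """Elementwise a + b + c (hypothetical helper, illustration only).

    Assumes a and b are C-contiguous and all inputs share one shape.
    """
    assert a.flags.c_contiguous and b.flags.c_contiguous
    assert a.shape == b.shape == c.shape

    out = np.empty(a.shape, dtype=np.result_type(a, b, c))

    # For C-contiguous arrays, ravel() returns a view, so these are
    # flat windows onto the same memory with no copy.
    a_flat, b_flat, out_flat = a.ravel(), b.ravel(), out.ravel()

    # np.ndindex walks the shape in C order, so `lin` is the C-order
    # linear offset of `idx`. It is computed once per element and
    # reused for a, b and out; only c needs the strided lookup.
    for lin, idx in enumerate(np.ndindex(*a.shape)):
        out_flat[lin] = a_flat[lin] + b_flat[lin] + c[idx]
    return out

a = np.arange(6.0).reshape(2, 3)
b = np.ones((2, 3))
c = np.arange(6.0).reshape(3, 2).T  # transposed view: not C-contiguous
result = add3_shared_index(a, b, c)
```

In a compiled loop the same sharing would mean keeping one offset variable for all C-contiguous operands instead of one per operand.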