The SoftSort operator is defined as follows:
$$\mathrm{SoftSort}_{\tau}^{d}(s)=\mathrm{softmax}\left(\frac{-d\left(\mathrm{sort}(s)\,\mathbb{1}^{T},\ \mathbb{1} s^{T}\right)}{\tau}\right)$$
However, the formula itself uses the $\mathrm{sort}(s)$ operation. According to the paper, isn't $\mathrm{sort}(s)$ non-differentiable? How can this be explained, both in terms of the underlying mathematics and in terms of how it is implemented in machine learning frameworks?
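
To make the question concrete, here is a minimal NumPy sketch of the formula as I read it. It rests on assumptions that are not stated in the formula itself: $d$ is taken to be the elementwise absolute difference, $\mathrm{sort}$ is taken to be descending, and the softmax is applied row-wise. The function name `soft_sort` and the example values are mine, not from the paper.

```python
import numpy as np

def soft_sort(s, tau=1.0):
    """Sketch of SoftSort_tau^d(s), assuming d(x, y) = |x - y| and a descending sort."""
    s = np.asarray(s, dtype=float)
    sorted_s = np.sort(s)[::-1]  # sort(s), descending (assumption)
    # Entry (i, j) of d(sort(s) 1^T, 1 s^T) is |sort(s)[i] - s[j]|.
    dist = np.abs(sorted_s[:, None] - s[None, :])
    logits = -dist / tau
    # Subtract the row maximum before exponentiating for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    # Row-wise softmax: each row is a distribution over the positions of s.
    return weights / weights.sum(axis=1, keepdims=True)

s = np.array([2.0, 5.0, 3.0])
P = soft_sort(s, tau=0.1)
```

With a small `tau`, each row of `P` concentrates on the index of the corresponding sorted element, so `P @ s` is close to the sorted vector. Note that `np.sort` appears explicitly in the computation, which is exactly the step the question is about.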