Currently, softmax is implemented only for flat (one-dimensional) tensors, so it is only useful as the last layer of a classification model.
In the future, the softmax operation should accept multi-dimensional input and let the caller choose the axis along which it is applied.
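The following minimal Python sketch illustrates the intended axis semantics. It is not this library's implementation, and it does not reflect the library's language or tensor type. It assumes a tensor stored as a flat row-major buffer plus a shape tuple. `softmax_flat` and `softmax_axis` are hypothetical names. The shape is viewed as `(outer, n, inner)`, where `n` is the size of the chosen axis. The existing flat softmax is then applied to each of the `outer * inner` slices of length `n`. Subtracting the slice maximum before exponentiating is the usual guard against overflow.

```python
import math
from typing import List, Sequence


def softmax_flat(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence (the currently supported case)."""
    m = max(xs)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_axis(data: Sequence[float], shape: Sequence[int], axis: int) -> List[float]:
    """Softmax along `axis` of a row-major tensor stored as a flat list.

    Hypothetical helper: the layout (flat buffer + shape, row-major) is an
    assumption for illustration, not this library's API.
    """
    ndim = len(shape)
    if axis < 0:
        axis += ndim  # allow negative axes, e.g. -1 for the last axis
    if not 0 <= axis < ndim:
        raise ValueError(f"axis {axis} out of range for a {ndim}-d tensor")
    n = shape[axis]
    outer = math.prod(shape[:axis])      # product of dimensions before the axis
    inner = math.prod(shape[axis + 1:])  # stride of the axis in the flat buffer
    if len(data) != outer * n * inner:
        raise ValueError("data length does not match shape")
    out = [0.0] * len(data)
    for o in range(outer):
        for i in range(inner):
            # flat indices of the n elements that form one softmax slice
            idx = [(o * n + k) * inner + i for k in range(n)]
            for j, p in zip(idx, softmax_flat([data[j] for j in idx])):
                out[j] = p
    return out


# A 2x3 tensor whose two rows are identical.
x = [1.0, 2.0, 3.0,
     1.0, 2.0, 3.0]
print(softmax_axis(x, (2, 3), axis=0))   # each column is [1, 1] -> all 0.5
print(softmax_axis(x, (2, 3), axis=-1))  # each row sums to 1
```

For a 1-D tensor, `softmax_axis(data, (n,), 0)` reduces to `softmax_flat(data)`, so today's flat behavior is a special case of the axis-aware version. For a batch of logits with shape `(batch, classes)`, `axis=-1` normalizes each row independently.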