🐛 Describe the bug
```python
if module.padding_idx is not None:
    module.weight.data[module.padding_idx].zero_()
```
When `module.padding_idx` exists, part of the embedding weight is incorrectly initialized to 0, because `padding_idx` is applied along the `fusion_B` dimension rather than the vocabulary dimension.
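A minimal sketch of the issue, assuming `fusion_B` is a fused weight whose first dimension is the embedding dimension rather than the vocabulary size (the shapes and the `fusion_B` name here are illustrative, not the actual module's):

```python
import torch
import torch.nn as nn

# Standard case: padding_idx indexes the vocabulary (dim 0) of the
# embedding weight, so zeroing that row is correct.
emb = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=0)
with torch.no_grad():
    emb.weight.data[emb.padding_idx].zero_()
assert torch.all(emb.weight.data[0] == 0)  # padding row zeroed as intended

# Hypothetical fused weight whose first dimension is NOT the vocabulary
# dimension. Reusing the same indexing zeroes an unrelated slice instead
# of the padding token's row.
fusion_B = torch.randn(4, 10)  # [embedding_dim, num_embeddings] layout
with torch.no_grad():
    fusion_B[emb.padding_idx].zero_()  # wipes a whole embedding_dim row
assert torch.all(fusion_B[0] == 0)  # wrong slice zeroed, not the padding token
```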
Versions / Dependencies
PyTorch 2.0