From what I understand, the only difference between the mask proposal networks in this repo is the number of classes predicted (binary vs. N dataset classes), and the N-class predictions can be converted to binary predictions. Are there any other differences? Just curious, have you done any ablation to verify whether binary prediction is necessarily better?
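
To make concrete what I mean by converting N-class predictions to binary ones, here is a rough sketch. It assumes each mask proposal gets N independent per-class logits scored with sigmoids, and that a proposal counts as foreground if any class fires. I don't know whether this matches how the repo does it; the function name, the example logits, and the 0.5 threshold are all made up for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def to_binary_score(class_logits: list[float]) -> float:
    """Collapse N per-class logits for one mask proposal into one foreground score.

    Assumption (not taken from the repo): classes are scored independently
    with sigmoids, so the foreground score is the max per-class probability.
    """
    return max(sigmoid(z) for z in class_logits)


# Hypothetical logits for one proposal over N = 3 dataset classes.
logits = [-2.0, 0.5, 3.0]
score = to_binary_score(logits)

# Hypothetical threshold for calling the proposal foreground.
is_foreground = score > 0.5
```

If the N-class head used a softmax with an explicit background class instead, the analogous conversion would presumably be one minus the background probability, but that is also just my guess.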