In my opinion, authors define L_{vlb} = L_0 + ... + L_T, not L_t.

              In my opinion, authors define L_{vlb} = L_0 + ... + L_T, not L_t.
Thus, they may calculate the vlb loss with scale factor T (self.num_timestep).

_Originally posted by @yhy258 in https://github.com/openai/improved-diffusion/issues/114#issuecomment-2261727529_