First of all, thank you for sharing your great work! I believe the idea behind your patch-based diffusion model has significant potential for high-resolution medical image generation. Before applying your research to my own data, I decided to quickly reproduce the results using the LSUN-church dataset that you used.
Following your pipeline description, I preprocessed the data, initialized with CLIP embeddings, and trained the model with a patch size of 64. After training for 1M iterations, I further trained the model with latent diffusion to generate images. I then computed the FID between 50K generated images and the real images and obtained 11.4, which is about twice the FID of 5.49 reported in the paper.
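For reference, FID is the Fréchet distance between Gaussians fitted to Inception features of the real and generated image sets. Below is a minimal sketch of that final step, assuming the features have already been extracted into NumPy arrays of shape `(N, D)`. The function name and inputs are illustrative and not taken from the repository, and differences in feature extraction or image resizing between evaluation tools can shift the resulting number.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_real, feats_gen: arrays of shape (N, D), e.g. Inception pool
    features (hypothetical inputs; feature extraction is not shown here).
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)

    # tr(sqrt(sigma_r @ sigma_g)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_covmean)
```

Running a function like this on two disjoint halves of the real images gives a rough floor for the score under a given evaluation setup, which can help separate evaluation differences from training differences.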
I used the default settings as described in the README file. Could you provide any implementation details, tips, or tricks that might help reduce the FID score further?
Thank you for your help!