
Question about the gfid << rfid #46

Open
lxa9867 opened this issue Nov 5, 2024 · 3 comments
lxa9867 commented Nov 5, 2024

Hi authors,

Thanks for sharing this interesting work.

I am curious about the relation between rFID and gFID reported in the recent works MaskBit and RAR. I noticed that gFID can be significantly better than rFID (for example, the rFID of the RAR tokenizer is 2.28, while the gFID can be 1.48). Is there any explanation or discussion of this behavior? Thank you.

In addition, I noticed that the codebook size of RAR is set to 1024. Did you try scaling it up to a larger number?

lxa9867 changed the title from "Question about the gfid >> rfid" to "Question about the gfid << rfid" on Nov 5, 2024
cornettoyu (Collaborator) commented

Thanks for your interest in our work.

Technically, rFID and gFID are computed in different ways: rFID is usually computed against the ImageNet validation set, while for gFID people usually follow ADM and compute against the precomputed "virtual" ImageNet reference statistics. Thus rFID and gFID are NOT directly comparable.
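(For reference, both metrics reduce to the same Fréchet distance between Gaussians fitted to Inception features; only the reference statistics differ, as described above. Below is a minimal sketch of that distance, with random vectors standing in for actual Inception activations; this is not the official evaluation code.)

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from sqrtm
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Stand-ins for Inception features; in practice these come from pool3 activations
# of reconstructions/samples (model side) and of the val set or the ADM reference
# batch (reference side).
rng = np.random.default_rng(0)
feats_model = rng.normal(size=(2000, 16))
feats_ref = rng.normal(loc=0.05, size=(2000, 16))
mu_m, sig_m = feats_model.mean(0), np.cov(feats_model, rowvar=False)
mu_r, sig_r = feats_ref.mean(0), np.cov(feats_ref, rowvar=False)
print(frechet_distance(mu_m, sig_m, mu_r, sig_r))
```

Swapping the reference statistics (val-set features vs. the precomputed batch) changes the number, which is why the two scores are not comparable.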

For the tokenizer, we directly borrow the off-the-shelf tokenizer from MaskGIT, which has a vocabulary size of 1024. We did not train our own tokenizer for RAR, since the focus of this project is on the generator side and we wanted to keep everything simple. But I generally believe a stronger tokenizer (e.g., with a larger vocabulary size, as you suggested) may help RAR achieve better performance :)


lxa9867 commented Nov 6, 2024

Thanks for your response.

I tried to (1) find the images in the reference batch used for gFID evaluation, (2) reconstruct those images, and (3) evaluate the FID between the reconstructed reference batch and the ground-truth reference batch. I noticed that this rFID still seems higher than the gFID. I am not sure whether there is a technical flaw somewhere. Do you have any idea about this? Thanks.

I consider that FID reflects the difference between distributions. We now have three distributions: (1) reconstructed, (2) generated, and (3) real. It seems the difference between (2) and (3) can be smaller than that between (1) and (3). Since the cropping augmentation does not impose a distribution shift, I would expect even the original rFID to reflect this relationship.

I have a rough guess about the cause. Do you think classifier-free guidance could have an impact on this? This is just an open discussion, and I am looking forward to your thoughts. Thanks.
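(A hedged 1-D toy for the three-distribution intuition above, not the actual ImageNet pipeline: reconstructions that are faithful per image but slightly contract the distribution can score a worse Fréchet distance than fresh samples drawn from the right distribution, even though the samples match no individual image. The 0.9 contraction factor is an arbitrary stand-in for tokenizer reconstruction error.)

```python
import numpy as np

def frechet_1d(m1, v1, m2, v2):
    """Frechet distance between 1-D Gaussians N(m1, v1) and N(m2, v2)."""
    return (m1 - m2) ** 2 + (np.sqrt(v1) - np.sqrt(v2)) ** 2

rng = np.random.default_rng(0)
real = rng.normal(size=100_000)   # (3) real distribution
recon = 0.9 * real                # (1) per-image faithful, slightly shrunk
gen = rng.normal(size=100_000)    # (2) fresh samples, matched distribution

def stats(x):
    return x.mean(), x.var()

rfid_toy = frechet_1d(*stats(recon), *stats(real))
gfid_toy = frechet_1d(*stats(gen), *stats(real))
print(gfid_toy < rfid_toy)  # "generation" beats "reconstruction" on this metric
```

So a distribution-level metric can rank generation above reconstruction without any technical flaw in the evaluation.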

cornettoyu (Collaborator) commented

Thanks for sharing your thoughts :) I agree that CFG usually has a huge impact on the FID score and could potentially contribute to this behavior. The hypothesis is interesting!
