Question about the gfid << rfid #46
Thanks for your interest in our work. Technically, rFID and gFID are computed in different ways. rFID is usually computed against the ImageNet val set, while for gFID people usually follow ADM and compute against the virtual ImageNet statistics. Thus rFID and gFID are NOT directly comparable. For the tokenizer, we directly borrow the off-the-shelf tokenizer from MaskGIT, which comes with a vocabulary size of 1024. We did not try to train our own tokenizer for RAR, as the focus of this project is on the generator side and we wanted to keep everything else simple. But I generally believe that using a stronger tokenizer (e.g., a larger vocabulary size, as you suggested) may help RAR achieve better performance :)
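For reference, here is the standard FID definition written out, as a sketch of why the two numbers differ. The symbols are my own notation, not from the RAR code: $\mu$ and $\Sigma$ are the mean and covariance of Inception features for a set of images, $P$ is the evaluated set and $Q$ is the reference set.

$$
\mathrm{FID}(P, Q) = \lVert \mu_P - \mu_Q \rVert_2^2 + \mathrm{Tr}\left(\Sigma_P + \Sigma_Q - 2\,(\Sigma_P \Sigma_Q)^{1/2}\right)
$$

Under this notation, rFID typically uses $P$ = reconstructions of the ImageNet val set and $Q$ = the ImageNet val set, while gFID uses $P$ = generated samples and $Q$ = the ADM virtual ImageNet statistics. The formula is the same, but $Q$ differs, so the two scores are not measured against the same reference.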
Thanks for your response. I tried to (1) find the images in the reference batch used for gFID evaluation, (2) reconstruct the images in the reference batch, and (3) evaluate the FID between the reconstructed reference batch and the ground-truth reference batch. I noticed that the rFID still seems to be higher than the gFID. I am not sure whether there is a technical flaw. Do you have any idea about this? Thanks.

As I understand it, FID reflects the difference between distributions. We now have three distributions: (1) recon, (2) gen, and (3) real. It seems that the difference between (2) and (3) can be smaller than the difference between (1) and (3). Since the cropping augmentation does not impose a distribution shift, I think even the original rFID should reflect this relationship?

I have a rough idea about the improvement. Do you think classifier-free guidance can have an impact on this? This is just an open discussion and I am looking forward to your thoughts. Thanks.
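The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the evaluation code used by RAR or ADM: it assumes Inception features have already been extracted into NumPy arrays of shape `(N, D)`, and the names `feature_stats`, `frechet_distance`, `real_feats`, `recon_feats`, and `gen_feats` are hypothetical. The trace of the matrix square root is computed from the eigenvalues of the covariance product, which is one of several possible ways to evaluate it.

```python
import numpy as np


def feature_stats(feats):
    """Mean and covariance of an (N, D) feature array, with D >= 2."""
    mu = feats.mean(axis=0)
    sigma = np.cov(feats, rowvar=False)
    return mu, sigma


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians given their means and covariances."""
    diff = mu1 - mu2
    # Tr((S1 S2)^{1/2}) equals the sum of the square roots of the eigenvalues
    # of S1 @ S2. For PSD inputs these eigenvalues are real and non-negative
    # up to numerical noise, so we drop imaginary parts and clip at zero.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


def compare_to_real(real_feats, recon_feats, gen_feats):
    """Score both recon and gen against the SAME real reference set."""
    mu_r, sig_r = feature_stats(real_feats)
    recon_score = frechet_distance(*feature_stats(recon_feats), mu_r, sig_r)
    gen_score = frechet_distance(*feature_stats(gen_feats), mu_r, sig_r)
    return recon_score, gen_score


# Synthetic stand-ins for Inception features (assumption: real features
# would come from a pretrained Inception network, which is not shown here).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 8))
recon_feats = real_feats + 0.1 * rng.normal(size=(500, 8))
gen_feats = rng.normal(size=(500, 8))
print(compare_to_real(real_feats, recon_feats, gen_feats))
```

Scoring both sets against one shared reference, as in `compare_to_real`, is what makes the (1)-(3) and (2)-(3) distances comparable; the numbers printed here come from random data and say nothing about the real models.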
Thanks for sharing your thoughts :) I agree that CFG usually has a huge impact on the FID score and could potentially be the cause of this. The assumption is interesting!
Hi authors,
Thanks for sharing this interesting work.
I am curious about the relation between rFID and gFID reported in the recent works MaskBit and RAR. I noticed that the gFID can be significantly better than the rFID (for example, the rFID of the RAR tokenizer is 2.28, while the gFID can be 1.48). Is there any explanation or discussion of this behavior? Thank you.
In addition, I noticed that the codebook size of RAR is set to 1024. Did you try to scale it up to a larger number?