Question about the gfid << rfid #46

lxa9867 · 2024-11-05T02:26:56Z

Hi authors,

Thanks for sharing this interesting work.

I am curious about the relation between rFID and gFID reported in the recent works MaskBit and RAR. I noticed that gFID can be significantly better than rFID (for example, the rFID of the RAR tokenizer is 2.28, while the gFID can be 1.48). Is there any explanation or discussion of this behavior? Thank you.

In addition, I noticed that the codebook size of RAR is set to 1024. Did you try to scale it up to a larger number?

cornettoyu · 2024-11-05T21:02:00Z

Thanks for your interest in our work.

Technically, rFID and gFID are computed in different ways. rFID is usually computed against the ImageNet val set, while for gFID people usually follow ADM and compute against the virtual ImageNet statistics. Thus rFID and gFID are NOT directly comparable.
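To make the "not directly comparable" point concrete, here is a minimal sketch of the Fréchet distance behind FID. It assumes diagonal covariances to stay dependency-free (real FID uses the full covariance of Inception features), and all statistics below are hypothetical toy numbers, not real ImageNet values. The function name `frechet_distance_diag` is made up for illustration.

```python
import math

def frechet_distance_diag(mu_a, var_a, mu_b, var_b):
    """Squared Frechet distance between two Gaussians with DIAGONAL covariance.

    Simplifying assumption: with diagonal covariances the trace term
    tr(S_a + S_b - 2 (S_a S_b)^(1/2)) reduces to sum((sigma_a - sigma_b)^2).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_a, var_b))
    return mean_term + cov_term

# Hypothetical toy statistics: (means, variances) per feature dimension.
model_stats = ([0.0, 1.0], [1.0, 4.0])
ref_val = ([0.5, 1.0], [1.0, 4.0])  # stands in for val-set statistics
ref_adm = ([0.1, 1.0], [1.0, 4.0])  # stands in for the ADM reference statistics

# The same model statistics get different scores against different references.
score_vs_val = frechet_distance_diag(*model_stats, *ref_val)
score_vs_adm = frechet_distance_diag(*model_stats, *ref_adm)
print(score_vs_val, score_vs_adm)
```

The only thing that changes between the two scores is the reference statistics, which is why a number computed against one reference set cannot be ranked against a number computed against another.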

For the tokenizer, we directly borrowed the off-the-shelf tokenizer from MaskGIT, which comes with a vocabulary size of 1024. We did not try to train our own tokenizer for RAR, as the focus of this project is on the generator side and we wanted to keep everything else simple. But I generally believe that using a stronger tokenizer (e.g., a larger vocabulary size, as you suggested) may help RAR achieve better performance :)

lxa9867 · 2024-11-06T00:52:01Z

Thanks for your response.

I tried to (1) find the images in the reference batch used for gFID evaluation, (2) reconstruct those images, and (3) evaluate the FID between the reconstructed reference batch and the ground-truth reference batch. The rFID still seems to be higher than the gFID. I am not sure whether there is a technical flaw. Do you have any idea about this? Thanks.
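The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `reconstruct` and `featurize` are hypothetical stand-ins for the tokenizer's encode/decode round trip and the Inception feature extractor, images are toy lists of floats, and the distance again assumes diagonal covariances rather than the full covariance used by real FID.

```python
import math
from statistics import mean, pvariance

def feature_stats(features):
    """Per-dimension mean and (population) variance of a list of feature vectors."""
    columns = list(zip(*features))
    return [mean(c) for c in columns], [pvariance(c) for c in columns]

def frechet_distance_diag(stats_a, stats_b):
    """Squared Frechet distance, assuming diagonal covariances (a simplification)."""
    (mu_a, var_a), (mu_b, var_b) = stats_a, stats_b
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_a, var_b))
    return mean_term + cov_term

def reference_batch_rfid(reference_images, reconstruct, featurize):
    """Steps (2) and (3): reconstruct the reference batch, then score it against itself."""
    recon_images = [reconstruct(img) for img in reference_images]
    real_stats = feature_stats([featurize(img) for img in reference_images])
    recon_stats = feature_stats([featurize(img) for img in recon_images])
    return frechet_distance_diag(recon_stats, real_stats)

# Toy "reference batch"; step (1) would load the real one instead.
reference_images = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
identity = lambda img: list(img)        # hypothetical featurizer / perfect tokenizer
lossy = lambda img: [0.5 * x for x in img]  # hypothetical lossy reconstruction

identity_score = reference_batch_rfid(reference_images, identity, identity)
lossy_score = reference_batch_rfid(reference_images, lossy, identity)
print(identity_score, lossy_score)
```

A perfect round trip scores zero against its own reference batch, and any systematic change introduced by reconstruction shows up as a positive score.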

As I understand it, FID reflects the difference between distributions. We now have three distributions: (1) recon, (2) gen, and (3) real. It seems that the difference between (2) and (3) can be smaller than the difference between (1) and (3). Since the cropping augmentation does not impose a distribution shift, I would expect the original rFID to reflect this relationship as well?

I have a rough idea about where the improvement comes from. Do you think classifier-free guidance could have an impact on this? This is just an open discussion, and I am looking forward to your thoughts. Thanks.

cornettoyu · 2024-11-07T21:41:31Z

Thanks for sharing your thoughts :) I agree that CFG usually has a huge impact on the FID score and could potentially lead to this behavior. The hypothesis is interesting!
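For readers unfamiliar with how classifier-free guidance changes the sampling distribution, here is a minimal sketch on token logits. It uses one common convention, `uncond + scale * (cond - uncond)`, and is not taken from the RAR code; the function names and the toy logits are assumptions for illustration only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def guided_logits(cond, uncond, scale):
    """One common CFG convention: scale = 1.0 recovers the conditional logits."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

# Hypothetical logits over a 3-token vocabulary.
cond = [2.0, 1.0, 0.0]
uncond = [1.0, 1.0, 1.0]

p_plain = softmax(guided_logits(cond, uncond, 1.0))
p_guided = softmax(guided_logits(cond, uncond, 3.0))
print(p_plain, p_guided)
```

With a larger scale the distribution sharpens toward the tokens favored by the class condition, so the generated samples follow a different distribution than unguided sampling, and the FID measured on them changes accordingly.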

lxa9867 changed the title ~~Question about the gfid >> rfid~~ Question about the gfid << rfid Nov 5, 2024

cornettoyu mentioned this issue Nov 8, 2024

About the FID reported in the RAR paper #49

Closed
