About data leakage on zero-shot classification? #9

LTEnjoy · 2023-11-02T04:22:07Z

Hello!

Thanks for your great work! I tested zero-shot classification with your released checkpoint and it performed well. However, I am wondering whether there is a data leakage problem. Your model was fine-tuned on the Swiss-Prot database, and the DeepLoc dataset was also constructed from the UniProt database. Did you do any filtering when you tested zero-shot performance?

Looking forward to your reply! Thanks in advance!

KatarinaYuan · 2023-11-03T12:25:25Z

Hi,
Thank you for your interest in our work!

Please see the pre-training dataset, defined in ProtST/protst/dataset.py (line 22 in db53a76):

    "swiss_prot": "https://miladeepgraphlearningproteindata.s3.us-east-2.amazonaws.com/uniprotdata/uniprot_sprot_filtered.tsv",

It does not expose the labeled test data of any benchmark dataset; that data is observed neither during multimodal pre-training nor during downstream fine-tuning.

LTEnjoy · 2023-11-04T04:14:55Z

Hi,

Thank you for the reply and I'll check it out!
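
For anyone who wants to run this kind of check themselves, here is a minimal sketch of an overlap test between a pre-training table and a benchmark test split. It is not the authors' filtering procedure. The column name `accession`, the helper names `load_column` and `overlap_report`, and the toy identifiers are all assumptions made for illustration; the real column layout of `uniprot_sprot_filtered.tsv` should be inspected first.

```python
import csv
import io


def load_column(tsv_text, column):
    """Collect the non-empty values of one column from TSV text with a header row.

    `column` is a hypothetical header name; adjust it to the real file layout.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    values = set()
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            values.add(value)
    return values


def overlap_report(pretrain_ids, test_ids):
    """Return the shared identifiers and the fraction of the test set they cover."""
    shared = pretrain_ids & test_ids
    fraction = len(shared) / len(test_ids) if test_ids else 0.0
    return shared, fraction


# Toy placeholder data, not real UniProt entries.
pretrain_tsv = "accession\tsequence\nACC_A\tMKV\nACC_B\tMLL\n"
pretrain_ids = load_column(pretrain_tsv, "accession")
test_ids = {"ACC_B", "ACC_C"}

shared, fraction = overlap_report(pretrain_ids, test_ids)
print(sorted(shared), fraction)
```

An exact identifier match like this only detects identical entries. It would not catch close homologs, which would require a stricter check such as clustering by sequence identity.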
