About data leakage on zero-shot classification? #9

Open
LTEnjoy opened this issue Nov 2, 2023 · 2 comments

@LTEnjoy

LTEnjoy commented Nov 2, 2023

Hello!

Thanks for your great work! I tested zero-shot classification with your released checkpoint, and it performed well. However, I am wondering whether there is a data leakage problem. Your model was fine-tuned on the Swiss-Prot database, and the DeepLoc dataset was also constructed from the UniProt database. Did you apply any filtering when you evaluated zero-shot performance?

Looking forward to your reply! Thanks in advance!

@KatarinaYuan
Collaborator

Hi,
Thank you for your interest in our work!

Please see the pre-training dataset:

"swiss_prot": "https://miladeepgraphlearningproteindata.s3.us-east-2.amazonaws.com/uniprotdata/uniprot_sprot_filtered.tsv",

It does not contain the labeled test data of any benchmark dataset, so that test data is observed neither during multimodal pre-training nor during downstream fine-tuning.
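One way to spot-check this kind of claim is to count how many test sequences appear verbatim in the pre-training file. The sketch below assumes both sets are available as amino-acid strings; the TSV column name `Sequence` and all function names are hypothetical, not part of this project's code.

```python
import csv


def load_tsv_column(path, column="Sequence"):
    # Hypothetical loader: the column name is an assumption about the TSV layout.
    with open(path, newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            yield row[column]


def normalize(seq: str) -> str:
    # Uppercase and drop whitespace so formatting differences do not hide matches.
    return "".join(seq.split()).upper()


def exact_overlap(pretrain_seqs, test_seqs):
    """Return the normalized test sequences that also occur in the pre-training set."""
    pretrain = {normalize(s) for s in pretrain_seqs}
    return {normalize(s) for s in test_seqs} & pretrain


def leakage_fraction(pretrain_seqs, test_seqs):
    """Fraction of distinct test sequences found verbatim in the pre-training set."""
    test = {normalize(s) for s in test_seqs}
    if not test:
        return 0.0
    return len(exact_overlap(pretrain_seqs, test)) / len(test)


if __name__ == "__main__":
    pretrain = ["MKTAYIAKQR", "MSEQNNTEMT"]
    test = ["mktayiakqr", "MADEEKLPPG"]
    print(exact_overlap(pretrain, test))     # {'MKTAYIAKQR'}
    print(leakage_fraction(pretrain, test))  # 0.5
```

Exact matching only catches identical sequences; homologous near-duplicates would require identity-based clustering (for example with MMseqs2 or CD-HIT) instead.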

@LTEnjoy
Author

LTEnjoy commented Nov 4, 2023

Hi,

Thank you for the reply; I'll check it out!
