Asset 503: bad answer ID for thrombin/F2 gene #83

Open
colleenXu opened this issue Jun 28, 2024 · 6 comments
Labels
Bad Asset Curie: The given asset curie is not correct

Comments

@colleenXu

colleenXu commented Jun 28, 2024

Asset 503 says the expected output name is UNIPROTKB:P00734, which is the gene F2 + its proteins (Dev NodeNorm).

However, the output ID CHEMBL.TARGET:CHEMBL204 is likely something no ARA/KP recognizes. NodeNorm doesn't resolve this ID.

I imagine the expected output ID needs to be changed to either NCBIGene:2147 (the gene ID, which is primary since we're doing gene-protein conflation) or UniProtKB:P00734 (note the corrected CURIE format). The name probably also needs changing to "F2/prothrombin/thrombin" or something like that.
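
The casing correction mentioned above (UNIPROTKB vs. UniProtKB) can be checked mechanically. A minimal sketch, assuming a small hand-written prefix table: `CANONICAL_PREFIXES` and `canonicalize_curie` are hypothetical names for illustration only and are not part of NodeNorm or any other tool discussed here.

```python
# Hypothetical lookup table: upper-cased prefix -> canonical spelling.
# Only the prefixes mentioned in this issue are listed.
CANONICAL_PREFIXES = {
    "UNIPROTKB": "UniProtKB",
    "NCBIGENE": "NCBIGene",
    "CHEMBL.TARGET": "CHEMBL.TARGET",
    "PR": "PR",
}


def canonicalize_curie(curie: str) -> str:
    """Return the CURIE with its prefix in canonical casing.

    Raises ValueError if the input has no 'prefix:id' shape or the
    prefix is not in the table above.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    try:
        canonical = CANONICAL_PREFIXES[prefix.upper()]
    except KeyError:
        raise ValueError(f"unknown prefix: {prefix!r}") from None
    return f"{canonical}:{local_id}"


print(canonicalize_curie("UNIPROTKB:P00734"))  # prints UniProtKB:P00734
```

A check like this only fixes the spelling of the prefix; whether the resulting ID actually resolves still has to be confirmed against NodeNorm.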

@colleenXu
Author

I think all tools have been consistently failing this test due to the answer ID. This asset is specifically described in #69

colleenXu changed the title from "Asset 503: make UniProtKB:P00734 the expected output ID/change the output name" to "Asset 503: bad answer ID for thrombin/F2 gene" on Jun 28, 2024
@maximusunc
Collaborator

Putting UNIPROTKB:P00734 into Name Resolver gives PR:000051041 which seems to be a subclass of thrombin. And using conflation in NN still gives the same ID. Are you using a different conflation?

maximusunc added the "Bad Asset Curie" label (The given asset curie is not correct) on Jul 12, 2024
@colleenXu
Author

colleenXu commented Jul 16, 2024

I don't agree with using prothrombin N-glycosylated 4 (human) / PR:000051041 as the CURIE. That looks like a very specific variation of prothrombin (N-glycosylated at Asn121) with very little literature on it.


It may be worth asking the original writer of the test what ID to use?

This is what I know:

  • The F2 gene encodes the protein prothrombin, which is cleaved to create thrombin.
  • UniProtKB:P00734 is an ID for the F2 gene/prothrombin.
  • I was using NodeNorm Dev w/ conflation and found that UniProtKB:P00734 resolves to an F2/prothrombin entity.
  • I think UniProtKB:P00734 is an acceptable answer ID, but I'm not sure what the original writer of this test wanted.
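
To reproduce the NodeNorm check described in the list above, a sketch along these lines might help. It assumes the Node Normalizer `get_normalized_nodes` endpoint with `curie` and `conflate` query parameters, and a response that maps each input CURIE to either `null` or an object with an `id.identifier` field. The base URL is an assumption (the address of the Dev instance is not given in this thread), and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL; replace with the NodeNorm instance under test (e.g. Dev).
NODENORM_BASE = "https://nodenorm.transltr.io"


def build_nodenorm_url(curies, conflate=True, base=NODENORM_BASE):
    """Build a get_normalized_nodes URL for one or more CURIEs."""
    query = urlencode(
        {"curie": list(curies), "conflate": str(conflate).lower()},
        doseq=True,  # repeat the 'curie' parameter once per CURIE
    )
    return f"{base}/get_normalized_nodes?{query}"


def preferred_ids(response):
    """Map each queried CURIE to its preferred identifier, or None if unresolved."""
    return {
        curie: (entry["id"]["identifier"] if entry else None)
        for curie, entry in response.items()
    }


def fetch_preferred_ids(curies, conflate=True, base=NODENORM_BASE):
    """Query NodeNorm over the network and return the preferred IDs."""
    with urlopen(build_nodenorm_url(curies, conflate, base), timeout=30) as resp:
        return preferred_ids(json.load(resp))


# Show the request that would be sent; no network call is made here.
print(build_nodenorm_url(["UniProtKB:P00734", "CHEMBL.TARGET:CHEMBL204"]))
```

Calling `fetch_preferred_ids(["UniProtKB:P00734", "CHEMBL.TARGET:CHEMBL204"])` with and without `conflate` would show directly which of the candidate answer IDs resolve and to what.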

@sandrine-muller-research

Please have a look at my response to #84. This is how I worked with the test assets: since I am testing the UI, I report what the UI states. I think one asset can pass or fail for several reasons (I used to record them, but I am not sure the metadata was kept in the long run). Capturing what the UI states is helpful for reusing those test assets in other suites (e.g. a normalization suite).

@sandrine-muller-research

Please assign this email address/GitHub account in the future; I do not receive any notifications with the other one. Thanks!

@sandrine-muller-research

FWIW, please note that NameRes currently does not let the user choose whether to conflate. F2, prothrombin, and thrombin are currently in the same clique, which is expected for now. This may need to change if we need to unconflate, though.

4 participants