Document interpretation of mappings in data analysis #310

Open
sbello opened this issue Aug 3, 2023 · 5 comments
Labels
documentation Improvements or additions to documentation

@sbello

sbello commented Aug 3, 2023

In manual mappings we often make broad/narrow/close mappings. It would be good to document how these mappings could/should be interpreted.

Interpretation depends on the direction of the analysis (whether the subject or the object is the input). Assuming the input is the object (the HPO term in most or all of the files), I would propose that an analysis pipeline should use:

  • broadMatch (HP broader than MP): treat the same as exactMatch, since all MP terms will be relevant to the HP term
  • closeMatch (HP and MP are very similar but not exactly the same): give less weight than exactMatch but more than narrowMatch or relatedMatch
  • narrowMatch (MP broader than HP): give these less weight, since some or many of the MP terms will not be as relevant to the HP term
  • relatedMatch: give these the least weight, or let the user include or exclude them on demand
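The weighting proposed above could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the numeric weights are hypothetical placeholders (the proposal only fixes their order), mappings are plain records whose field names mirror the SSSOM `subject_id`/`predicate_id`/`object_id` columns rather than a parsed SSSOM file, and `weighted_matches` is an illustrative helper, not part of any library. The subject-as-input case relies on skos:broadMatch and skos:narrowMatch being inverses of each other.

```python
from dataclasses import dataclass

# Hypothetical weights: only their ordering comes from the proposal above.
PREDICATE_WEIGHTS = {
    "skos:exactMatch": 1.0,
    "skos:broadMatch": 1.0,   # HP broader than MP: treated like exactMatch
    "skos:closeMatch": 0.8,
    "skos:narrowMatch": 0.5,  # MP broader than HP: less relevant
    "skos:relatedMatch": 0.2,
}

# broadMatch and narrowMatch are inverses; exact/close/related are symmetric.
INVERSE = {"skos:broadMatch": "skos:narrowMatch", "skos:narrowMatch": "skos:broadMatch"}


@dataclass(frozen=True)
class Mapping:
    subject_id: str
    predicate_id: str
    object_id: str


def weighted_matches(mappings, query_id, include_related=True):
    """Return {matched_id: weight} for query_id, read as if the query were the object."""
    results = {}
    for m in mappings:
        if m.object_id == query_id:
            # Query is the object (the HPO side in the proposal): use the predicate as written.
            other, pred = m.subject_id, m.predicate_id
        elif m.subject_id == query_id:
            # Query is the subject: invert so broad/narrow are read from the query's side.
            other, pred = m.object_id, INVERSE.get(m.predicate_id, m.predicate_id)
        else:
            continue
        if pred == "skos:relatedMatch" and not include_related:
            continue  # the user opted out of relatedMatch
        weight = PREDICATE_WEIGHTS.get(pred)
        if weight is not None:
            # Keep the strongest weight if the same pair is mapped more than once.
            results[other] = max(weight, results.get(other, 0.0))
    return results


# Illustrative row with MP as subject and HP as object, as assumed in the proposal.
mappings = [Mapping("MP:0000705", "skos:broadMatch", "HP:0010515")]
print(weighted_matches(mappings, "HP:0010515"))  # {'MP:0000705': 1.0}
print(weighted_matches(mappings, "MP:0000705"))  # {'HP:0010515': 0.5}
```

Querying from the MP side flips the same row to a narrowMatch reading, which is why the direction of the analysis matters.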

We should include some examples of each type of mapping in the documentation.

We should also give a general guideline of how "broad" broad is. Some examples of broad matches:

  • same anatomical entity or process but less specific phenotype
    • Aplasia/Hypoplasia of the thymus (HP:0010515) and athymia (MP:0000705)
    • Abnormal intramembranous ossification (HP:0012790) and delayed intramembranous bone ossification (MP:0003420)
    • Abnormal ethmoid bone morphology (HP:0430005) and small ethmoid bone (MP:0030303)
  • more specific anatomical entity or process
    • Abnormal reflex (HP:0031826) and limb grasping (MP:0001513)
    • Decreased adipose tissue (HP:0040063) and decreased white adipose tissue amount (MP:0001783)
    • Morphological abnormality of the inner ear (HP:0011390) and abnormal otic capsule morphology (MP:0000039)

The question I struggle with most is how far up the tree to make a broad/narrow match. For example, should 'Highly arched eyebrow' (HP:0002553) be mapped to anything in the MP? The closest term I can come up with is abnormal coat/hair morphology (MP:0000367), which seems too distant to be of use in analysis. Similarly, the closest match to Cranial nerve paralysis (HP:0006824) in the MP would be abnormal nervous system physiology (MP:0003633). Again, this seems too distant to be useful.
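One possible way to make "too distant" measurable, which is not proposed in this thread, is to treat a candidate broad-match target as too general when its subtree covers a large share of the target ontology. The sketch below assumes a toy child map with placeholder term names (not real MP terms) and an arbitrary `max_fraction` cutoff; the helpers `descendants` and `too_general` are hypothetical, and picking a real cutoff would still be a judgement call.

```python
def descendants(children, term):
    """Collect every term below `term` in a child map (iterative depth-first search)."""
    seen, stack = set(), [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def too_general(children, term, total_terms, max_fraction=0.2):
    """Flag `term` if its descendants exceed max_fraction of all terms (hypothetical heuristic)."""
    return len(descendants(children, term)) / total_terms > max_fraction


# Toy hierarchy with placeholder names, not real MP terms.
children = {
    "root": ["abnormal_x", "abnormal_y"],
    "abnormal_x": ["x1", "x2", "x3"],
    "abnormal_y": ["y1"],
}
total = 7  # root plus its six descendants
print(too_general(children, "abnormal_x", total))  # True: 3/7 > 0.2
print(too_general(children, "abnormal_y", total))  # False: 1/7 <= 0.2
```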

matentzn transferred this issue from mapping-commons/mh_mapping_initiative on Aug 4, 2023
@matentzn
Collaborator

matentzn commented Aug 4, 2023

@sbello I moved this issue to the SSSOM repo because it is of universal relevance in my opinion.

We can't do anything about this issue immediately, I think, but we should add our own thoughts and insights as we stumble across them!

matentzn added the documentation label (Improvements or additions to documentation) on Aug 4, 2023
matentzn changed the title from "document interpretation of mappings in analysis" to "Document interpretation of mappings in data analysis" on Aug 4, 2023
@graybeal

I think this is universally relevant but is only meaningful very narrowly. The interpretation depends not only on the ontologies in question (in this example) but also on the intention of the mapper and, most importantly, on the application or end user that is using the mapping.

Crude example: I say "woman" hasBroader "human". If my query is "find studies with women", should the response include studies that have "human" but not "woman"? I think it's the end user who knows the answer to that, not the person who made the mapping. (And it wouldn't be any different if I said "human" hasNarrower "woman".)
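Graybeal's example can be made concrete with a small sketch in which the query, not the mapping, decides whether to follow the broader relation. The study annotations, the `BROADER` table, the `find_studies` helper, and the `follow_broader` flag are all hypothetical.

```python
# Hypothetical mapping: "woman" hasBroader "human".
BROADER = {"woman": {"human"}}


def find_studies(studies, term, follow_broader=False):
    """Return study IDs annotated with `term`, optionally also matching its broader terms."""
    wanted = {term}
    if follow_broader:
        # The end user, not the mapper, opts in to the looser interpretation.
        wanted |= BROADER.get(term, set())
    return [study for study, annotations in studies.items() if annotations & wanted]


studies = {"study1": {"woman"}, "study2": {"human"}, "study3": {"mouse"}}
print(find_studies(studies, "woman"))                       # ['study1']
print(find_studies(studies, "woman", follow_broader=True))  # ['study1', 'study2']
```

The same mapping row serves both answers; only the caller's flag changes.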

That's only one use case; there are many use cases, and the 'correct' answer is a function of the use case.

closeMatch and relatedMatch are entirely subjective from the start and suffer from the same "it depends on the user and the use case" effect on the result. I like the idea of more weight/less weight here, but in some cases I want any information that can be provided (so give me broad/narrow relations all the way up the tree), and in other cases I want something I can be confident in (so I don't even want to use closeMatch, let alone any of the others).
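The "it depends on the user" point suggests letting the consumer pick a predicate profile at query time. A minimal sketch, assuming two hypothetical profiles named after the two cases described above and placeholder term IDs (`MP:a`, `HP:a`, and so on); `select_mappings` is illustrative only. It filters predicates but does not walk "all the way up the tree", which would additionally need the ontology hierarchy.

```python
# Hypothetical profiles: which SKOS predicates a consumer is willing to accept.
PROFILES = {
    "strict": frozenset({"skos:exactMatch"}),
    "permissive": frozenset({
        "skos:exactMatch", "skos:closeMatch", "skos:broadMatch",
        "skos:narrowMatch", "skos:relatedMatch",
    }),
}


def select_mappings(mappings, profile):
    """Keep only (subject, predicate, object) triples whose predicate the profile allows."""
    allowed = PROFILES[profile]
    return [m for m in mappings if m[1] in allowed]


# Placeholder IDs, not real HP/MP terms.
mappings = [
    ("MP:a", "skos:exactMatch", "HP:a"),
    ("MP:b", "skos:closeMatch", "HP:b"),
    ("MP:c", "skos:relatedMatch", "HP:c"),
]
print(len(select_mappings(mappings, "strict")))      # 1
print(len(select_mappings(mappings, "permissive")))  # 3
```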

@matentzn
Collaborator

@graybeal good to hear from you again! :)

> Crude example: I say "woman" hasBroader "human". If my query is "find studies with women", should the response include studies that have "human" but not "woman"? I think it's the end user who knows the answer to that, not the person who made the mapping. (And it wouldn't be any different if I said "human" hasNarrower "woman".)

I agree with you that it depends on the use case which predicates should be applied, and how. That is, I think, one of the things we want to document: more tutorial-like, not really SSSOM reference-level documentation. So maybe we should rephrase this issue a bit towards building up a guide to "Use-case specific application of mappings for data scientists".

So @sbello's problem would be just one scenario of many, that we should characterise.

@sbello re-reading your issue, it is clear that you are also struggling with a mix of concerns:

> The question I struggle with most is how far up the tree to make a broad/narrow match.

This is mixing the "use case" problem (How should the mapping be applied in a data analysis setting) with the representation issue (which mappings should I include in the mapping set, and which not).

So for you, I think the first step is to characterise clearly the target use case for your mapping set. This will give you a clue as to "how far up to go for a broad match". Some use cases require you to go all the way up (faceted browsing on a website), while others do not tolerate going up more than a tiny bit ("give me the most similar term in the other ontology").
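The two use cases can be contrasted with a single hop limit on how far above a mapped term an application is willing to climb: unlimited for faceted browsing, zero or one hop for "most similar term". This is a sketch only; the parent map and term names are placeholders, `ancestors` is a hypothetical helper, and a hop count is at best a rough proxy for semantic distance.

```python
def ancestors(parents, term, max_hops=None):
    """Terms above `term`, breadth-first, stopping after max_hops levels (None = no limit)."""
    found, seen, frontier, hops = [], {term}, [term], 0
    while frontier and (max_hops is None or hops < max_hops):
        next_frontier = []
        for t in frontier:
            for parent in parents.get(t, ()):
                if parent not in seen:
                    seen.add(parent)
                    found.append(parent)
                    next_frontier.append(parent)
        frontier, hops = next_frontier, hops + 1
    return found


# Placeholder hierarchy: specific -> intermediate -> general.
parents = {"specific": ["intermediate"], "intermediate": ["general"]}
print(ancestors(parents, "specific"))              # faceted browsing: ['intermediate', 'general']
print(ancestors(parents, "specific", max_hops=1))  # similarity search: ['intermediate']
```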

@sbello
Author

sbello commented Aug 16, 2023

@matentzn The use case we have in mind is finding similar terms so we don't want to go all the way up but there is still a question, for me at least, of how far up is useful. But that could just be me overthinking things :)

@matentzn
Collaborator

So in this case, we want to approximate a biological concept, "phenotypic similarity", which, as you know, is not fully formalised. My sense is that if it's about similarity, not data grouping, I would say that if the E is not "part of the same homology cluster" then I would probably be a cautious
