Add lexical matching to standard SOP for New Ontology Requests #2517
Comments
While many of these matches make sense, some are totally off, such as Thus, this should not be an automated review for the dashboard, but rather presented to the ontology submitter for their review, so that they can be encouraged to import classes rather than recreate them, where appropriate. |
Could you please run the script for other new ontologies? @cthoyt so this is not on your plate, can you share the script so we can set up a GitHub action to do this? |
I’m wary of this approach. We should at least accept that accuracy will be
wildly variable depending on multiple arbitrary factors. There will be many
many false negatives because reasons. It will take deep obo knowledge to
make the results actionable (example: many new ontologies will have
concepts in OMIT. What then?)
I do however think there are opportunities for LLMs to help with initial
triage
…On Thu, Feb 8, 2024 at 6:43 AM Nico Matentzoglu ***@***.***> wrote:
Could you please run the script for other new ontologies?
@cthoyt <https://github.com/cthoyt> so this is not on your plate, can you
share the script so we can setup a github action to do this?
|
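Our thinking so far is this:
1. It is better to do it somewhat approximately than not to do it. A lot of matches reveal a pattern.
2. The burden is on the submitters. They see the match, they check it and say: "this is not the same thing".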
Re OMIT, I cannot say anything. We could push our "foundry status" back and define it as "passing the dashboard" - and only match against these. Just spitballing. |
Re OMIT, I cannot say anything. We could push our "foundry status" back
and define it as "passing the dashboard" - and only match against these.
Just spitballing.
You're wanting a deterministic quantitative solution where some qualitative
aspect is required. FMA may fail the dashboard but we'd still want to know
if a new ontology had massively overlapping content. There is probably a
way to get OMIT to pass the dashboard but that doesn't solve the problem of
its massively out of scope content.
…On Thu, Feb 8, 2024 at 7:21 AM Nico Matentzoglu ***@***.***> wrote:
Our thinking so far is this:
1. It is better to do it somewhat approximately than not to do it. A
lot of matches reveal a pattern.
2. The burden is on the submitters. They see the match, they check it
and say: "this is not the same thing".
Re OMIT, I cannot say anything. We could push our "foundry status" back
and define it as "passing the dashboard" - and only match against these.
Just spitballing.
|
This is a bit of a complicated criticism.
(1) We do not want to let the 4th ontology defining Alzheimer's and glucose into the OBO Foundry Ontology Library.
(2) We have no good way to separate GUOBO modules (components of the Grand Unified OBO Ontology) from Application/Project ontologies.
We will not be able to define "COB-Branch owning" ontologies in any reasonable timeframe. However, we could, possibly, use "bottom-up" COB mapping curation here to say: For new ontologies, only matches against ontologies mapped in COB are relevant. This is a bit shady, to be clear (not unreasonable, just a bit shady), as we refer from one system (OBO Library membership) to another (COB mappings), but I would be ok with that as well. But IMO the need to achieve (1) outweighs all other concerns you raised. We can get a touch of "qualitative" in there by adding an SOP that the ontology reviewer can apply judgement on whether some of these matches are blocking or not. What is the alternative? |
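The "only matches against certain ontologies are relevant" idea could be sketched roughly as below. This is only an illustration: `filter_matches` and the `allowed_prefixes` set are hypothetical names, and which ontologies would actually qualify (COB-mapped, dashboard-passing, or something else) is exactly what is still undecided in this thread. The sample rows are copied from the CAROLIO results posted in this issue.

```python
def filter_matches(matches, allowed_prefixes):
    """Keep only matches whose target CURIE belongs to an allowed ontology.

    `matches` is a list of (source_curie, target_curie, score) tuples.
    `allowed_prefixes` is a placeholder for whatever list gets agreed on.
    """
    allowed = {prefix.lower() for prefix in allowed_prefixes}
    return [
        (source, target, score)
        for source, target, score in matches
        if target.split(":", 1)[0].lower() in allowed
    ]


# Rows copied from the CAROLIO lexical matching results in this issue.
matches = [
    ("CAROLIO:0003220", "maxo:0035106", 0.778),
    ("CAROLIO:0003220", "ncit:C15310", 0.762),
    ("CAROLIO:0003360", "chebi:7726", 0.778),
]

# Hypothetical allow-list, for illustration only.
relevant = filter_matches(matches, {"maxo", "chebi"})
```

The reviewer would still see the filtered-out rows if wanted; the filter only decides which matches are treated as potentially blocking.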
This can be seen as a variant or subtype of the ontology recommendation
problem (cc-ing Marcos by email as I don't know your github username!)
https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-017-0128-y
…On Thu, Feb 8, 2024 at 8:08 AM Nico Matentzoglu ***@***.***> wrote:
This is a bit of a complicated criticism..
(1) We do not want to let the 4th ontology defining Alzheimer's and
glucose into OBO Foundry Ontology Library
(2) We have no good way to separate GUOBO modules (components of the Grand
Unified OBO Ontology) from Application/Project ontologies.
We will not be able in any reasonable timeframe define "COB-Branch owning"
ontologies. However, we could, possibly, use "bottom-up" COB mapping
curation here to say: For new ontologies, only matches against ontologies
mapped in COB are relevant. This is a bit shady, to be clear (not
unreasonable, just a bit shady), as we refer from one system (OBO Library
membership) to another (COB mappings), but I would be ok with that as well.
But IMO the need to achieve (1) outweighs all other concerns you raised.
We can get a touch of "qualitative" in there by adding an SOP that the
ontology reviewer can apply judgement if some of these matches are blocking
or not.
What is the alternative?
|
Well, I don't really think this is quite the same. We are looking for a way to ensure that ontologies wanting to join the OBO Foundry do not significantly overlap with (key?) ontologies in OBO Foundry. And label matching is the only way I can think of right now to at least get started with this. What is the principle we should derive from a recommendation approach? I need to know at least how we should act now in the short term: is your preference that, for the 5 ontologies currently under review, we do not run lexical matching? If so, what is your suggestion exactly? |
Here is an example of the problem This would not have been found by text mining. But clearly there needs to be coordination between these two ontologies. The correct way to do this is by looking at the scope of the new ontology and the scope of existing ontologies. Currently this takes some very minimal knowledge of what is in OBO (I am not sure why we aren't doing this). This part could easily be semi-automated by e.g. LLMs. But frankly everyone reviewing ontologies should be aware of the scope of different ontologies in OBO, especially widely used ones like PO. |
@matentzn I have a new repo where I am building up and storing various lexical indexes, it now has a pre-built one for OBO to make this much more user friendly (and not have to parse the resources yourself) https://github.com/biopragmatics/biolexica/tree/main/lexica/obo |
Thank you @cthoyt - I am not too sure though what Chris's position is here :D As soon as there is some agreement somewhere, I will assign someone to work on this! |
There doesn't necessarily have to be any agreement anywhere. OBO reviews are open and I can always re-run my script for each new ontology and post the results to the issue thread (I already sort of automated it). Any requester who disregards reasonable suggestions from this process has other bigger problems. |
@cthoyt I made a case at the last call that the OBO New Ontology Request Manager should be running your script as part of the official pipeline, so that you don't have to distribute your attention too much. If, while I am trying to make this overlap checking more official, you could keep making these "overlap" posts and add a sentence:
To make clear that the reviewer should actually consider this, I would be grateful! Thanks a ton! |
Hello @matentzn @cmungall, all... I am assuming you're familiar with the functionality in BioPortal (and OntoPortal) that automatically computes the "lexical matches" (using LOOM) with all the other ontologies in the portal. I do not often promote this much as a "mapping" feature (because we all know lexical matches are very limited), but I often point out that OntoPortal is the only place where, when one drops in an ontology, one gets an automatic lexical overlap with all the other ontologies within the next hour. Could it help you address your need here? |
@jonquet thanks for chiming in. The real problem lies in the fact that the ontologies we need to check are not loaded in any indexed infrastructure (including BioPortal). @cthoyt's idea is basically to have one massive lexical index covering all of OBO dumped, and have a script quickly compare an incoming ontology with that index. I am not sure if BioPortal should be covering this specific use case, as it is primarily concerned with ontologies outside of BioPortal. |
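For readers unfamiliar with the "one big lexical index, then look up the incoming ontology" idea, here is a minimal sketch in plain Python. It is not @cthoyt's script and it does not use the biolexica API: the function names (`normalize`, `build_index`, `match`) are made up for illustration, and it does exact matching on normalized labels only, whereas the real results also carry scores for looser matches. The sample labels are taken from the CAROLIO results posted in this issue.

```python
import re
from collections import defaultdict


def normalize(label):
    """Lowercase and collapse punctuation/whitespace, so that
    'Caroli syndrome' and 'caroli  syndrome' produce the same key."""
    return " ".join(re.sub(r"[^0-9a-z]+", " ", label.lower()).split())


def build_index(terms):
    """Map each normalized label to the set of CURIEs that carry it.

    `terms` is an iterable of (curie, label) pairs from existing ontologies.
    """
    index = defaultdict(set)
    for curie, label in terms:
        index[normalize(label)].add(curie)
    return index


def match(index, incoming):
    """Return {incoming_curie: sorted list of existing CURIEs with the same label}."""
    results = {}
    for curie, label in incoming:
        hits = index.get(normalize(label))
        if hits:
            results[curie] = sorted(hits)
    return results


# Existing terms (a tiny stand-in for an index over all of OBO).
index = build_index([
    ("doid:0081394", "Caroli syndrome"),
    ("mondo:0018808", "Caroli syndrome"),
    ("maxo:0035106", "paracentesis"),
])

# Terms from the incoming ontology.
incoming = [
    ("CAROLIO:0001000", "caroli syndrome"),
    ("CAROLIO:0003220", "paracentesis"),
    ("CAROLIO:0000410", "pain scale"),
]

results = match(index, incoming)
```

Because the index is built once and dumped, checking a new ontology is just a dictionary lookup per label. An exact-match sketch like this will miss synonyms and near matches, which is one source of the false negatives raised earlier in the thread.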
@pfabry cc @OBOFoundry/obo-foundry-operations-committee While I think there is value in doing lexical matching to assess overlap, I agree with Chris that it should be used a bit more wisely. The lexmatch should provide some evidence of non-reuse. But this is not just about using IRIs of existing ontologies. This should result in questions such as:
I would suggest if Paul continues creating these, that we:
|
I agree that the lexical match could be a valuable informative tool but should be used with caution as it is "only" a lexical match. I agree with the 3 propositions, but I think the lexical match could be done even earlier, at the pre-registration checklist.
Of course, label != meaning, but the lexical match could provide a general overview for the submitter. Thank you VERY much for the script. However, while I have been able to use it for GALLONT, I can't make it work for LSDAO and I really don't know why. I created an issue about this. |
Just a heads up and a question. Thanks to @cthoyt I have been able to run the lexmatch for the LSDAO ontology. @zhengj2007 as the reviewer of this ontology, how do you want to proceed? Do you want me to send you the file, post it directly in the issue, or not send it at all? |
Thanks @cthoyt and @pfabry for lexmatch! It would be nice to post it on the LSDAO new ontology request issue. Thanks @pfabry ! |
What is the status of this? |
@cthoyt wrote this script to check overlap with existing OBO ontologies; I think we should fold this into our NOR SOP asap.
Below are the results. A case can be made that it's okay to duplicate NCIT terms since this is just an obo export of a resource that does not actually participate in the open community.
Lexical matching returned results

CAROLIO:0000411 mild pain
  ncit:C136549 Neck Pain Score 2 (0.54)
CAROLIO:0000412 moderate pain
  ncit:C121394 Moderate Extremity Pain (0.54)
  ncit:C136551 Neck Pain Score 4 (0.54)
CAROLIO:0000413 no pain
  ncit:C119987 Had No Pain (0.54)
  ncit:C121390 No Extremity Pain (0.54)
  ncit:C136547 Neck Pain Score 0 (0.54)
CAROLIO:0000414 severe pain
  ncit:C121395 Severe Extremity Pain (0.54)
  ncit:C136553 Neck Pain Score 6 (0.54)
CAROLIO:0001000 caroli syndrome
  doid:0081394 Caroli syndrome (0.772)
  mondo:0018808 Caroli syndrome (0.772)
CAROLIO:0003100 endoscopic treatment
  ncit:C16546 Endoscopic Procedure (0.54)
CAROLIO:0003120 endoscopic retrograde cholangiopancreatography
  maxo:0035049 endoscopic retrograde cholangiopancreatography (0.778)
  ncit:C16430 Endoscopic Retrograde Cholangiopancreatography (0.762)
CAROLIO:0003200 interventional radiology procedure
  ncit:C63334 Interventional Radiology Procedure (0.762)
CAROLIO:0003210 locoregional therapy
  ncit:C25388 Local-Regional (0.54)
  ncit:C94796 Locally Recurrent Malignant Neoplasm (0.54)
CAROLIO:0003220 paracentesis
  maxo:0035106 paracentesis (0.778)
  ncit:C15310 Paracentesis (0.762)
CAROLIO:0003250 transjugular intrahepatic portosystemic shunt
  ncit:C126288 Transjugular Intrahepatic Portosystemic Shunt (0.762)
CAROLIO:0003300 pharmaceutical treatment
  maxo:0000058 pharmacotherapy (0.556)
CAROLIO:0003310 antibiotic treatment
  ncit:C258 Antibiotic (0.762)
  chebi:33281 antimicrobial agent (0.556)
  xco:0000482 antimicrobial agent (0.556)
CAROLIO:0003320 antiemetic treatment
  chebi:50919 antiemetic (0.778)
  xco:0001245 antiemetic (0.778)
  ncit:C267 Antiemetic Agent (0.556)
CAROLIO:0003330 bile acid treatment
  chebi:3098 bile acid (0.778)
  chebi:22868 bile salt (0.549)
  ncit:C74800 Bile Acid Measurement (0.54)
CAROLIO:0003340 chemotherapy
  maxo:0000647 chemotherapy (0.778)
  ncit:C15632 Chemotherapy (0.762)
CAROLIO:0003350 diuretics treatment
  chebi:35498 diuretic (0.778)
  xco:0000122 diuretic (0.778)
  ncit:C448 Diuretic (0.762)
CAROLIO:0003360 octreotide treatment
  chebi:7726 octreotide (0.778)
  ncit:C711 Octreotide (0.762)
CAROLIO:0003370 proton pump inhibitor treatment
  xco:0000577 proton pump inhibitor (0.778)
  ncit:C29723 Proton Pump Inhibitor (0.762)
  chebi:49200 EC 3.6.3.10 (H(+)/K(+)-exchanging ATPase) inhibitor (0.556)
CAROLIO:0003380 pruritus treatment
  hp:0000989 Pruritus (0.762)
  ncit:C3344 Pruritus (0.762)
  scdo:0000935 Pruritus (0.762)
  symp:0000432 itching (0.556)
  ncit:C58006 Pruritus, CTCAE (0.54)
CAROLIO:0003400 radiation therapy
  maxo:0000014 radiation therapy (0.778)
  ncit:C15313 Radiation Therapy (0.762)
CAROLIO:0003500 surgical treatment
  ncit:C15329 Surgical Procedure (0.54)
CAROLIO:0003510 organ transplant
  ncit:C122934 Organ Graft (0.54)
CAROLIO:0003520 roux-en-y
  ncit:C51756 Roux-en-Y Anastomosis (0.549)
CAROLIO:0003530 surgical resection
  maxo:0000448 surgical resection (0.778)
  ncit:C158758 Resection (0.54)

Lexical matching returned no results

CAROLIO:0000400 value partition
CAROLIO:0000410 pain scale
CAROLIO:0000420 symptom recurrence status
CAROLIO:0000421 non-recurrent symptom status
CAROLIO:0000422 recurrent symptom status
CAROLIO:0002000 variceal bleeding
CAROLIO:0003110 endoscopic band ligation
CAROLIO:0003121 biliary drainage
CAROLIO:0003122 biliary dilatation
CAROLIO:0003123 biliary stent placement
CAROLIO:0003124 gallstones removal
CAROLIO:0003230 percutaneous aspiration and drainage
CAROLIO:0003240 percutaneous transhepatic cholangiogram

However, these have big overlap with MAXO and SYMP/HP, and should be considered to be submitted there.
Originally posted by @cthoyt in #2406 (comment)
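Since a lot of the value here is in the overall pattern rather than any single match, a per-ontology summary of a results list like the one above may be easier for a submitter or reviewer to act on. The sketch below is an assumption about how that could look, not part of the existing script: `overlap_by_ontology` is a hypothetical helper, and the `min_score` cut-off of 0.7 is arbitrary, since this thread does not define how the scores are computed. The rows are a small sample copied from the results above.

```python
from collections import defaultdict

# A few rows copied from the results above: (source term, matched CURIE, score).
MATCHES = [
    ("CAROLIO:0000411", "ncit:C136549", 0.54),
    ("CAROLIO:0003220", "maxo:0035106", 0.778),
    ("CAROLIO:0003220", "ncit:C15310", 0.762),
    ("CAROLIO:0003400", "maxo:0000014", 0.778),
    ("CAROLIO:0003400", "ncit:C15313", 0.762),
    ("CAROLIO:0003380", "hp:0000989", 0.762),
]


def overlap_by_ontology(matches, min_score=0.0):
    """Count how many distinct source terms have a match in each target ontology.

    Matches scoring below `min_score` are ignored.
    """
    sources_by_prefix = defaultdict(set)
    for source, target, score in matches:
        if score >= min_score:
            prefix = target.split(":", 1)[0]
            sources_by_prefix[prefix].add(source)
    return {prefix: len(sources) for prefix, sources in sources_by_prefix.items()}


all_overlap = overlap_by_ontology(MATCHES)
# Arbitrary cut-off that drops the low-scoring (0.54) rows in this sample.
strong_overlap = overlap_by_ontology(MATCHES, min_score=0.7)
```

Counting distinct source terms rather than raw rows avoids inflating an ontology's total when one label matches several of its terms.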