Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Alternative symbol" synonyms & "Formerly" #132

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

joeflack4
Copy link
Contributor

@joeflack4 joeflack4 commented Sep 8, 2024


Google drive folder with outputs and diff.

Changes

Add alt & included symbol synonyms

  • Update: Now adding synonyms for alt symbols, w/ type of mondo#abbreviation.
  • Update: Now adding synonyms for alternative symbols, w/ type of mondo#abbreviation.
  • Update: Now stripping the text ', INCLUDED' from symbols. Previously was only doing for titles.
  • Update: For included & alternative titles/symbols ending in ', FORMERLY', these are being added as relatedSynonyms, and also being marked as owl:deprecated.
  • Misc updates: Todo comments. Renamed variables/methods for consistency and accuracy. Refactorings.

'Draft' status

Joe 10/03: I believe that from my end, requested changes and all else is done. But I think Trish wanted more time to review, so to prevent accidental merge we put in draft mode.

@joeflack4 joeflack4 self-assigned this Sep 8, 2024
@joeflack4 joeflack4 marked this pull request as draft September 8, 2024 04:22
@joeflack4 joeflack4 added the enhancement New feature or request label Sep 8, 2024
- Update: Now adding synonyms for alt & included symbols, w/ type of mondo#abbreviation.
- Misc updates: Todo comments. Renamed variables/methods for consistency and accuracy. Refactored some things.
@joeflack4 joeflack4 changed the title The symbol value from the fields "Alternative titles; symbols" and "Included Title(s); symbols" is missing "Alternative symbol" synonyms Sep 14, 2024
@joeflack4 joeflack4 changed the title "Alternative symbol" synonyms "Alternative symbol" synonyms & "Formerly" Sep 14, 2024
@joeflack4 joeflack4 marked this pull request as ready for review September 15, 2024 03:09
- In previous commits, stated that this would also include 'included' symbols, but for now we have decided not to proceed with that.
- Update: Now adding synonyms for alternative symbols, w/ type of mondo#abbreviation.
- Update: Now stripping the text ', INCLUDED' from symbols. Previously was only doing for titles.
- Update: For included & alternative titles/symbols ending in ', FORMERLY', these are being added as relatedSynonyms, and also being marked as owl:deprecated.
- Misc updates: Updated some comments and codestyle.
- Bug fix: mondo#ABBREVIATION URI
- Update: Typo and codestyle of cleanup_label()
twhetzel

This comment was marked as duplicate.

@twhetzel
Copy link
Contributor

twhetzel commented Sep 21, 2024

@joeflack4 do you plan to update this to use OMO:0003000?

omim2obo/main.py Outdated
graph: Graph, source: URIRef, prop: URIRef, target: Union[Literal, str, URIRef],
anno_pred_vals: List[Tuple[URIRef, Union[Literal, str, URIRef]]]
):
"""Add an axion annotation to the graph."""
Copy link
Contributor

@twhetzel twhetzel Sep 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: 'axion'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

) -> Tuple[List[str], List[str], List[str], List[str]]:
"""Separate current title/symbols from deprecated (marked 'former') ones"""
former_titles = [x for x in titles if ', FORMERLY' in x.upper()]
former_symbols = [x for x in symbols if ', FORMERLY' in x.upper()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to apply this to symbols that are in the "Alternative Title(s); symbol(s)" column since 'FORMERLY' is used with symbols too, e.g. https://www.omim.org/entry/606417?search=606417&highlight=FORMERLY

Copy link
Contributor Author

@joeflack4 joeflack4 Sep 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, it looks like I only made this change to "alternative titles/symbols" actually, satisfying:

But this does not satisfy also doing this for "included" as for:

I'm really surprised I didn't do that. I thought I had!

  • @joeflack4 Deal with "formerly" "included" entries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, one reason I hadn't done this is because of:

For now I'm adding them as mondo#omim_included. I don't think we can mark them as a synonym type unless we start doing that with "included titles" as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complete. Now we are processing all of the "former" entries:

  • alt titles
  • alt symbols
  • included titles
  • included symbols

titles2 = [remove_included_and_formerly_suffixes(x) for x in titles]
symbols2 = [remove_included_and_formerly_suffixes(x) for x in symbols]
# additional reformatting for titles
titles2 = [cleanup_label(x) for x in titles2] # todo: is this redundant? gets done when adding synonyms to graph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to sort out this answer and remove the #todo here

Copy link
Contributor Author

@joeflack4 joeflack4 Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it's not really this line that's redundant, it's several places further down in the process.

This became more involved than I anticipated. I did it as a separate PR:

Even though this line titles2 = [cleanup_label(x) for x in titles2] is new, those changes really aren't in scope for this PR, so I think we can address that PR separately and merge it into main later.

omim2obo/main.py Outdated
if inc_titles_str:
included_titles, included_symbols = parse_title_symbol_pairs(inc_titles_str)
included_titles, included_symbols = clean_alt_and_included_titles(included_titles, included_symbols)
included_is_included = included_titles or included_symbols # redundant. can't be included symbol w/out title
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you doing this so you don't have to check for an empty list to see if the entry has any values in the "Included Title(s); symbols" column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naw. The reason I did this is to make the code clearer. Didn't want to come across the code just checking for "included titles" and have someone thing "well why isn't it checking for included symbols?".

Come to think of it, another approach is that I could remove this line. And instead, further down, I could potentially replace if included_is_included with something like if included_titles: # fyi: only need to check for included titles because there can't be any included symbols unless there are included titles.

Copy link
Contributor

@twhetzel twhetzel Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is a somewhat oddly named variable and duplicates information, included_titles, that already exists in this case an could be used in that later check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

omim2obo/main.py Outdated
graph.add((omim_uri, RDFS['comment'], Literal(included_detected_comment)))
for included_label in cleaned_inc_labels:
graph.add((omim_uri, URIRef(INCLUDED_URI), Literal(label_cleaner.clean(included_label, abbrev))))
# - exact titles
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it would be more clear if you maintained the omim terminology of Preferred or Alternative since there is not an "exact title"

Copy link
Contributor Author

@joeflack4 joeflack4 Sep 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! I would agree with you.

Actually, AFAIK I've dropped the previous "exact" nomenclature, which I think was referencing "preferred" previously, everywhere in the codebase.

What 'exact' means here is 'exact synonyms: titles'.
Below it I have 'related titles', 'exact abbreviations', 'related, deprecated 'former' titles', and 'related, deprecated 'former' abbreviations'.

If you want, I can update all these comments to say "exact/related synonyms".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you are not going to use the terminology from the mimTitles file, then document somewhere what "exact titles" is, for example the value from the "Preferred Title; symbol" column for the "Preferred Title" after processing and now being added as an exact synonym or 'exact synonyms: titles' or the processed title value added as an exact synonym

This kind of documentation should be done for each of the types below, e.g. exact abbreviations, etc.

It's easy to read that the information is being added as an exact synonym for example, but since "exact title" is something made up and there is a lot of processing of the value/variable (pref_title) being added here there is a fair bit of backtracking one needs to do to sort out what the original value from the file is being added here since there nothing in the source file named "exact title"

Copy link
Contributor Author

@joeflack4 joeflack4 Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I successfully communicated earlier. The comment isn't inspired by the previously named variables. I've since refactored all the variable names to be in sync with the OMIM data model.

The 'exact' refers to 'exact synonym' I mean to say that this is a section for "any title (non symbol) that is an exact synonym". Easier to see if you look at this line with the lines around it:

# Add synonyms
# - exact titles
graph.add((omim_uri, oboInOwl.hasExactSynonym, Literal(label_cleaner.clean(pref_title, pref_abbrev))))

In any case, I have refactored these comments! They will be easier to read now and will not cause any confusion.

omim2obo/main.py Outdated Show resolved Hide resolved
omim2obo/main.py Outdated
for title in former_alt_titles:
clean_title = Literal(label_cleaner.clean(title, pref_abbrev))
add_triple_and_optional_annotations(graph, omim_uri, oboInOwl.hasRelatedSynonym, clean_title,
[(OWL.deprecated, Literal(True))])
Copy link
Contributor

@twhetzel twhetzel Sep 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that the intent was to add owl:deprecated as an annotation on the synonym annotation. I have a question into Sabrina to confirm this

Copy link
Contributor Author

@joeflack4 joeflack4 Sep 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, OK. Right now I can't think of anything else I would deprecate. I wouldn't want to deprecate the whole term, since non-deprecated MIMs can have both deprecated and non-deprecated synonyms.

  • @twhetzel to discuss "formerly" title/symbol deprecation with Sabrina

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a need to add owl:deprecated to anything here

Copy link
Contributor Author

@joeflack4 joeflack4 Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I'm basing this change off of:

In that issue it links to the discussion with Nicole where she talks about this being their standard practice.

Example:

synonym: "juvenile carcinoma (formerly)" RELATED DEPRECATED [GARD:0009408]

This change makes it so that this process is automated for them.


edit: FYI @twhetzel you also asked me to make this change it looks like: comment

Copy link
Contributor

@twhetzel twhetzel Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's confirm with @sabrinatoro which of the suggestions to follow since the comments in that issue from Sabrina and Nicole are not in agreement.

omim2obo/main.py Outdated Show resolved Hide resolved
omim2obo/main.py Outdated
for abbrevs in [pref_symbols, alt_symbols]:
for abbreviation in abbrevs:
add_triple_and_optional_annotations(graph, omim_uri, oboInOwl.hasExactSynonym, Literal(abbreviation),
[(OBOINOWL.hasSynonymType, MONDONS.abbreviation)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs updating from other decisions, e.g. oboInOwl and use the OMO abbreviation property

Copy link
Contributor Author

@joeflack4 joeflack4 Sep 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@twhetzel For those types of situations, my solution is usually (a) if I merge those first, I merge main back into this PR, or (b) if I merge this first, then when those get merged, those changes make it into main anyhow.

Usually I opt for 'a' for best results.

But sometimes it is better to be duplicative about this stuff. Builds do run off of main after all. Actually, one just ran and released, but I deleted it because it did not include this OMO bug fix.

Anyway I'll go ahead and propagate those changes to this PR, perhaps just by merging those feature branches into it now.

Great review btw! Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and merged in main (after merging in the other PRs you approved) as well as:

So now those changes have propagated here.

I also added this warning to the OP:

Warning

Merging this will also merge the following PR, so it should be completed first:

Copy link
Contributor

@twhetzel twhetzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • update this branch based on past discussions
  • blocked until question about owl:deprecated is confirmed, typically this is only added to classes
  • other comments inline

- Update: Docstring to make clearer
- Update: Typo fix
- Update: Now "former" included titles/symbols are also being processed
- Update: Comments to improve clarity
- Delete: redundant variable 'included_is_included', and refactored / recommented around that change.
- Formerly: synonym pred: related -> exact
- Formerly: owl:drepecated synonym annotation -> MONDO:omim_formerly
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants