Synonym sync: acronym case exception #671

Open: wants to merge 4 commits into base: sync1-synonyms
@joeflack4 joeflack4 commented Sep 25, 2024

Overview

Fixes an issue where the synonym sync preferred Mondo's capitalization even when it was incorrect, i.e. cases where the synonym is an acronym that is capitalized in the source but not in Mondo.

Pre-merge checklist

Documentation

Was the documentation added/updated under docs/?

  • Yes
  • No, updates to the docs were not necessary after careful consideration

QC

Was the full pipeline run before submitting this PR using sh run.sh make build-mondo-ingest on this branch (after docker pull obolibrary/odkfull:dev), and no errors occurred?

  • Yes
  • No, there are no functional (code-related) changes to the pipeline in the PR, so no re-run is necessary

Build:

New Packages

Were any new Python packages added?

Were any other non-Python packages added?

PR Review and Conversations Resolved

Has the PR been sufficiently reviewed by at least 1 team member of the Mondo Technical team and all threads resolved?

  • Yes

Additional information

Context

@twhetzel and I discussed this at our last 1:1. Sabrina had done some recent curation and noticed that in some cases we would actually prefer the source's capitalization over Mondo's. I looked at the Google Sheet, at all of the rows where Use Source Case (Curator Review) == Source, and saw that these were all acronyms: in every case the source was all caps and Mondo was not.
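To make the rule concrete, here is a minimal sketch of the exception as described above. It is not the code in this PR: the function names, the "all letters uppercase" acronym heuristic, and the example strings are assumptions made for illustration only.

```python
def looks_like_acronym(text: str) -> bool:
    """Hypothetical heuristic: at least two letters, and every letter is uppercase."""
    letters = [c for c in text if c.isalpha()]
    return len(letters) >= 2 and all(c.isupper() for c in letters)


def choose_synonym_case(source: str, mondo: str) -> str:
    """Pick which capitalization to keep for a synonym matched case-insensitively.

    Assumed rule: keep Mondo's case by default, but prefer the source's case
    when the source is all caps (acronym-like) and Mondo's version is not.
    """
    if source.casefold() != mondo.casefold():
        raise ValueError("synonyms differ by more than case")
    if looks_like_acronym(source) and not looks_like_acronym(mondo):
        return source
    return mondo
```

With this sketch, choose_synonym_case("AIDS", "aids") keeps the source's "AIDS", while an ordinary phrase such as "Marfan syndrome" vs. "marfan syndrome" still resolves to Mondo's capitalization.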

Results

I ran a before/after comparison, using DO as my test case, and looked at the differences in the outputs. There were differences in doid.synonyms.confirmed.robot.tsv and doid.synonyms.updated.robot.tsv. I examined them, and the outputs are as I expected. Here are the diffs (FYI, I added column headers at the top):

@joeflack4 joeflack4 changed the base branch from main to sync1-synonyms September 25, 2024 19:03
@joeflack4 joeflack4 added the enhancement New feature or request label Sep 25, 2024
@joeflack4 joeflack4 self-assigned this Sep 25, 2024