Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New method request: dosdp normalise #80

Open
5 tasks
matentzn opened this issue Oct 8, 2021 · 6 comments
Open
5 tasks

New method request: dosdp normalise #80

matentzn opened this issue Oct 8, 2021 · 6 comments
Assignees

Comments

@matentzn
Copy link
Collaborator

matentzn commented Oct 8, 2021

We need a method in dosdp which takes in a dosdp yaml file an returns it in a nicely canonical form:

We will use the ordering directly from the json schema, its perfect:

https://github.com/INCATools/dead_simple_owl_design_patterns/blob/master/src/schema/dosdp_schema.yaml#L299

Test data (bad examples):
https://github.com/monarch-initiative/mondo/tree/master/src/patterns/dosdp-patterns

Good example (mostly good):
https://github.com/obophenotype/upheno/blob/master/src/patterns/dosdp-patterns/abnormalAnatomicalEntity.yaml

I used this script back in the days:
https://github.com/monarch-initiative/mondo/blob/master/src/scripts/repair_mondo_patterns.py

@matentzn
Copy link
Collaborator Author

@dosumis any chance we can have @hkir-dev work on that, if he wants?

@hkir-dev
Copy link
Collaborator

Hi Nico, yes this is in my todo list. You can assign it to me.

@hkir-dev
Copy link
Collaborator

hkir-dev commented Jan 31, 2022

I faced some issues while ordering comments. Ruamel.yaml is associating comments with the preceding element (https://stackoverflow.com/a/42173906). Based on the context, this behavior sometimes make sense, but sometimes cause confusion. For example:

relations:
  part_of: "BFO:0000050"
#  in_taxon: "RO:0002162"

# My comment on vars is ....
vars:
  gross_cell_type: "'cell'"

So in this case both comments belong to relations (actually for ruamel they all are a single comment together). If I change order of the relations or vars, "My comment on vars is ...." is sticking with the relations.

vars:
  gross_cell_type: "'cell'"

relations:
  part_of: "BFO:0000050"
#  in_taxon: "RO:0002162"

# My comment on vars is ....

Do you have any suggestions for the fix. Or, we can release the initial version as is (just warn user to check comments in the pattern file) and try to fix this issue in later versions.

@matentzn
Copy link
Collaborator Author

Hmmm... interesting.. I am fine with a warning for now.. I don't think you can really do much, I don't think it is worth it to write a new parser.. I would suggest to add to the style guides that comments should be under the elements they pertain to.

@dosumis
Copy link
Collaborator

dosumis commented Jan 31, 2022

Hmmm. I think that moving comments around risks making well annotated patterns confusing.

@matentzn - can you remind me of the rationale for this again. I don't see a massively strong case for improving diffs (please correct me if I'm wrong). If it's just about readability then doesn't the markdown conversion deal with that? (we could add cannonical ordering there).

@matentzn
Copy link
Collaborator Author

matentzn commented Feb 1, 2022

I think having comments in a pattern like this is an antipattern all by itself, but that is another discussion.

Well, the question you are asking is the same as: "Is black useful in python?" Of course we can find an endless stream of arguments for an against it. I believe that canonical ordering, canonical quoting, canonical documentation practices makes patterns not only more readable which is a huge argument when managing hundreds of patterns like we do. We just want one way to format the patterns, else I am sitting there until kingdom cometh explaining to people how to use quotes properly, when to put "" around entities and when not, how to best represent three levels of quoting in template strings and so on. Its much easier for me to quickly look at a pattern and say: that is how it should be. I don't want to have 400 patterns in my library with every one looking totally different then the last, with different element orders, quoting and spacing.

I am definitely pro having a normalise method, and I would recommend we go ahead, but I am not dying on that hill. It feels like having a black for just about any data artefact we produce would be useful. But no, we it wont revolutionise the world of ontology engineering, so I guess you will still have to make a call.

hkir-dev added a commit that referenced this issue Feb 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants