Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.clean_prefix_map(): remove_unused_builtins param #537

Open
joeflack4 opened this issue May 23, 2024 · 6 comments
Open

.clean_prefix_map(): remove_unused_builtins param #537

joeflack4 opened this issue May 23, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@joeflack4
Copy link
Collaborator

Overview

Given that built-in prefix maps are not required unless they appear somewhere in the mapping set, it would be nice to have a way to give users the ability to create an msdf and an SSSOM TSV that leaves these out.

@caufieldjh
Copy link

I would also like to have this option. Otherwise nearly every SSSOM TSV I create is more prefix map than anything else, by lines.

@matentzn
Copy link
Collaborator

@caufieldjh you must have small mapping sets then - there are only about 6 built in prefixes.. I do agree though with this issue, and happy to incorporate a parameter like this.

@caufieldjh
Copy link

I was observing the same effect @joeflack4 did - when instantiating a MappingSetDataFrame without a Converter, the included prefix map included all ~1.5K prefixes and the object takes some time to create.
If I do it more like this, instantiation is instantaneous:

    metadata = get_default_metadata()
    metadata['curie_map'] = {}
    converter = Converter.from_prefix_map(metadata['curie_map'])
    msdf = MappingSetDataFrame(converter=converter, df=df, metadata=metadata)

@matentzn
Copy link
Collaborator

@caufieldjh this is a slightly different ask: you are saying that when you instantiate with just a converter, you would like by default to run "clean prefix map".

I think I can see that could be a good idea - but its not the same as the parameter being asked for here.

@caufieldjh
Copy link

I still see this as related - the call to clean_prefix_map doesn't have to be a default, as there are certainly cases in which it's useful to start with a prefix map including builtins, add/merge new mappings into the msdf, then do the final cleanup step to remove unused prefixes (working under the assumption that some of those prefixes are now present in the mappings, too).

@joeflack4
Copy link
Collaborator Author

I guess it is slightly different but related. I think that actually what @caufieldjh is referring to is actually just a bug, because there are 1,547 prefixes included in the output when that happens.

In any case, just linking this related stuff here:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants