-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RemoteDandiset.has_data_standard() convenience function #958
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #958 +/- ##
==========================================
- Coverage 89.04% 88.55% -0.50%
==========================================
Files 76 77 +1
Lines 9698 10490 +792
==========================================
+ Hits 8636 9289 +653
- Misses 1062 1201 +139
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Co-authored-by: John T. Wodder II <[email protected]>
@yarikoptic before I write tests for this, how does this syntax look? |
dandi/dandiapi.py
Outdated
else: | ||
raise ValueError( | ||
f"'data_standard' must be an RRID (of form 'RRID:XXX_NNNNNNN`) or one " | ||
f"of the following values: {DATA_STANDARD_MAP.keys()}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dict.keys()
doesn't stringify in a human-friendly way. Use ", ".join(DATA_STANDARD_MAP.keys())
instead.
dandi/dandiapi.py
Outdated
This is determined by checking for "RRID:SCR_015242" in the "dataStandard" field | ||
of the assetsSummary of the dandiset. | ||
Returns True if the Dandiset contains one or more files of the indicated | ||
standard. Otherwise, returns False. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring should document the accepted data standards (or, at the very least, the RRID input format).
Co-authored-by: Yaroslav Halchenko <[email protected]>
dandi/dandiapi.py
Outdated
f"'data_standard' must be an RRID (of form 'RRID:XXX_NNNNNNN`) or one " | ||
f"of the following values: {", ".join(DATA_STANDARD_MAP.keys())}" | ||
) | ||
assets_summary = self.get_raw_metadata()["assetsSummary"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, assetsSummary
seems to be non optional, so must be present in a valid metadata record, but if not we would get KeyError
here... which could be good as to alert user that smth is wrong with metadata in that dandisets. But should it be an exception really or just a warning? I would lean to a be a warning and just return False
. WDYT @jwodder and @satra ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jwodder and @bendichter WDYT - better be strict or robust?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be fine with changing this to throw a warning if assetsSummary is missing
looks ok to me ;) left few insignificant comments |
@bendichter I have converted PR back to draft since needs more work. some comments are left above, and also would be nice to add some rudimentary unittest. Overall I think it would be a nice addition, so would be great to see if finished. |
I could use some guidance in how to write a test for this. Can I just use published dandisets on the archive? |
@bendichter Could you rebase or make an empty commit or something in order for the CI to run again? |
That's one option. Normally, I'd suggest just using the |
# Conflicts: # dandi/tests/test_dandiapi.py
ok, I added some tests. Let me know if these will work |
@bendichter Please run pre-commit on your code. |
@jwodder oh, I see. Is there a reason to not set the pre-commit hooks as GitHub Actions? We do this for NeuroConv and it works great. All PRs are automatically Black formatted, so there is no need for developers to worry about installing and running pre-commit. |
I think it is still a legit and potentially useful interface. @bendichter - what do you think? it might benefit from adhering to the same design principle of using Enum's instead of free strings, as RFed into by #1357 yesterday |
@yarikoptic what do you mean by Enum? Do you mean using Literal in the typehints? |
@bendichter He means using an |
OK, that looks like a fine approach. I'm not able to work on this right now. Feel free to take this over and implement Enums if you want to push this through now, otherwise, I can get to this next week |
Co-authored-by: Yaroslav Halchenko <[email protected]>
if assets_summary is None: | ||
warnings.warn( | ||
f"The raw metadata of RemoteDandiset {self.identifier} does not contain 'assetsSummary'. " | ||
f"Assuming that it does not contain {data_standard}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bendichter You do not appear to be using pre-commit; if you were, I'm quite certain black would have moved that closing parenthesis to the next line. Please install pre-commit for the repository and run it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jwodder in this particular case, I accepted @yarikoptic 's proposed change in GitHub. I'll make sure pre-commit is installed and running properly for future PRs
:letsdoit:! @bendichter -- would you have time/effort to finish it up? |
@jwodder, what do you think of something like this?