Add a prototype of Sample::developmental_stage backfill script #3461

Draft
wants to merge 1 commit into
base: dev

arkid15r
Contributor

Issue Number

#3438

Purpose/Implementation Notes

This is a draft prototype of a Foreman command for the Sample::developmental_stage backfill process. The code is untested and supports only the SRA source database.

Types of changes

What types of changes does your code introduce?

  • Bugfix (non-breaking change which fixes an issue)

Checklist

  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

@arkid15r arkid15r requested a review from davidsmejia December 11, 2023 18:05
Contributor

@davidsmejia davidsmejia left a comment

Looking good. A couple of comments about the approach.

Looking forward, I think we will want to update _apply_harmonized_metadata_to_sample to handle updates to existing samples separately from new samples. This depends on what the science team says is appropriate.

@@ -94,6 +94,10 @@ def __str__(self):
created_at = models.DateTimeField(editable=False, default=timezone.now)
last_modified = models.DateTimeField(default=timezone.now)

# Auxiliary field for tracking latest metadata update time.
# Originally added to support Sample::developmental_stage values backfilling.
last_refreshed = models.DateTimeField(auto_now=True, null=True)

  • We will probably want last_refreshed on Experiment as well, since a sample can belong to more than one experiment.
  • We probably also want to add last_refresh_failure as a timestamp on both models to help with re-running.
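A rough sketch of how a re-run could pick its targets from the two timestamps. Everything here is hypothetical: RefreshState is a plain stand-in for the model fields, and needs_rerun is an illustrative helper, not existing code. It also assumes last_refreshed is written only after a successful refresh; with auto_now=True the field is set on every save(), so the comparison below would not hold as-is.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RefreshState:
    """Hypothetical stand-in for the two timestamp fields on Sample/Experiment."""

    accession_code: str
    last_refreshed: Optional[datetime] = None
    last_refresh_failure: Optional[datetime] = None


def needs_rerun(state: RefreshState) -> bool:
    """Return True when the most recent refresh attempt ended in a failure."""
    if state.last_refresh_failure is None:
        # No failure was ever recorded, so there is nothing to retry.
        return False
    if state.last_refreshed is None:
        # A failure was recorded and no refresh has ever succeeded.
        return True
    # Retry only if the failure is newer than the last successful refresh.
    return state.last_refresh_failure > state.last_refreshed
```

A re-run would then filter on needs_rerun (or an equivalent queryset filter) instead of walking every sample again.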

logger.info(f"Refreshing metadata for a sample {sample.accession_code}")
try:
    _, sample_metadata = SraSurveyor.gather_all_metadata(sample.accession_code)
    SraSurveyor._apply_harmonized_metadata_to_sample(sample_metadata)

_apply_harmonized_metadata_to_sample takes the sample as its first argument, so sample needs to be passed before sample_metadata.
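Roughly what the corrected call could look like. The SraSurveyor class below is only a stand-in stub so the snippet runs on its own; its return shape and signature are assumptions based on the diff and this comment, not the real implementation, and refresh_sample is a hypothetical wrapper name.

```python
from types import SimpleNamespace


class SraSurveyor:
    """Stand-in stub; the real surveyor lives in the Foreman code."""

    @staticmethod
    def gather_all_metadata(accession_code):
        # Assumed return shape, mirroring the `_, sample_metadata = ...`
        # unpacking used in the diff. The value is made up for illustration.
        return None, {"developmental_stage": "adult"}

    @staticmethod
    def _apply_harmonized_metadata_to_sample(sample, metadata):
        # Assumed signature: the sample comes first, then the metadata.
        for key, value in metadata.items():
            setattr(sample, key, value)


def refresh_sample(sample):
    """Hypothetical helper: fetch metadata and apply it to the given sample."""
    _, sample_metadata = SraSurveyor.gather_all_metadata(sample.accession_code)
    # Pass the sample explicitly as the first argument.
    SraSurveyor._apply_harmonized_metadata_to_sample(sample, sample_metadata)
    return sample


# Usage with a placeholder object and a made-up accession code.
sample = SimpleNamespace(accession_code="SRR0000001", developmental_stage=None)
refresh_sample(sample)
```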

One more consideration here: after updating the sample, we will also want to update the cached values on each experiment it belongs to.

i.e.:

# Update our cached values
experiment.update_num_samples()
experiment.update_sample_metadata_fields()
experiment.update_platform_names()
experiment.save()
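Since a sample can belong to more than one experiment, the same four calls would need to run for each of them. A minimal sketch, using a stand-in Experiment stub that only records calls; refresh_experiment_caches is a hypothetical helper name, and how the experiments are looked up from the sample is left out on purpose.

```python
class Experiment:
    """Stand-in stub exposing only the methods named above."""

    def __init__(self):
        self.calls = []

    def update_num_samples(self):
        self.calls.append("update_num_samples")

    def update_sample_metadata_fields(self):
        self.calls.append("update_sample_metadata_fields")

    def update_platform_names(self):
        self.calls.append("update_platform_names")

    def save(self):
        self.calls.append("save")


def refresh_experiment_caches(experiments):
    """Hypothetical helper: recompute cached values on every given experiment."""
    for experiment in experiments:
        # Update our cached values, then persist them.
        experiment.update_num_samples()
        experiment.update_sample_metadata_fields()
        experiment.update_platform_names()
        experiment.save()
```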

@davidsmejia davidsmejia self-assigned this Mar 1, 2024
2 participants