Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a way to mirror data from EGA #146

Closed
wants to merge 14 commits into from

Conversation

csc-jm-zz
Copy link

@csc-jm-zz csc-jm-zz commented Nov 23, 2020

Description

This PR aims to implement a solution to mirror data from EGA into the application database with its own /mirror endpoint.

Related issues

Ticket no. 116 from internal board

Type of change

  • New feature (non-breaking change which adds functionality)

Changes Made

  • helper methods for requesting data from EGA
  • added endpoint for mirroring
  • some docstring and typo fixes

Testing

  • Unit Tests

Mentions

Script used for mirroring is heavily based on the one found here.

@csc-jm-zz csc-jm-zz added the enhancement New feature or request label Nov 23, 2020
@csc-jm-zz csc-jm-zz self-assigned this Nov 23, 2020
@csc-jm-zz csc-jm-zz force-pushed the feature/data-mirroring branch 2 times, most recently from e40ff7d to 8a4aae9 Compare December 3, 2020 07:35
@blankdots
Copy link
Contributor

considering PRs #158 and #151 I think we will need to group at least each dataset related information and make that dataset public so that the model we are working on the DB is followed.

"""
dataset_id = req.match_info["datasetId"]
db_client = req.app["db_client"]
await spawn(req, self.data_mirroring(dataset_id, db_client))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this returns a Job https://aiojobs.readthedocs.io/en/stable/api.html#job which we can watch to see if it has finished or not.

I think it is perfectly fine if we move these it its own handler class/ file as not sure it reuses anything from the base handler class.

There we could implement also an endpoint where we would check the status of that job (active, pending or closed)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The endpoint was moved to its own handler class in handlers.py, but job status check endpoint remains unimplemented.

metadata_backend/server.py Outdated Show resolved Hide resolved
:param dataset_id: ID of dataset in EGA
:param db_client: Motor client used for database connections
"""
ega_data = MetadataMirror().mirror_dataset(dataset_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to check if the database already contains the dataset and if it is in a published folder (because we are mirroring public information, I think the folder should be published by default, but if there is another option fine with that as well) so that we don't mirror it again

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 890-892 of handlers.py has an attempt at doing this check. However, the check is not implemented properly yet and needs refining.

@csc-jm-zz csc-jm-zz force-pushed the feature/data-mirroring branch from 8a4aae9 to 228992b Compare December 21, 2020 06:47
@csc-jm-zz csc-jm-zz force-pushed the feature/data-mirroring branch from 228992b to dbe46d8 Compare December 31, 2020 10:08
@csc-jm-zz
Copy link
Author

Unfortunately, I will not be able to work on this PR any further myself so the things that still need to be done here are:

  • Fix the check for whether datasets from EGA with a specific egaStableId have already been mirrored or not
  • Possibly implement the endpoint for checking status of running/finished jobs (this could be merged without it and done in another PR)
  • Accessing the endpoint seems to try to save a local file currently which is not desired behavior. The helper methods should only add the mirrored data from EGA to the application database in the background

@csc-jm-zz csc-jm-zz marked this pull request as ready for review December 31, 2020 12:08
@blankdots
Copy link
Contributor

we will probably not need it for now, closing as we need to mirror data to EGA #295

@blankdots blankdots closed this Nov 29, 2021
@csc-jm csc-jm deleted the feature/data-mirroring branch September 9, 2022 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants