Integrate with SD-Connect (swift-browser-ui) to receive info about submitted files #148

Open
6 of 23 tasks
blankdots opened this issue Nov 26, 2020 · 7 comments
Labels: enhancement, EPIC, next, on new board

Comments

@blankdots
Contributor

blankdots commented Nov 26, 2020

Description

We would need to integrate with https://github.com/neicnordic/sda-pipeline in order to be able to associate metadata to submitted files

We can use https://github.com/neicnordic/sda-orchestration/blob/master/sda_orchestrator/utils/consumer.py to consume files from inbox queue in order to associate them to a submission.

DoD (Definition of Done)

Integration with RabbitMQ message broker to read the messages about submitted files and their IDs as well as dataset IDs
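A minimal sketch of the consuming side, kept independent of any particular RabbitMQ client: the function takes the raw message body a consumer callback would receive and pulls out file and dataset identifiers. The field names `accession_id`, `accession_ids` and `dataset_id` are assumed from the example messages discussed in this thread, and the helper name `extract_ids` is hypothetical.

```python
import json


def extract_ids(body: bytes) -> dict:
    """Pull file and dataset identifiers out of a raw broker message body.

    Hypothetical helper: the field names are assumptions based on the
    example messages in this issue, not on the sda-pipeline code itself.
    """
    message = json.loads(body)
    # A mapping message carries a list of IDs, an accession message a single ID.
    file_ids = list(message.get("accession_ids", []))
    if "accession_id" in message:
        file_ids.append(message["accession_id"])
    return {"dataset_id": message.get("dataset_id"), "file_ids": file_ids}
```

A consumer callback from whichever client library is chosen would simply pass its message body to this function and store the result against the submission.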

Testing

Integration and unit testing

@blankdots blankdots added the enhancement New feature or request label Nov 26, 2020
@blankdots blankdots changed the title from "Integrate with MQ of SDA to receive messages about submitted files" to "Integrate with SDA to receive info about submitted files" Mar 2, 2021
@blankdots blankdots added this to the Open Beta milestone Aug 9, 2021
@blankdots
Contributor Author

blankdots commented Nov 19, 2021

the messages we would receive e.g. from SD-Connect to SD-Submit are in the form of:

{
   "operation": "upload",
   "user":"john",
   "project": "csc_project",
   "filepath":"somedir/encrypted.file.gpg",
   "encrypted_checksums": [
      { "type": "md5", "value": "abcdefghijklmnopqrstuvwxyz"},
      { "type": "sha256", "value": "12345678901234567890"}
   ]
}

with operation being upload, remove, rename
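As a rough illustration of how SD-Submit could validate such a message on receipt, the sketch below parses the JSON into a small record and rejects unknown operations. The class `FileEvent` and function `parse_file_event` are hypothetical names; only the field names come from the example above.

```python
import json
from dataclasses import dataclass

# Operations listed in this comment; the final set is still under discussion.
ALLOWED_OPERATIONS = {"upload", "remove", "rename"}


@dataclass
class FileEvent:
    operation: str
    user: str
    project: str
    filepath: str
    checksums: dict  # checksum type -> value, e.g. {"sha256": "..."}


def parse_file_event(raw: str) -> FileEvent:
    """Parse and validate one file message (hypothetical helper)."""
    data = json.loads(raw)
    if data["operation"] not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {data['operation']}")
    # Flatten the list of {"type", "value"} pairs into a lookup table.
    checksums = {c["type"]: c["value"] for c in data.get("encrypted_checksums", [])}
    return FileEvent(
        data["operation"], data["user"], data["project"], data["filepath"], checksums
    )
```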

for neicnordic/sda-pipeline#295 we will receive file identifiers in the form of

The messages SD-Submit would send to sda-pipeline:
trigger ingestion

{
   "type": "ingest", 
   "user": "user", 
   "project": "csc_project",
   "filepath": "somedir/encrypted.file.c4gh",
   "encrypted_checksums": [
      { "type": "md5", "value": "abcdefghijklmnopqrstuvwxyz"},
      { "type": "sha256", "value": "12345678901234567890"}
   ]
}
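One way to produce the ingest trigger is to build it from a dictionary and serialise it with `json.dumps`, which guarantees syntactically valid JSON. This is only a sketch: `build_ingest_message` is a hypothetical helper, and the field names are copied from the example message rather than from the sda-pipeline schema.

```python
import json


def build_ingest_message(user: str, project: str, filepath: str, checksums: dict) -> str:
    """Serialise an ingest trigger; `checksums` maps checksum type to value."""
    return json.dumps(
        {
            "type": "ingest",
            "user": user,
            "project": project,
            "filepath": filepath,
            "encrypted_checksums": [
                {"type": kind, "value": value} for kind, value in checksums.items()
            ],
        }
    )
```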

assign accession id to file after successful ingestion

{
   "user":"john",
   "filepath":"somedir/encrypted.file.c4gh",
   "accession_id": "EGAF12345678901",
   "decrypted_checksums": [
      { "type": "md5", "value": "abcdefghijklmnopqrstuvwxyz"},
      { "type": "sha256", "value": "12345678901234567890"}
   ]
}

and a message that will connect them to the dataset with:

{
   "type": "mapping",
   "user":"john",
   "dataset_id": "EGAD12345678901",
   "accession_ids": ["EGAF12345678901", "EGAF12345678902"]
}
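The accession and mapping messages are related: the mapping message lists the accession IDs that were assigned one by one. A small sketch of that relationship, with hypothetical helper names and field names taken from the two examples above:

```python
def build_accession_message(user, filepath, accession_id, decrypted_checksums):
    """Build the message that assigns an accession ID to one ingested file."""
    return {
        "user": user,
        "filepath": filepath,
        "accession_id": accession_id,
        "decrypted_checksums": [
            {"type": kind, "value": value}
            for kind, value in decrypted_checksums.items()
        ],
    }


def build_mapping_message(user, dataset_id, accession_messages):
    """Build the dataset mapping from already-built accession messages."""
    return {
        "type": "mapping",
        "user": user,
        "dataset_id": dataset_id,
        "accession_ids": [m["accession_id"] for m in accession_messages],
    }
```

Deriving the mapping from the accession messages keeps the two in sync, so a file cannot be mapped to a dataset without first having been given an ID.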

@blankdots
Contributor Author

blankdots commented Nov 19, 2021

we need to create a files database where we keep track of uploaded files and their IDs; these will go under groups/projects (neicnordic/sda-pipeline#282)

or have an API on SD-Connect
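To make the first option more concrete, here is an in-memory sketch of what such a files store might track. It is an assumption-heavy illustration, not a schema proposal: `FileRecord` and `FileRegistry` are hypothetical names, and a real implementation would sit on top of an actual database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileRecord:
    project: str
    filepath: str
    accession_id: Optional[str] = None  # filled in after successful ingestion


class FileRegistry:
    """In-memory stand-in for a files database, keyed by (project, filepath)."""

    def __init__(self):
        self._records = {}

    def add(self, project: str, filepath: str) -> FileRecord:
        # Adding the same file twice returns the existing record.
        return self._records.setdefault(
            (project, filepath), FileRecord(project, filepath)
        )

    def assign_accession(self, project: str, filepath: str, accession_id: str) -> None:
        self._records[(project, filepath)].accession_id = accession_id

    def by_project(self, project: str) -> list:
        return [r for r in self._records.values() if r.project == project]
```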

@blankdots blankdots changed the title from "Integrate with SDA to receive info about submitted files" to "Integrate with SD-Connect (swift-browser-ui) to receive info about submitted files" Nov 22, 2021
@blankdots blankdots removed this from the Open Beta milestone Mar 28, 2022
@blankdots blankdots added the next next label Mar 28, 2022
@genie9 genie9 added the on new board Temporary label for matching issues between boards label May 12, 2022
@blankdots
Contributor Author

the files that we get from SD-Connect need to be associated with an object that is added to the published submission; otherwise it does not make sense for those files to be part of a submission

@csc-felipe
Contributor

This may need to become part of the workflows, and a new schema might need to be created and added to the workflows (neicnordic/sda-pipeline#578).

@blankdots blankdots self-assigned this Sep 29, 2022
@blankdots blankdots added the EPIC label Oct 3, 2022
@blankdots
Contributor Author

blankdots commented Oct 3, 2022

To outline how the communication between SD-Connect and SD-Submit will go:

  1. SD-Connect
    • user uploads file(s)
    • selects a set of files/buckets for publication in SD-Submit (UI in SD-Connect to reflect that)
    • the files/buckets are shared (read access) with a project SD-Submit knows of
    • the files/buckets are tagged with a specific tag to be able to distinguish them in the SD-Connect UI
    • an API call is made to SD-Submit /files endpoint with information on: project (under the form project_), file path, file name, checksum (and additional information that is deemed necessary)
    • files will need to be re-encrypted with a key that sda-pipeline will know, or at least the key added to the header
  2. SD-Submit
    • in the Files step of a workflow lists the files for a specific project from the files collection in MongoDB and optionally checks the Allas API for those files
    • after metadata has been attached to the files a user would publish the submissions
    • before it is published SD-Submit checks against the Allas API if the files have been removed, changed or checksum differs
    • if all is OK, send an ingestion message to RabbitMQ; for FEGA communication an Inbox message will need to be sent as well (e.g. at the Files step in the workflow)
    • flag the files in MongoDB as submitted for a project (tbd if files can be selected multiple times for different submissions)
  3. sda-pipeline
    • accesses Allas to read and split the files
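The pre-publication check in the SD-Submit step could look roughly like the sketch below. It assumes the stored file records and the current Allas listing have each already been reduced to a mapping of file path to checksum; how the listing is fetched from the Allas API is deliberately left out, and `check_files` is a hypothetical name.

```python
def check_files(stored: dict, remote: dict) -> dict:
    """Compare stored checksums with the current remote listing.

    Both arguments map file path to checksum. Returns a mapping of
    file path to the problem found; an empty result means all is OK.
    """
    problems = {}
    for path, checksum in stored.items():
        if path not in remote:
            problems[path] = "removed"
        elif remote[path] != checksum:
            # A changed file shows up as a checksum mismatch.
            problems[path] = "checksum differs"
    return problems
```

Only when the result is empty would the ingestion message be sent; otherwise the listed problems can be reported back to the user before publishing.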

@csc-felipe
Contributor

> with operation being upload, remove, rename

I'd say we only need upload and remove, or equivalent. Rename would be a delete followed by upload operations. This is in line with the object storage not having a rename feature.
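A sketch of that idea: a rename event is expanded into a remove of the old path followed by an upload of the new one, so downstream code only ever handles two operations. The shape of a rename message is not specified in this thread, so the `new_filepath` field is purely an assumption, as is the function name.

```python
def normalise_operations(event: dict) -> list:
    """Expand a rename into remove + upload; pass other events through.

    Assumes a rename event carries the old path in "filepath" and the new
    path in a hypothetical "new_filepath" field.
    """
    if event["operation"] != "rename":
        return [event]
    shared = {
        key: value
        for key, value in event.items()
        if key not in ("operation", "filepath", "new_filepath")
    }
    return [
        {**shared, "operation": "remove", "filepath": event["filepath"]},
        {**shared, "operation": "upload", "filepath": event["new_filepath"]},
    ]
```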

@blankdots
Contributor Author

> > with operation being upload, remove, rename
>
> I'd say we only need upload and remove, or equivalent. Rename would be a delete followed by upload operations. This is in line with the object storage not having a rename feature.

Our inbox will be SD-Connect, so we might not need to monitor these operations and they can be skipped; the only use case where inbox messages will be required is FEGA.
