Add copyFiles pipeline #968

Open
wants to merge 5 commits into
base: release/mvp

bilalebi

JIRA Ticket

https://www.ebi.ac.uk/panda/jira/browse/EA-1276

Description

This pipeline copies files from a source directory to a destination using rsync. It optionally sends email notifications on completion or failure.

Use case & Benefits

With this pipeline, we will be able to copy files automatically from one directory to another. Example invocation:

nextflow run copyFiles.nf -c ../filescopy.config --source /path/to/source/data --destination /path/to/destination --file_format .vcf --send_email yes --email_recipient [email protected]

Features

  • File Transfer: Uses rsync to copy files from the specified source to the destination.
  • File Filtering: Only processes files matching the specified format (e.g., .vcf).
  • Optional Email Notifications: Sends an email with the pipeline's status, controlled by a send_email flag.
  • Directory Checks: Ensures source and destination directories exist before execution.
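
As a rough illustration of the file transfer, filtering, and directory checks listed above, here is a minimal shell sketch. The pipeline's actual rsync options are not shown in this description, so the function name `copy_matching` and the flags chosen here are assumptions, not the PR's implementation.

```shell
# Hypothetical helper (not from the PR): copy only files ending in a given
# extension from a source directory to a destination directory.
copy_matching() {
    src="$1"; dest="$2"; ext="$3"

    # Directory checks: fail early if either directory is missing.
    [ -d "$src" ]  || { echo "source directory not found: $src" >&2; return 1; }
    [ -d "$dest" ] || { echo "destination directory not found: $dest" >&2; return 1; }

    # rsync filter rules are evaluated in order:
    #   --include='*/'       descend into every subdirectory
    #   --include="*$ext"    keep files ending in the extension
    #   --exclude='*'        skip everything else
    # --prune-empty-dirs drops directories that end up with no matching files.
    # The trailing slash on "$src/" copies the directory's contents,
    # not the directory itself.
    rsync -a --prune-empty-dirs \
        --include='*/' --include="*${ext}" --exclude='*' \
        "$src/" "$dest/"
}
```

For example, `copy_matching /path/to/source/data /path/to/destination .vcf` would copy only the `.vcf` files and keep their subdirectory layout.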

Testing

I tested the pipeline locally and it works as expected.

  • Test it on codon

bilalebi requested a review from dpopleton on November 19, 2024 09:30

dpopleton (Contributor) left a comment

It is fine, barring the comments, but not too useful at the moment.
I am approving, but we need some sort of datacheck (DC), even a dummy one, before I want to include it in pipelines. I am not sure how you want to do that with a full directory sync, as opposed to single files.
I was thinking of something like this:

process InitialDatacheck {
    label 'mem2GB_DM'
    input:
    path initial_file

    output:
    path initial_file

    script:
    """
    echo "Running datacheck on initial file ${initial_file}"
    ensembl-datacheck --file ${initial_file} --test=${params.file_type}
    """
}

But that can come later, if need be.
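
One way a dummy datacheck could cover a whole synced directory, rather than a single file, is to check every matching file after the copy. The sketch below is only an assumption about what "dummy" might mean here (no zero-byte files). `dummy_datacheck` is a hypothetical name, and the function does not call any real datacheck tool.

```shell
# Hypothetical dummy datacheck (an assumption, not part of the PR):
# pass only if no file with the given extension under the directory is empty.
dummy_datacheck() {
    dir="$1"; ext="$2"

    # -size 0 matches zero-byte files; any output means at least one failure.
    empty=$(find "$dir" -type f -name "*${ext}" -size 0)

    if [ -n "$empty" ]; then
        echo "datacheck failed, empty files:" >&2
        echo "$empty" >&2
        return 1
    fi
}
```

Running `dummy_datacheck /path/to/destination .vcf` after the sync would return a non-zero status if any copied `.vcf` file is empty, which a pipeline step could treat as a failure.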

nextflow/workflows/copyFiles.nf (outdated, resolved)
nextflow/workflows/copyFiles.nf (resolved)
@@ -0,0 +1,11 @@
includeConfig './base.config'
Contributor:

We need the Slurm configuration from production.config.

}

// Function to build and send email
def sendPipelineStatusEmail(String pipelineName, String status, String recipient) {
Contributor:

Could we simplify this by using Nextflow's built-in workflow notification report with -N? See https://www.nextflow.io/docs/latest/notifications.html#workflow-notification

Author:

The reason I thought of creating a custom email is that -N doesn't mention the source and destination locations.

But if that's not needed, we can use the -N feature as you suggested.
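
For comparison, the built-in route discussed in this thread could look like the sketch below. `-N` is Nextflow's run option for sending a notification email when the workflow completes. The wrapper name `run_copy_with_notification` and its argument order are hypothetical, and the remaining arguments are taken from the example invocation in the PR description. As noted in the reply above, the default notification does not mention the source and destination paths.

```shell
# Hypothetical wrapper (not from the PR): run the pipeline and let Nextflow
# send its built-in completion notification instead of a custom email.
run_copy_with_notification() {
    recipient="$1"; src="$2"; dest="$3"

    # -N <address> enables the built-in workflow notification email.
    nextflow run copyFiles.nf -c ../filescopy.config \
        --source "$src" --destination "$dest" --file_format .vcf \
        -N "$recipient"
}
```

With this approach the `--send_email` and `--email_recipient` parameters would no longer be needed, at the cost of losing the custom message body.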

bilalebi and others added 3 commits November 28, 2024 15:46
Co-authored-by: Daniel Poppleton <[email protected]>
Co-authored-by: Daniel Poppleton <[email protected]>

3 participants