Trio #55

Open · wants to merge 14 commits into base: dev

Conversation

yumisims
Contributor

@yumisims yumisims commented Sep 23, 2024

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

github-actions bot commented Sep 23, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit fac5b0a

  • ✅ 130 tests passed
  • ❔ 19 tests were ignored
  • ❗ 3 tests had warnings

❗ Test warnings:

  • files_exist - File not found: conf/igenomes.config
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.

❔ Tests ignored:

  • files_exist - File is ignored: assets/nf-core-genomeassembly_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-genomeassembly_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-genomeassembly_logo_dark.png
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • files_unchanged - File does not exist: assets/nf-core-genomeassembly_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-genomeassembly_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-genomeassembly_logo_dark.png
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/genomeassembly/genomeassembly/.github/workflows/awstest.yml

✅ Tests passed:

Run details

  • nf-core/tools version 2.8
  • Run at 2024-11-13 15:06:00

@ksenia-krasheninnikova
Contributor

Hi @yumisims
Looks good to me in general, thanks for your work! One thing has to change: we don't need to run purge_dups when hifiasm is run in trio mode. The logic is similar to the hap1/hap2 assembly: each of the two hifiasm trio-phased files is scaffolded directly:

if ( hifiasm_hic_on ) {
    //
    // SUBWORKFLOW: MAP HIC DATA TO THE HAP1 CONTIGS
    //
    HIC_MAPPING_HAP1 ( RAW_ASSEMBLY.out.hap1_hic_contigs, crams_ch, hic_aligner_ch, 'hap1' )
    ch_versions = ch_versions.mix(HIC_MAPPING_HAP1.out.versions)

    //
    // SUBWORKFLOW: SCAFFOLD HAP1
    //
    SCAFFOLDING_HAP1( HIC_MAPPING_HAP1.out.bed, RAW_ASSEMBLY.out.hap1_hic_contigs, cool_bin, 'hap1' )
    ch_versions = ch_versions.mix(SCAFFOLDING_HAP1.out.versions)

    //
    // SUBWORKFLOW: MAP HIC DATA TO THE HAP2 CONTIGS
    //
    HIC_MAPPING_HAP2 ( RAW_ASSEMBLY.out.hap2_hic_contigs, crams_ch, hic_aligner_ch, 'hap2' )
    ch_versions = ch_versions.mix(HIC_MAPPING_HAP2.out.versions)

    //
    // SUBWORKFLOW: SCAFFOLD HAP2
    //
    SCAFFOLDING_HAP2( HIC_MAPPING_HAP2.out.bed, RAW_ASSEMBLY.out.hap2_hic_contigs, cool_bin, 'hap2' )
    ch_versions = ch_versions.mix(SCAFFOLDING_HAP2.out.versions)

    //
    // LOGIC: CREATE A CHANNEL FOR THE FULL HAP1/HAP2 ASSEMBLY
    //
    SCAFFOLDING_HAP1.out.fasta.combine(SCAFFOLDING_HAP2.out.fasta)
        .map{ meta_s, fasta_s, meta_h, fasta_h -> [ [id:meta_h.id], fasta_s, fasta_h ] }
        .set{ stats_haps_input_ch }

    //
    // SUBWORKFLOW: CALCULATE ASSEMBLY STATISTICS FOR HAP1/HAP2 ASSEMBLY
    //
    GENOME_STATISTICS_SCAFFOLDS_HAPS( stats_haps_input_ch,
        PREPARE_INPUT.out.busco,
        GENOMESCOPE_MODEL.out.hist,
        GENOMESCOPE_MODEL.out.ktab,
        [],
        [],
        set_busco_alts
    )

@yumisims
Contributor Author

yumisims commented Nov 5, 2024

Hi Ksenia, could you please take another look? I have added scaffolding for the trio case. Thank you.

@yumisims
Contributor Author

yumisims commented Nov 5, 2024

@gq1 the same error appears again, could you please take a look at the editorconfig? Thanks

@gq1
Member

gq1 commented Nov 5, 2024

https://github.com/sanger-tol/genomeassembly/actions/runs/11686036439/workflow?pr=55#L22
You need to use the old version, as you did in the other pipelines, not the latest version.

@ksenia-krasheninnikova
Contributor

Hi Yumi

Thank you for the updates! There are some considerations:

  • hifiasm_trio_on should be set to false in base.conf, similar to how it's implemented for hifiasm_hic_on.

  • The test based on test.yaml does not run to completion. The output files of hifiasm in trio mode are not picked up because the hifiasm module refers to the wrong names for the gfa files: they should be ".asm.dip.hap1.p_ctg.gfa", not ".asm.hic.hap1.p_ctg.gfa", etc. (you can find them in the hifiasm run in the nextflow workdir). Hence the jobs gfa_to_fasta, assembly stats/busco/merqury and scaffolding are not picked up by nextflow.

  • purge_dups should not be run on an assembly with trio data, but these jobs are still being scheduled.

  • If hifiasm was run in trio mode, this has to be visible from its name: baUndUnlc1.hifiasm-trio.20241106

  • I suggest moving the trio mode into a separate test rather than keeping it in test.conf.
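
As a rough illustration of the first and last points, the default and a dedicated trio test could be wired up along these lines. This is only a sketch: the file name conf/test_trio.config, the profile name test_trio, the input parameter and the YAML path are assumptions made for the example, not names taken from this PR.

```groovy
// In whichever config file holds the hifiasm_hic_on default:
// keep both optional hifiasm modes switched off unless requested.
params {
    hifiasm_hic_on  = false
    hifiasm_trio_on = false
}

// conf/test_trio.config (hypothetical file): the only place where
// trio mode is enabled, pointing at a trio-specific input description.
params {
    input           = "${projectDir}/assets/test_trio.yaml" // hypothetical parameter and path
    hifiasm_trio_on = true
}

// nextflow.config: expose the trio test as its own profile,
// leaving the existing test profile untouched.
profiles {
    test_trio { includeConfig 'conf/test_trio.config' }
}
```

With a layout like this, the trio test would be started with something like `nextflow run . -profile test_trio,docker --outdir <OUTDIR>`, while `-profile test` keeps running without trio data.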

@yumisims
Contributor Author

@ksenia-krasheninnikova Could you please have a look at the PR again?
I have added a separate config for trio mode and made sure there is no trio mode in test.config. Please let me know.
Thanks

@ksenia-krasheninnikova
Contributor

Thanks @yumisims
I can see that in the latest commit trio mode is switched off by default.

The primary assembly is still incomplete in test mode.
I suggest moving the trio mode to a separate test case, which means a specific test file has to be created for the trio mode. There are paths to trio data in test.yaml which are not needed anymore if this functionality is not tested there. I can see that multiple test configuration files were updated by adding trio files for sarscov, but that wouldn't work for testing because the HiFi datasets in parents/offspring won't match.
Among the new changes: the naming of the output folder has to be changed as suggested in the previous post, because it currently refers to the hifiasm-hic output folder. For this reason it's not possible to run the pipeline with --hifiasm_trio_on True, as it fails with an error.
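
For the output folder naming, one possible shape is a dedicated publishDir rule for the trio run. This is a sketch under assumptions: the process selector is hypothetical, and the date stamp is computed inline here only for illustration, as the pipeline may derive it differently.

```groovy
// Sketch only: selector and date-stamp expression are assumptions,
// not copied from this pipeline's module configuration.
process {
    withName: '.*HIFIASM_TRIO.*' {   // hypothetical process name or alias
        publishDir = [
            // Intended result, e.g. <outdir>/baUndUnlc1.hifiasm-trio.20241106
            path: { "${params.outdir}/${meta.id}.hifiasm-trio.${new java.util.Date().format('yyyyMMdd')}" },
            mode: 'copy'
        ]
    }
}
```

Keeping the trio output under its own `hifiasm-trio` folder, rather than reusing the hifiasm-hic one, makes the assembly mode visible from the folder name alone.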

3 participants