Hi!
I am trying to incorporate the bcbio-vc Docker image into my pipeline manager, and as part of that I am running a variant-calling test on the bcbio-vc Docker image. However, bcbio_nextgen.py is unable to find the genome builds in the bcbio installation.
This is how I ran it:
- Created ~/bcbio/biodata/genomes and ~/bcbio/biodata/galaxy directories on a local system to be mounted into the Docker container, and also created another directory ~/bcbio-test as the scratch space for my test.
- Started a Docker container using
docker run -ti -v ~/bcbio/biodata:/mnt/biodata -v ~/bcbio-test:/data quay.io/bcbio/bcbio-vc
- In the container, I ran
bcbio_nextgen.py upgrade -u skip --genomes hg38 --genomes mm10 --aligners bwa
to download the reference genomes. They were downloaded to /usr/local/share/bcbio-nextgen/genomes, and the corresponding galaxy directory was updated at /usr/local/share/bcbio-nextgen/galaxy. Attaching the tail of the stdout:
List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'hg38', 'name': 'Human (hg38) full', 'indexes': ['seq', 'twobit', 'bwa', 'hisat2'], 'annotations': ['ccds', 'capture_regions', 'coverage', 'prioritize', 'dbsnp', 'hapmap_snps', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'purecn_mappability', 'simple_repeat', 'af_only_gnomad', 'transcripts', 'RADAR', 'rmsk', 'salmon-decoys', 'fusion-blacklist', 'mirbase'], 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'platinum-genome-NA12878', 'giab-NA12878-remap', 'giab-NA12878-crossmap', 'dream-syn4-crossmap', 'dream-syn3-crossmap', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149', 'giab-NA24694', 'giab-NA24695']}, {'dbkey': 'mm10', 'name': 'Mouse (mm10)', 'indexes': ['seq', 'twobit'], 'annotations': ['problem_regions', 'prioritize', 'dbsnp', 'vcfanno', 'transcripts', 'rmsk', 'mirbase']}], 'genome_indexes': ['bwa', 'rtg'], 'install_liftover': False, 'install_uniref': False}'): Human (hg38) full, Mouse (mm10)
bcbio-nextgen data upgrade complete.
Upgrade completed successfully.
- Then I started to run this tutorial in the /data directory, and it failed with the following error:
root@edc1034c416f:/data/cancer-dream-syn3/work# bcbio_nextgen.py ../config/cancer-dream-syn3.yaml -n 8
Running bcbio version: 1.2.4
global config: /data/cancer-dream-syn3/work/bcbio_system.yaml
run info config: /data/cancer-dream-syn3/config/cancer-dream-syn3.yaml
[2021-05-20T00:31Z] System YAML configuration: /data/cancer-dream-syn3/work/bcbio_system-merged.yaml.
[2021-05-20T00:31Z] Locale set to C.UTF-8.
[2021-05-20T00:31Z] Resource requests: bwa, sambamba, samtools; memory: 4.00, 4.00, 4.00; cores: 16, 16, 16
[2021-05-20T00:31Z] Configuring 1 jobs to run, using 8 cores each with 32.1g of memory reserved for each job
[2021-05-20T00:31Z] Timing: organize samples
[2021-05-20T00:31Z] multiprocessing: organize_samples
[2021-05-20T00:31Z] Using input YAML configuration: /data/cancer-dream-syn3/config/cancer-dream-syn3.yaml
[2021-05-20T00:31Z] Checking sample YAML configuration: /data/cancer-dream-syn3/config/cancer-dream-syn3.yaml
Traceback (most recent call last):
File "/usr/local/bin/bcbio_nextgen.py", line 245, in <module>
main(**kwargs)
File "/usr/local/bin/bcbio_nextgen.py", line 46, in main
run_main(**kwargs)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 50, in run_main
fc_dir, run_info_yaml)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 128, in variant2pipeline
[x[0]["description"] for x in samples]]])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
return run_multicore(fn, items, config, parallel=parallel)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 1041, in __call__
if self.dispatch_one_batch(iterator):
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
self._dispatch(tasks)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
for func, args, kwargs in self.items]
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
for func, args, kwargs in self.items]
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/utils.py", line 59, in wrapper
return f(*args, **kwargs)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/distributed/multitasks.py", line 459, in organize_samples
return run_info.organize(*args)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/run_info.py", line 81, in organize
item = add_reference_resources(item, remote_retriever)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/run_info.py", line 177, in add_reference_resources
data["dirs"]["galaxy"], data)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/genome.py", line 233, in get_refs
galaxy_config, data)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/pipeline/genome.py", line 180, in _get_ref_from_galaxy_loc
(genome_build, os.path.normpath(loc_file)))
ValueError: Did not find genome build hg38 in bcbio installation: /data/cancer-dream-syn3/work/tool-data/sam_fa_indices.loc
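As a side note on the upgrade step above: if I am reading the bind mounts right, the downloaded genomes live in the container's writable layer rather than under /mnt/biodata, so they are not persisted to the host. A minimal sketch of that mapping (the helper function and mount table here are my own, purely for illustration):

```python
import os

# Bind mounts from the docker run command above (container path -> host path).
MOUNTS = {
    "/mnt/biodata": "~/bcbio/biodata",
    "/data": "~/bcbio-test",
}

def host_path_for(container_path, mounts=MOUNTS):
    """Return the host path backing a container path, or None if the
    path lives only in the container's writable layer (not persisted)."""
    for cpath, hpath in mounts.items():
        if container_path == cpath or container_path.startswith(cpath + "/"):
            return hpath + container_path[len(cpath):]
    return None

# The genomes were downloaded inside the image's install tree, which is
# not covered by any bind mount, so they disappear with the container:
print(host_path_for("/usr/local/share/bcbio-nextgen/genomes"))  # None
print(host_path_for("/mnt/biodata/genomes/hg38"))  # ~/bcbio/biodata/genomes/hg38
```

So even once the .loc problem below is sorted out, I suspect the download target would need to be under one of the mounts for the data to survive container restarts.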
I am not sure this is how this Docker image is intended to be used, but I see from the Dockerfile that the bcbio installation is all-encompassing. After some sleuthing, I think the issue might be the way bcbio_nextgen.py derives the base installation directory from this function, which causes it to look for the .loc file at /data/cancer-dream-syn3/work/tool-data/sam_fa_indices.loc instead of /usr/local/share/bcbio-nextgen/galaxy/tool-data/sam_fa_indices.loc.
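To illustrate what I mean (this is my own simplified reconstruction, not bcbio's actual code): if the galaxy directory is resolved from the run's work location instead of the install prefix, the .loc lookup produces exactly the failing path in the traceback:

```python
import os

def loc_file_path(galaxy_dir):
    # Hypothetical simplification of the lookup in bcbio/pipeline/genome.py:
    # the .loc file is searched relative to whatever galaxy_dir was resolved.
    return os.path.normpath(os.path.join(galaxy_dir, "tool-data", "sam_fa_indices.loc"))

# What I observe: galaxy_dir resolves to the work directory...
print(loc_file_path("/data/cancer-dream-syn3/work"))
# -> /data/cancer-dream-syn3/work/tool-data/sam_fa_indices.loc (the failing path)

# ...whereas the install's galaxy directory would point at the downloaded indices:
print(loc_file_path("/usr/local/share/bcbio-nextgen/galaxy"))
# -> /usr/local/share/bcbio-nextgen/galaxy/tool-data/sam_fa_indices.loc
```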
Let me know if you have any questions about this, and thanks in advance!