Commit d8ef5ef

[MRG] fix genome download rule (#233)
* fix parsing of genome csv
* update zip rule
* update zip info
* add genbank_cache/ to something that's tested by make test
1 parent 1ade842 commit d8ef5ef

3 files changed: +24 -22 lines changed

doc/quickstart.md

+1 -1
@@ -93,6 +93,6 @@ Some key output files under the outputs directory are:
 * `trim/{sample}.trim.fq.gz` - trimmed and preprocessed reads.
 * `sigs/{sample}.trim.sig.zip` - sourmash signature for the preprocessed reads.

-Note that `genome-grist run <config.yml> zip` will create a file named `transfer.zip` with the above files in it.
+Note that `genome-grist run <config.yml> zip` will create a file named `<output_dir>.zip` with the above files in it.

 Please see [the guide to genome-grist output files](output-guide.md) for more information!
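
The new archive name comes from the `zip` rule in the Snakefile, which uses `$(basename "{outdir}").zip`. As a rough illustration only, the same naming logic can be sketched in Python; the function name `zip_name_for` is hypothetical and is not part of genome-grist.

```python
import os.path


def zip_name_for(outdir: str) -> str:
    """Return '<basename of outdir>.zip' (hypothetical helper, for illustration)."""
    # normpath drops a trailing slash; shell `basename` ignores one as well,
    # whereas os.path.basename("out/") alone would return an empty string.
    base = os.path.basename(os.path.normpath(outdir))
    return f"{base}.zip"


if __name__ == "__main__":
    print(zip_name_for("outputs.test"))
    print(zip_name_for("/data/runs/outputs.test/"))
```

Because only the basename is used, the archive is written to the current working directory rather than inside the output directory.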

genome_grist/conf/Snakefile

+22 -21
@@ -468,10 +468,13 @@ rule check:
 @toplevel
 rule zip:
     shell: """
-        rm -f transfer.zip
-        zip -r transfer.zip {outdir}/leftover/*.summary.csv \
+        ZIPFILE=$(basename "{outdir}").zip
+        rm -f $ZIPFILE
+        zip -r $ZIPFILE {outdir}/leftover/*.summary.csv \
             {outdir}/mapping/*.summary.csv {outdir}/*.yaml \
-            {outdir}/gather/*.csv.gz {outdir}/reports/
+            {outdir}/gather/*.csv.gz {outdir}/gather/*.out \
+            {outdir}/reports/
+        echo "Created $ZIPFILE"
     """


@@ -1046,24 +1049,22 @@ rule download_matching_genome_wc:
     output:
         genome = f"{GENBANK_CACHE}/{{ident}}_genomic.fna.gz"
     run:
-        with gzip.open(input.csvfile, 'rt') as infp:
-            r = csv.DictReader(infp)
-            rows = list(r)
-            assert len(rows) == 1
-            row = rows[0]
-            ident = row['ident']
-            assert wildcards.ident.startswith(ident)
-            url = row['genome_url']
-            name = row['display_name']
-
-            print(f"downloading genome for ident {ident}/{name} from NCBI...",
-                  file=sys.stderr)
-            with open(output.genome, 'wb') as outfp:
-                with urllib.request.urlopen(url) as response:
-                    content = response.read()
-                    outfp.write(content)
-                    print(f"...wrote {len(content)} bytes to {output.genome}",
-                          file=sys.stderr)
+        rows = list(load_csv(input.csvfile))
+        assert len(rows) == 1
+        row = rows[0]
+        ident = row['ident']
+        assert wildcards.ident.startswith(ident)
+        url = row['genome_url']
+        name = row['display_name']
+
+        print(f"downloading genome for ident {ident}/{name} from NCBI...",
+              file=sys.stderr)
+        with open(output.genome, 'wb') as outfp:
+            with urllib.request.urlopen(url) as response:
+                content = response.read()
+                outfp.write(content)
+                print(f"...wrote {len(content)} bytes to {output.genome}",
+                      file=sys.stderr)

 # summarize_reads_info
 rule summarize_reads_info_wc:
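
The hunk above swaps a hard-coded `gzip.open` for a call to `load_csv`, whose definition is not part of this diff. The sketch below is an assumption about what such a helper might do: read a CSV into dict rows whether or not the file is gzip-compressed. The name `load_csv_sketch` and the magic-byte check are illustrative and may not match the project's actual implementation.

```python
import csv
import gzip


def load_csv_sketch(path):
    """Yield dict rows from a CSV file that may or may not be gzip-compressed.

    Hypothetical stand-in for the `load_csv` helper referenced in the diff;
    the real helper's behavior is assumed, not confirmed.
    """
    # gzip streams start with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as fp:
        is_gzipped = fp.read(2) == b"\x1f\x8b"

    # Pick the matching opener; both accept text mode and newline="".
    opener = gzip.open if is_gzipped else open
    with opener(path, "rt", newline="") as fp:
        yield from csv.DictReader(fp)
```

Under that assumption, `rows = list(load_csv_sketch(path))` yields the same list of dicts for a plain `.csv` and a `.csv.gz` with identical contents, so the `assert len(rows) == 1` check in the rule would behave the same for either input.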

tests/test-data/SRR5950647.conf

+1
@@ -6,3 +6,4 @@ sourmash_databases:
 taxonomies:
 - ../sourmash/gtdb-rs202.taxonomy.v2.csv
 metagenome_trim_memory: 1e9
+genbank_cache: outputs.test/genbank_cache
